In recent years, large-scale behavioral cloning has emerged as a promising paradigm for training general-purpose robot policies. However, fitting policies to complex task distributions is challenging, and existing models frequently underfit the action distribution. In this paper, we present a novel modular diffusion policy framework that factorizes the complex action distribution into a composition of specialized diffusion models, each capturing a distinct sub-mode of the multimodal behavior space. This factorization lets each composed model specialize in a subset of the task distribution, so that the overall task distribution is represented more faithfully. The modular structure also enables flexible adaptation to new tasks: a subset of components can be fine-tuned, or new components added for novel tasks, while inherently mitigating catastrophic forgetting. Empirically, across both simulated and real-world robotic manipulation settings, our method consistently outperforms strong modular and monolithic baselines, achieving a 24% average relative improvement in multitask learning and a 34% improvement in task adaptation across all settings.
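To make the factorization concrete, the sketch below shows one plausible way to compose several specialized diffusion denoisers into a single policy. It is a minimal illustration, not the paper's exact method: the `ComponentDenoiser` and `ComposedPolicy` classes, the observation-conditioned softmax gate, and the DDPM-style epsilon-prediction setup are all assumptions made for the example; the paper's actual composition rule may differ.

```python
# A minimal sketch (assumptions, not the paper's exact method) of composing
# K specialized diffusion "components" into one policy in PyTorch.
import torch
import torch.nn as nn

class ComponentDenoiser(nn.Module):
    """One specialized component: predicts the noise eps(a_t, obs, t)."""
    def __init__(self, act_dim: int, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim + obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, a_t, obs, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([a_t, obs, t_emb], dim=-1))

class ComposedPolicy(nn.Module):
    """Combines K components via observation-conditioned soft weights."""
    def __init__(self, components, obs_dim: int):
        super().__init__()
        self.components = nn.ModuleList(components)
        self.gate = nn.Linear(obs_dim, len(components))  # soft routing

    def forward(self, a_t, obs, t):
        w = torch.softmax(self.gate(obs), dim=-1)              # (B, K)
        eps = torch.stack([c(a_t, obs, t) for c in self.components], dim=1)
        return (w.unsqueeze(-1) * eps).sum(dim=1)              # weighted eps

@torch.no_grad()
def sample_action(policy, obs, act_dim, n_steps=50):
    """Plain DDPM ancestral sampling with a linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    a_t = torch.randn(obs.shape[0], act_dim)
    for i in reversed(range(n_steps)):
        t = torch.full((obs.shape[0],), i, dtype=torch.long)
        eps = policy(a_t, obs, t)
        mean = (a_t - betas[i] / torch.sqrt(1 - alpha_bars[i]) * eps) \
               / torch.sqrt(alphas[i])
        noise = torch.randn_like(a_t) if i > 0 else torch.zeros_like(a_t)
        a_t = mean + torch.sqrt(betas[i]) * noise
    return a_t
```

Under this reading, each component only needs to denoise well on its own sub-mode of the behavior space, and the gate selects or blends components per observation, which is what allows the composition to cover a multimodal action distribution that a single monolithic denoiser would tend to underfit.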
Example component specializations: in one task, components 0 and 1 align the robot with the stand, component 2 aligns with the ring, and component 3 executes the grasp. In a second task, components 0 and 1 align with and approach the pin, component 2 approaches the hammer, and component 3 performs the grasp.
[Figure: results across six evaluation settings: MetaWorld Multitask, RLBench Multitask, Real-world Multitask, MetaWorld Adaptation, RLBench Adaptation, and Real-world Adaptation.]
"X Multitask" denotes that the tasks are trained jointly from scratch. "X Adaptation" denotes that the tasks are trained jointly, starting from a model pretrained in the corresponding X Multitask setting.
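For the adaptation setting, the sketch below illustrates one way the modular structure could be exploited, reusing the hypothetical `ComponentDenoiser` and `ComposedPolicy` from the earlier example: freeze the pretrained components, add a fresh one for the novel tasks, and train only the new component and the gate. This is an assumed recipe for illustration; the paper may fine-tune a different subset of parameters.

```python
# A hedged sketch of component-level adaptation: pretrained components are
# frozen (preserving old behaviors by construction), and only a newly added
# component plus the routing gate are trained on the new tasks.
import torch
import torch.nn as nn

def add_and_finetune_component(policy, act_dim: int, obs_dim: int):
    # Freeze all pretrained components to mitigate catastrophic forgetting.
    for comp in policy.components:
        comp.requires_grad_(False)
    # Add a fresh component dedicated to the novel task(s).
    new_comp = ComponentDenoiser(act_dim, obs_dim)
    policy.components.append(new_comp)
    # Re-initialize the gate so it routes over the K+1 components.
    policy.gate = nn.Linear(obs_dim, len(policy.components))
    # Only the new component and the gate receive gradients.
    trainable = list(new_comp.parameters()) + list(policy.gate.parameters())
    return torch.optim.Adam(trainable, lr=1e-4)
```

Because gradients never reach the frozen components, performance on the pretraining tasks is preserved up to changes in the gate's routing, which matches the claim that the modular structure inherently mitigates forgetting.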