S2-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation

Quantao Yang∗,1, Michael C. Welle∗,1,2, Danica Kragic1, and Olov Andersson1

Recent advances in skill learning have propelled robot manipulation to new heights by enabling robots to learn complex manipulation tasks from a practical number of demonstrations. However, these skills are often limited to the particular action, object, and environment instances shown in the training data, and transfer poorly to other instances of the same category. In this work we present an open-vocabulary Spatial-Semantic Diffusion policy (S2-Diffusion) that generalizes from instance-level training data to the category level, making skills transferable between instances of the same category. We show that functional aspects of skills can be captured via a promptable semantic module combined with a spatial representation. We further propose leveraging depth estimation networks to allow the use of only a single RGB camera. Our approach is evaluated and compared on a diverse set of robot manipulation tasks, both in simulation and in the real world. Our results show that S2-Diffusion is invariant to changes in category-irrelevant factors and achieves satisfactory performance on other instances within the same category, even when it was not trained on those specific instances.
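To make the idea of combining a semantic module with a spatial representation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a promptable open-vocabulary segmentation model has already produced a per-pixel mask for a text prompt and that a monocular depth estimation network has produced a depth map; both are mocked here as plain arrays, and the function name `spatial_semantic_observation` is hypothetical.

```python
import numpy as np


def spatial_semantic_observation(semantic_mask, depth):
    """Stack a semantic mask and a normalized depth map into a (2, H, W) array.

    Hypothetical helper: `semantic_mask` stands in for the output of a
    promptable segmentation model, `depth` for an estimated depth map.
    """
    semantic_mask = np.asarray(semantic_mask, dtype=np.float32)
    depth = np.asarray(depth, dtype=np.float32)
    if semantic_mask.shape != depth.shape:
        raise ValueError("mask and depth must share the same (H, W) shape")

    # Rescale depth to [0, 1]; monocular estimates are often only relative,
    # so the absolute scale is discarded in this sketch.
    d_min, d_max = depth.min(), depth.max()
    span = d_max - d_min
    depth_norm = (depth - d_min) / span if span > 0 else np.zeros_like(depth)

    # Channel 0: where the prompted category is. Channel 1: scene geometry.
    return np.stack([semantic_mask, depth_norm], axis=0)


# Mock inputs: a 4x4 scene with the prompted region in the center.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
depth = np.linspace(0.5, 2.0, 16).reshape(4, 4)

obs = spatial_semantic_observation(mask, depth)
print(obs.shape)
```

Because raw RGB values never enter this observation, two scenes that differ only in a category-irrelevant factor such as marker color would map to the same input under these assumptions, as long as the mask and depth agree.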

S2-Diffusion


  • ∗ These authors contributed equally.
  • 1 KTH Royal Institute of Technology, Stockholm, Sweden
  • 2 INCAR Robotics AB, Stockholm, Sweden

Expert Demonstrations

Red whiteboard wiping demonstrations

Bowl-to-bowl rice scooping demonstrations

Close-Container demonstrations

Experiments

We evaluated policies trained only on red marker demonstrations on red, green, and black marker instances.

Red whiteboard wiping task

Green whiteboard wiping task

Black whiteboard wiping task

All experiments:

Whiteboard wiping

We evaluated policies trained only on rice scooping demonstrations on rice, choco, hearts, and mixed cereal instances.

Rice Bowl-to-bowl scooping task

Choco Bowl-to-bowl scooping task

Hearts Bowl-to-bowl scooping task

Mixed Bowl-to-bowl scooping task

All experiments:

We evaluated ablation policies trained only on rice close-container demonstrations on rice and other instances.

Showcase RGB-Diffusion limitation

Close-container choco S2-Diffusion

All experiments: