1-Minute Submission Video
Captioned anonymized cut prepared for the paper submission website.
Abstract
Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich, multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach for real-world dexterous multi-task manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method follows a retrieve-align-execute paradigm: it retrieves demonstrated skills based on the current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks centroid and yaw, and then executes them. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3-4 demonstrations per object, outperforming diffusion policy baselines by 2-3x while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and to spatial variations of up to ±25 cm.
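The retrieve and align steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the geometry descriptor (bounding-box extents), the nearest-neighbor retrieval rule, the skill dictionary layout, and the names `PlanarPose`, `retrieve_skill`, and `align_waypoints` are all hypothetical. The sketch also omits the uncertainty-aware estimator and treats the observed centroid and yaw as given.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarPose:
    """Object state as tracked in the abstract: centroid (x, y) and yaw."""
    x: float
    y: float
    yaw: float  # radians


def wrap_angle(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def retrieve_skill(skills, observed_extents):
    """Pick the stored skill whose object extents are closest (hypothetical rule).

    Each skill is a dict with 'extents' (width, depth, height in meters),
    'demo_pose' (PlanarPose at demonstration time), and 'waypoints'.
    """
    return min(skills, key=lambda s: math.dist(s["extents"], observed_extents))


def align_waypoints(waypoints, demo_pose, observed_pose):
    """Re-express demonstrated (x, y, yaw) waypoints relative to the observed pose.

    Applies the planar rigid transform that maps demo_pose onto observed_pose.
    """
    dyaw = observed_pose.yaw - demo_pose.yaw
    c, s = math.cos(dyaw), math.sin(dyaw)
    aligned = []
    for wx, wy, wyaw in waypoints:
        # Offset of the waypoint from the object centroid in the demonstration.
        rx, ry = wx - demo_pose.x, wy - demo_pose.y
        # Rotate the offset by the yaw difference, then re-anchor at the new centroid.
        ax = observed_pose.x + c * rx - s * ry
        ay = observed_pose.y + s * rx + c * ry
        aligned.append((ax, ay, wrap_angle(wyaw + dyaw)))
    return aligned


# Hypothetical skill library with two demonstrated grasps.
skills = [
    {"name": "grasp_bottle", "extents": (0.07, 0.07, 0.20),
     "demo_pose": PlanarPose(0.0, 0.0, 0.0),
     "waypoints": [(1.0, 0.0, 0.0)]},
    {"name": "grasp_box", "extents": (0.15, 0.10, 0.05),
     "demo_pose": PlanarPose(0.0, 0.0, 0.0),
     "waypoints": [(0.0, 1.0, 0.0)]},
]

skill = retrieve_skill(skills, observed_extents=(0.08, 0.07, 0.19))
observed = PlanarPose(1.0, 2.0, math.pi / 2)
aligned = align_waypoints(skill["waypoints"], skill["demo_pose"], observed)
```

In this toy setup the bottle skill is retrieved because its extents are nearest to the observed ones, and its single waypoint is rotated by 90 degrees about the object centroid before being shifted to the new location. A real system would execute the aligned waypoints on the hand and arm; that execution step is outside the scope of this sketch.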
Task Suite
The paper studies three multi-stage dexterous tasks where the robot must keep one object secure while completing a second interaction.
Grasp + Pull
Maintain a grasp, pull open the drawer, and place the held object inside.
Grasp + Open
Hold an object, open the target container, and place the held object inside.
Grasp + Grasp
Sequentially acquire two objects without releasing the first grasp.
Comparison with Demonstration-Free Methods
A natural question is whether demonstration-free approaches, such as reinforcement learning or grasp synthesis, could replace our demonstration-based pipeline. In practice, these methods rely on carefully engineered reward or energy functions and well-chosen initializations to elicit the desired behavior, which becomes increasingly difficult for multi-stage tasks.
Failure: Stable Grasp, Poor Task Compatibility
The method finds a stable grasp on the bottle, but the grasp is not compatible with the follow-up manipulation. The bottle remains secure, yet the second object is never picked up, illustrating that grasp stability alone is not enough for a multi-stage task.
Success: Task-Compatible Initialization
A different initialization yields a task-compatible grasp that keeps the bottle secure while allowing the second object to be acquired. This contrast highlights how strongly demonstration-free optimization depends on initialization and objective design.
Representative Full Rollouts
A compact 3x3 montage highlights complete executions across the task suite using the web-sized teaser assets.
Full rollout montage spanning Grasp + Pull, Grasp + Open, and Grasp + Grasp.
Robustness and Embodiment Transfer
The website includes dedicated clips for perturbation robustness and transfer across two different dexterous hands.
External Perturbations
DexMulti remains stable under disturbances while continuing the multi-stage task.
LEAP and Allegro Hands
Side-by-side embodiment transfer across mechanically different dexterous hands.
Acknowledgement
Website designed and implemented with OpenAI Codex.