DexMulti: Concurrent Prehensile and Nonprehensile Manipulation

1-Minute Submission Video

Captioned anonymized cut prepared for the paper submission website.

Abstract

Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach for real-world dexterous multi-task manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method retrieves demonstrated skills based on current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks centroid and yaw, and executes them via a retrieve-align-execute paradigm. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3-4 demonstrations per object, outperforming diffusion policy baselines by 2-3x while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and spatial variations up to ±25 cm.
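The retrieve and align steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Skill` record, the bounding-box descriptor, the nearest-neighbor retrieval rule, and the purely planar rigid transform are all assumptions made for illustration, and the uncertainty-aware estimator is not modeled (the observed centroid and yaw are simply taken as given).

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    """Hypothetical record for one demonstrated object-centric skill."""
    name: str
    descriptor: tuple  # assumed geometry descriptor, e.g. bounding-box extents (m)
    centroid: tuple    # (x, y) of the object during the demonstration (m)
    yaw: float         # object yaw during the demonstration (rad)
    waypoints: list    # end-effector positions [(x, y, z), ...] in the world frame


def retrieve(skills, descriptor):
    """Pick the skill whose descriptor is closest in Euclidean distance.

    Nearest-neighbor lookup is an assumption; the source only says that
    retrieval is based on current object geometry.
    """
    return min(skills, key=lambda s: math.dist(s.descriptor, descriptor))


def align(skill, centroid, yaw):
    """Map demonstrated waypoints onto the observed object pose.

    Each waypoint is expressed relative to the demonstrated centroid,
    rotated by the yaw difference, and re-anchored at the observed
    centroid. Height (z) is left unchanged.
    """
    delta = yaw - skill.yaw
    c, s = math.cos(delta), math.sin(delta)
    aligned = []
    for x, y, z in skill.waypoints:
        dx, dy = x - skill.centroid[0], y - skill.centroid[1]
        aligned.append((centroid[0] + c * dx - s * dy,
                        centroid[1] + s * dx + c * dy,
                        z))
    return aligned


# Example: two stored skills, one observed object rotated by 90 degrees.
library = [
    Skill("grasp_bottle", (0.07, 0.07, 0.20), (0.0, 0.0), 0.0,
          [(0.10, 0.0, 0.20), (0.05, 0.0, 0.10)]),
    Skill("grasp_box", (0.15, 0.10, 0.05), (0.0, 0.0), 0.0,
          [(0.0, 0.12, 0.08)]),
]
chosen = retrieve(library, (0.06, 0.06, 0.22))
trajectory = align(chosen, centroid=(0.50, 0.25), yaw=math.pi / 2)
```

Because the waypoints are stored relative to the object, a single demonstration can in principle be replayed at any placement of that object, which is consistent with the abstract's claim of spatial generalization from only a few demonstrations per object.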

Task Suite

The paper studies three multi-stage dexterous tasks where the robot must keep one object secure while completing a second interaction.

Grasp + Pull

Maintain a grasp, pull open the drawer, and place the held object inside.

Grasp + Open

Hold an object, open the target container, and complete the place-in-container step.

Grasp + Grasp

Sequentially acquire two objects without releasing the first grasp.

Comparison with Demonstration-Free Methods

A natural question is whether demonstration-free approaches, such as reinforcement learning or grasp synthesis, could replace our demonstration-based pipeline. In practice, these methods rely on carefully engineered reward or energy functions and well-chosen initializations to elicit the desired behavior, which becomes increasingly difficult for multi-stage tasks.

Failure: Stable Grasp, Poor Task Compatibility

The demonstration-free method finds a stable grasp on the bottle, but the grasp is incompatible with the follow-up manipulation: the bottle remains secure, yet the second object is never picked up. Grasp stability alone is not enough for a multi-stage task.

Success: Task-Compatible Initialization

A different initialization yields a task-compatible grasp that keeps the bottle secure while allowing the second object to be acquired. This contrast highlights how strongly demonstration-free optimization depends on initialization and objective design.

Representative Full Rollouts

A compact 3x3 montage shows complete executions across the task suite.

Full rollout montage spanning Grasp + Pull, Grasp + Open, and Grasp + Grasp.

Robustness and Embodiment Transfer

The website includes dedicated clips for perturbation robustness and transfer across two different dexterous hands.

External Perturbations

DexMulti remains stable under disturbances while continuing the multi-stage task.

LEAP and Allegro Hands

Side-by-side embodiment transfer across mechanically different dexterous hands.

Acknowledgement

Website designed and implemented with OpenAI Codex.