DexMulti: Concurrent Prehensile and Nonprehensile Manipulation

Abstract

Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach for real-world dexterous multi-task manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method retrieves demonstrated skills based on current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks centroid and yaw, and executes them via a retrieve-align-execute paradigm. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3-4 demonstrations per object, outperforming diffusion policy baselines by 2-3x while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and spatial variations up to ±25 cm.

1-Minute Submission Video

Captioned anonymized cut prepared for the paper submission website.

Method Overview

DexMulti decomposes multi-stage manipulation into a retrieve-align-execute pipeline. Given a new scene, the system retrieves the most similar demonstrated skill based on object geometry, aligns the skill trajectory to the current object pose using an uncertainty-aware estimator, and executes the aligned action sequence.
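
As a rough illustration of the retrieve and align steps described above, the following Python sketch picks the stored skill whose geometry descriptor is nearest to the observed one, then re-expresses its waypoints relative to the observed centroid and yaw. The descriptor (bounding-box extents), the planar (x, y, yaw) pose format, and the function names `retrieve_skill` and `align_waypoints` are assumptions made for illustration, not the authors' implementation. The uncertainty-aware estimation is not modeled here; the observed pose is taken as given.

```python
import math


def retrieve_skill(query_descriptor, library):
    """Return the name of the stored skill whose descriptor is nearest (Euclidean).

    `library` maps a skill name to a geometry descriptor. Both the mapping and
    the descriptor format are hypothetical.
    """
    return min(library, key=lambda name: math.dist(query_descriptor, library[name]))


def align_waypoints(waypoints, demo_pose, observed_pose):
    """Move demo waypoints from the demo object pose to the observed object pose.

    Poses are (x, y, yaw) in the table plane; waypoints are (x, y, z).
    Only centroid translation and yaw rotation are applied (assumed planar case).
    """
    demo_x, demo_y, demo_yaw = demo_pose
    obs_x, obs_y, obs_yaw = observed_pose
    c = math.cos(obs_yaw - demo_yaw)
    s = math.sin(obs_yaw - demo_yaw)
    aligned = []
    for x, y, z in waypoints:
        rel_x, rel_y = x - demo_x, y - demo_y  # waypoint relative to demo centroid
        aligned.append((obs_x + c * rel_x - s * rel_y,
                        obs_y + s * rel_x + c * rel_y,
                        z))
    return aligned
```

In this sketch the executed trajectory would simply be the output of `align_waypoints` applied to the skill selected by `retrieve_skill`. A real system would also need to handle finger joint targets and pose uncertainty.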

Task Suite

Three multi-stage dexterous tasks where the robot must keep one object secure while completing a second interaction.

Grasp + Pull

Maintain a grasp, pull open the drawer, and place the held object inside.

Grasp + Open

Hold an object, open the target container, and place the held object inside.

Grasp + Grasp

Sequentially acquire two objects without releasing the first grasp.

Quantitative Results

Success rates (%) on training objects (overall) and held-out test objects.

Training Objects (Overall)

Method	Grasp + Pull	Grasp + Open	Grasp + Grasp
DexMulti (Ours)	64.7 (22/34)	67.6 (23/34)	44.4 (12/27)
Object-Centric DP3	20.6 (7/34)	35.3 (12/34)	29.6 (8/27)

Test Objects (Generalization)

Method	Grasp + Pull	Grasp + Open	Grasp + Grasp
DexMulti (Ours)	77.4 (24/31)	71.0 (22/31)	20.0 (3/15)
Object-Centric DP3	25.8 (8/31)	38.7 (12/31)	46.7 (7/15)
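
The percentages in the tables above follow directly from the successes/trials counts. A minimal Python check, using a hypothetical helper `success_rate` and the counts copied from the DexMulti rows:

```python
def success_rate(successes, trials):
    """Success rate in percent, rounded to one decimal place."""
    return round(100.0 * successes / trials, 1)


# (successes, trials) for DexMulti, copied from the tables above.
dexmulti_train = {
    "Grasp + Pull": (22, 34),
    "Grasp + Open": (23, 34),
    "Grasp + Grasp": (12, 27),
}
dexmulti_test = {
    "Grasp + Pull": (24, 31),
    "Grasp + Open": (22, 31),
    "Grasp + Grasp": (3, 15),
}

train_rates = {task: success_rate(*counts) for task, counts in dexmulti_train.items()}
test_rates = {task: success_rate(*counts) for task, counts in dexmulti_test.items()}
```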

Representative Full Rollouts

A compact 3x3 montage highlights complete executions across the task suite.

Comparison with Demonstration-Free Methods

Demonstration-free approaches such as reinforcement learning or grasp synthesis rely on carefully engineered reward functions and initializations, which become increasingly difficult to design for multi-stage tasks.

Failure: Stable Grasp, Poor Task Compatibility

The demonstration-free method finds a stable grasp on the bottle, but that grasp is incompatible with the follow-up manipulation.

Success: Task-Compatible Initialization

A different initialization yields a task-compatible grasp. This contrast highlights how strongly demonstration-free optimization depends on initialization.

Robustness to Perturbations

DexMulti remains stable under external disturbances while continuing multi-stage tasks.

Embodiment Transfer

The same approach transfers across mechanically different dexterous hands without retraining.

LEAP Hand vs Allegro Hand

Side-by-side embodiment transfer on Grasp + Pull across two dexterous hands.