Technical Documentation

Deformable manipulation with X1D

Status: Experimental infrastructure

X1D includes the model, runtime, data, and evaluation interfaces required to begin developing deformable-object manipulation capabilities.

This does not mean that current X1D checkpoints can reliably fold garments. A deformable-manipulation checkpoint, suitable training data, and real-hardware validation are still required.

What is available

Dual-arm runtime contract

Xolver provides an experimental 14-dimensional ALOHA-style dual-arm contract defining:

Left and right arm joint actions
Two normalized gripper actions
Multi-step trajectory chunks
Robot proprioception
One workspace camera
Left and right wrist cameras
Execution timing
Inter-arm collision requirements
Joint, velocity, acceleration, and workspace limits
Mandatory human-supervised operation

The included limits are conservative placeholders. They must be replaced with hardware-specific values and validated before physical actuation.
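
As a rough illustration of how a chunk could be checked against such a contract before actuation, the sketch below validates a multi-step chunk of 14-dimensional actions. The index layout (six joints plus one gripper per arm, left arm first), the [0, 1] gripper range, and the names JointLimits and check_chunk are assumptions made for this example and are not taken from the Xolver contract. A single limit set is shared by all joints to keep the sketch short; real hardware would need per-joint values, and a check like this does not replace inter-arm collision enforcement.

```python
from dataclasses import dataclass
from typing import List, Sequence

ACTION_DIM = 14  # the contract is described as 14-dimensional

# ASSUMPTION: ALOHA-style ordering, 6 joints + 1 gripper per arm, left arm first.
LEFT_JOINTS = slice(0, 6)
LEFT_GRIPPER = 6
RIGHT_JOINTS = slice(7, 13)
RIGHT_GRIPPER = 13


@dataclass(frozen=True)
class JointLimits:
    """Hypothetical limits shared by all joints; real values are hardware-specific."""
    lower: float
    upper: float
    max_step: float  # largest allowed change between consecutive steps


def check_chunk(chunk: Sequence[Sequence[float]], limits: JointLimits) -> List[str]:
    """Return human-readable violations for a multi-step action chunk."""
    problems: List[str] = []
    prev = None
    for t, action in enumerate(chunk):
        if len(action) != ACTION_DIM:
            problems.append(f"step {t}: expected {ACTION_DIM} values, got {len(action)}")
            continue
        joints = list(action[LEFT_JOINTS]) + list(action[RIGHT_JOINTS])
        for j, q in enumerate(joints):
            if not limits.lower <= q <= limits.upper:
                problems.append(f"step {t}: joint {j} out of range")
        # ASSUMPTION: "normalized" gripper actions lie in [0, 1].
        for name, g in (("left", action[LEFT_GRIPPER]), ("right", action[RIGHT_GRIPPER])):
            if not 0.0 <= g <= 1.0:
                problems.append(f"step {t}: {name} gripper not in [0, 1]")
        if prev is not None:
            for j, (a, b) in enumerate(zip(prev, joints)):
                if abs(b - a) > limits.max_step:
                    problems.append(f"step {t}: joint {j} moves too fast")
        prev = joints
    return problems
```

An empty list means the chunk passed these particular checks only.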

Committed-action training

Asynchronous execution means a robot may already be performing part of an approved action chunk while X1D generates the next one.

Committed-action training lets each training batch mark the steps that are already committed for execution, using a mask stored under:

extra_modalities["committed_action_mask"]

Expected shape:

[batch, action_horizon]

Committed actions remain unchanged and are provided as conditioning context. The training loss is applied only to the uncommitted future portion of the action chunk.

Enable it with:

enable_committed_action_training: true

The feature is disabled by default.
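
The masking idea can be sketched in plain Python. This is an illustration only: it assumes that True in the mask marks a committed step, that committed steps form a prefix of each chunk, and that the loss is a mean squared error. None of these details are confirmed by the text above, and the helper names committed_mask and masked_mse are hypothetical.

```python
from typing import List


def committed_mask(batch_committed: List[int], action_horizon: int) -> List[List[bool]]:
    """Build a [batch, action_horizon] mask; True marks a committed step.

    ASSUMPTION: the committed steps are the first n steps of each chunk.
    """
    mask = []
    for n in batch_committed:
        if not 0 <= n <= action_horizon:
            raise ValueError("committed count must lie in [0, action_horizon]")
        mask.append([t < n for t in range(action_horizon)])
    return mask


def masked_mse(pred, target, mask) -> float:
    """Mean squared error over uncommitted steps only.

    pred and target are nested lists shaped [batch, action_horizon, action_dim].
    ASSUMPTION: MSE stands in for whatever loss the real trainer uses.
    """
    total, count = 0.0, 0
    for p_seq, t_seq, m_seq in zip(pred, target, mask):
        for p, t, committed in zip(p_seq, t_seq, m_seq):
            if committed:
                continue  # committed steps are context, not loss targets
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(p)
    return total / count if count else 0.0
```

With a horizon of 4 and two committed steps, only the last two steps of that chunk contribute to the loss.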

Corrective demonstrations

Xolver provides a structured record for human teaching interventions. A corrective demonstration includes:

Episode and task identifiers
Operator identity
Intervention interval
Observation references
Original policy actions
Human expert actions
Reason for intervention
Deployment evidence
Training approval state
Reviewer identity

Corrective demonstrations are separate from Safety Shield interventions.

A Safety Shield intervention indicates that an action was blocked or modified. It does not necessarily provide the correct expert action and must not automatically become a training label.

A correction cannot be approved for training without a reviewer.
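
One way to picture the record and its reviewer rule is the dataclass sketch below. The field names, the ApprovalState values, the [start, end) interval convention, and the approve helper are all assumptions chosen for illustration; they are not the Xolver schema.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Optional, Tuple


class ApprovalState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class CorrectiveDemonstration:
    episode_id: str
    task_id: str
    operator_id: str
    interval: Tuple[int, int]            # ASSUMPTION: [start, end) step indices
    observation_refs: List[str]
    policy_actions: List[List[float]]    # what the policy proposed
    expert_actions: List[List[float]]    # what the human did instead
    reason: str
    deployment_evidence: str
    approval: ApprovalState = ApprovalState.PENDING
    reviewer_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Enforce the rule that approval for training needs a reviewer.
        if self.approval is ApprovalState.APPROVED and not self.reviewer_id:
            raise ValueError("an approved correction requires a reviewer")


def approve(demo: CorrectiveDemonstration, reviewer_id: str) -> CorrectiveDemonstration:
    """Return an approved copy of the record; an empty reviewer is rejected."""
    if not reviewer_id:
        raise ValueError("reviewer identity is required")
    return replace(demo, approval=ApprovalState.APPROVED, reviewer_id=reviewer_id)
```

Because the check lives in the record itself, an approved record without a reviewer cannot be constructed, whichever code path creates it.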

Deformable-task evaluation

The evaluation interface records:

Full-task success
Human-intervention rate
Safety-intervention rate
Recovery attempts
Recovery success
Completion time
Whether the object left the validated workspace

These metrics evaluate task outcomes and the amount of supervision required, not merely whether the model generated motion.
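
A minimal sketch of how per-episode records might be aggregated into these metrics follows. The EpisodeResult fields and the summarize function are hypothetical, and the rate definitions (interventions per episode, recoveries per attempt) are assumptions, since the interface's exact definitions are not given here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EpisodeResult:
    success: bool
    human_interventions: int
    safety_interventions: int
    recovery_attempts: int
    recovery_successes: int
    completion_time_s: float
    left_workspace: bool  # object left the validated workspace


def summarize(episodes: List[EpisodeResult]) -> Dict[str, float]:
    """Aggregate episode records into summary metrics."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    attempts = sum(e.recovery_attempts for e in episodes)
    recovered = sum(e.recovery_successes for e in episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        # ASSUMPTION: intervention rates are counted per episode.
        "human_interventions_per_episode": sum(e.human_interventions for e in episodes) / n,
        "safety_interventions_per_episode": sum(e.safety_interventions for e in episodes) / n,
        "recovery_success_rate": recovered / attempts if attempts else 0.0,
        "mean_completion_time_s": sum(e.completion_time_s for e in episodes) / n,
        "workspace_exit_rate": sum(e.left_workspace for e in episodes) / n,
    }
```

Keeping human and safety interventions in separate fields mirrors the distinction between corrective demonstrations and Safety Shield interventions.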

Intended development path

The recommended first task is a narrow, measurable activity such as towel spreading and folding. Development then iterates through the following loop:

Simulation demonstrations
→ randomized evaluation
→ supervised robot rollout
→ human correction
→ review
→ retraining
→ comparative evaluation

Training should gradually cover:

Different starting configurations
Fabric sizes and materials
Corner and edge grasping
Cloth spreading and alignment
Missed grasps
Slippage
Incorrect intermediate folds
Recovery from partially completed tasks

Safety requirements

Experimental deformable manipulation requires:

Hardware-specific dual-arm calibration
Accurate collision geometry
Inter-arm collision enforcement
Validated workspace boundaries
Joint, velocity, and acceleration limits
Human takeover capability
Watchdog enforcement
Replay and intervention recording
Supervised commissioning

No experimental contract should be treated as production certification.
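
The requirements above could be tracked as an explicit gate that refuses actuation while any item is unconfirmed. The sketch below is an assumption-laden illustration: the check names and the functions missing_checks and may_actuate are hypothetical, and passing such a gate records operator confirmation only. It is not a certification.

```python
from typing import Dict, List

# Hypothetical identifiers, one per requirement in the list above.
REQUIRED_CHECKS = (
    "dual_arm_calibration",
    "collision_geometry",
    "inter_arm_collision_enforcement",
    "validated_workspace_boundaries",
    "joint_velocity_acceleration_limits",
    "human_takeover",
    "watchdog",
    "replay_and_intervention_recording",
    "supervised_commissioning",
)


def missing_checks(confirmed: Dict[str, bool]) -> List[str]:
    """Return requirements that are absent or not confirmed."""
    return [name for name in REQUIRED_CHECKS if not confirmed.get(name, False)]


def may_actuate(confirmed: Dict[str, bool]) -> bool:
    """Allow actuation only when every requirement is confirmed."""
    return not missing_checks(confirmed)
```

Treating a missing entry the same as a failed one keeps the default on the safe side.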

Current limitations

The current implementation does not include:

A trained deformable-manipulation checkpoint
General garment-folding capability
Production-certified dual-arm limits
A complete cloth simulator
Public performance benchmarks
Evidence of reliable real-world folding