Technical Documentation

Deformable manipulation with X1D

Status: Experimental infrastructure

X1D includes the model, runtime, data, and evaluation interfaces required to begin developing deformable-object manipulation capabilities.

This does not mean that current X1D checkpoints can reliably fold garments. A deformable-manipulation checkpoint, suitable training data, and real-hardware validation are still required.

What is available

Dual-arm runtime contract

Xolver provides an experimental 14-dimensional ALOHA-style dual-arm contract defining:

  • Left and right arm joint actions
  • Two normalized gripper actions
  • Multi-step trajectory chunks
  • Robot proprioception
  • One workspace camera
  • Left and right wrist cameras
  • Execution timing
  • Inter-arm collision requirements
  • Joint, velocity, acceleration, and workspace limits
  • Mandatory human-supervised operation

The included limits are conservative placeholders. They must be replaced with hardware-specific values and validated before physical actuation.
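
A software check against such a contract might look like the sketch below, which validates one trajectory chunk. This is an illustration only. The 6-joints-plus-gripper layout per arm, the [0, 1] gripper range, the names `PlaceholderLimits` and `validate_chunk`, and every numeric value are assumptions, not values taken from the Xolver contract. The per-step change bound stands in for a velocity limit at a fixed control rate. Inter-arm collision and workspace checks are not covered.

```python
from dataclasses import dataclass

# Assumed layout: [left 6 joints, left gripper, right 6 joints, right gripper].
ACTION_DIM = 14
GRIPPER_INDICES = (6, 13)  # assumed positions of the two normalized gripper actions


@dataclass(frozen=True)
class PlaceholderLimits:
    """Hypothetical conservative limits; replace with validated hardware values."""
    joint_min_rad: float = -1.0
    joint_max_rad: float = 1.0
    max_joint_step_rad: float = 0.05  # largest allowed change between consecutive steps


def validate_chunk(chunk, limits=PlaceholderLimits()):
    """Return a list of human-readable violations for one action chunk.

    `chunk` is a list of steps, each a list of ACTION_DIM floats.
    An empty list only means these particular checks found nothing.
    """
    violations = []
    previous = None
    for t, step in enumerate(chunk):
        if len(step) != ACTION_DIM:
            violations.append(f"step {t}: expected {ACTION_DIM} values, got {len(step)}")
            previous = None  # cannot compare the next step against a malformed one
            continue
        for i, value in enumerate(step):
            if i in GRIPPER_INDICES:
                if not 0.0 <= value <= 1.0:
                    violations.append(f"step {t}: gripper {i} outside [0, 1]")
                continue
            if not limits.joint_min_rad <= value <= limits.joint_max_rad:
                violations.append(f"step {t}: joint {i} outside position limits")
            if previous is not None and abs(value - previous[i]) > limits.max_joint_step_rad:
                violations.append(f"step {t}: joint {i} exceeds per-step change limit")
        previous = step
    return violations
```

A chunk that produces any violation would be rejected before it reaches the robot. An empty result is not evidence that the chunk is safe to execute.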

Committed-action training

Asynchronous execution means a robot may already be performing part of an approved action chunk while X1D generates the next one.

Committed-action training lets each training batch flag those already-committed steps through a mask stored at:

extra_modalities["committed_action_mask"]

Expected shape:

[batch, action_horizon]

Committed actions remain unchanged and are provided as conditioning context. The training loss is applied only to the uncommitted future portion of the action chunk.

Enable it with:

enable_committed_action_training: true

The feature is disabled by default.
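
The masking described above can be sketched in plain Python. The function below is a minimal illustration, not X1D's training code. The name `masked_action_loss`, the use of mean squared error as a stand-in loss, and the convention that a truthy mask entry marks a committed step are all assumptions.

```python
def masked_action_loss(predicted, target, committed_action_mask):
    """Mean squared error over uncommitted steps only (stand-in loss, assumed).

    predicted, target: nested lists shaped [batch][action_horizon][action_dim].
    committed_action_mask: nested list shaped [batch][action_horizon];
    a truthy entry marks a step the robot has already committed to.
    """
    total = 0.0
    count = 0
    for pred_seq, tgt_seq, mask_seq in zip(predicted, target, committed_action_mask):
        if not (len(pred_seq) == len(tgt_seq) == len(mask_seq)):
            raise ValueError("mask must match the action horizon")
        for pred_step, tgt_step, committed in zip(pred_seq, tgt_seq, mask_seq):
            if committed:
                continue  # committed steps are context, not supervised targets
            for p, y in zip(pred_step, tgt_step):
                total += (p - y) ** 2
                count += 1
    # If every step is committed there is nothing to supervise.
    return total / count if count else 0.0
```

With a horizon of three and a mask of [1, 0, 0], only the last two steps contribute to the loss. The first step still reaches the model as input.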

Corrective demonstrations

Xolver provides a structured record for human teaching interventions. A corrective demonstration includes:

  • Episode and task identifiers
  • Operator identity
  • Intervention interval
  • Observation references
  • Original policy actions
  • Human expert actions
  • Reason for intervention
  • Deployment evidence
  • Training approval state
  • Reviewer identity

Corrective demonstrations are separate from Safety Shield interventions.

A Safety Shield intervention indicates that an action was blocked or modified. It does not necessarily provide the correct expert action and must not automatically become a training label.

A correction cannot be approved for training without a reviewer.
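
The record and the reviewer rule can be illustrated with a small data structure. Everything in this sketch is hypothetical: the class and field names, the step-index representation of the intervention interval, and the three approval states are assumptions chosen for illustration, not the Xolver schema.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class ApprovalState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class CorrectiveDemonstration:
    episode_id: str
    task_id: str
    operator_id: str
    start_step: int            # intervention interval, assumed to be step indices
    end_step: int
    observation_refs: tuple
    policy_actions: tuple      # what the policy proposed
    expert_actions: tuple      # what the human did instead
    reason: str
    deployment_evidence: str
    approval_state: ApprovalState = ApprovalState.PENDING
    reviewer_id: Optional[str] = None

    def __post_init__(self):
        # Enforce the rule at construction time: no approval without a reviewer.
        approved = self.approval_state is ApprovalState.APPROVED
        if approved and not (self.reviewer_id or "").strip():
            raise ValueError("a reviewer is required to approve a correction for training")


def approve_for_training(demo, reviewer_id):
    """Return an approved copy; raises ValueError if no reviewer is given."""
    return replace(demo, approval_state=ApprovalState.APPROVED, reviewer_id=reviewer_id)
```

Because the check sits in the record itself, an approved correction without a reviewer cannot be constructed through any code path. The helper function is not the only guard.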

Deformable-task evaluation

The evaluation interface records:

  • Full-task success
  • Human-intervention rate
  • Safety-intervention rate
  • Recovery attempts
  • Recovery success
  • Completion time
  • Whether the object left the validated workspace

These metrics evaluate task outcomes and operational supervision, not merely whether the model generated motion.
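
Per-episode records of this kind could be aggregated as in the sketch below. The names `EpisodeResult` and `summarize` are hypothetical. Defining the intervention rates as interventions per episode is also an assumption; the evaluation interface may define them differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    success: bool                 # full-task success
    human_interventions: int
    safety_interventions: int
    recovery_attempts: int
    recovery_successes: int
    completion_time_s: float
    left_workspace: bool          # object left the validated workspace


def summarize(episodes):
    """Aggregate per-episode results into run-level metrics."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    attempts = sum(e.recovery_attempts for e in episodes)
    recoveries = sum(e.recovery_successes for e in episodes)
    return {
        "success_rate": sum(e.success for e in episodes) / n,
        "human_interventions_per_episode": sum(e.human_interventions for e in episodes) / n,
        "safety_interventions_per_episode": sum(e.safety_interventions for e in episodes) / n,
        # Undefined when no recovery was attempted, so report None rather than 0.
        "recovery_success_rate": recoveries / attempts if attempts else None,
        "mean_completion_time_s": sum(e.completion_time_s for e in episodes) / n,
        "workspace_exit_rate": sum(e.left_workspace for e in episodes) / n,
    }
```

Reporting success rate alongside the intervention rates matters for comparison: a checkpoint that succeeds more often only because operators intervened more often has not improved.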

Intended development path

The recommended first task is a narrow, measurable activity such as towel spreading and folding. Development is expected to move through the following stages:

Simulation demonstrations
→ randomized evaluation
→ supervised robot rollout
→ human correction
→ review
→ retraining
→ comparative evaluation

Training should gradually cover:

  • Different starting configurations
  • Fabric sizes and materials
  • Corner and edge grasping
  • Cloth spreading and alignment
  • Missed grasps
  • Slippage
  • Incorrect intermediate folds
  • Recovery from partially completed tasks

Safety requirements

Experimental deformable manipulation requires:

  • Hardware-specific dual-arm calibration
  • Accurate collision geometry
  • Inter-arm collision enforcement
  • Validated workspace boundaries
  • Joint, velocity, and acceleration limits
  • Human takeover capability
  • Watchdog enforcement
  • Replay and intervention recording
  • Supervised commissioning

No experimental contract should be treated as production certification.
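
One way to keep these requirements from being skipped is a pre-actuation gate that refuses to proceed until each item has been explicitly confirmed. The sketch below is hypothetical: the check names and the functions `missing_checks` and `require_commissioned` are assumptions, and confirming a flag in software does not replace the underlying hardware validation.

```python
# Hypothetical identifiers, one per requirement listed above.
REQUIRED_CHECKS = (
    "dual_arm_calibration",
    "collision_geometry",
    "inter_arm_collision_enforcement",
    "workspace_boundaries",
    "joint_velocity_acceleration_limits",
    "human_takeover",
    "watchdog",
    "replay_and_intervention_recording",
    "supervised_commissioning",
)


def missing_checks(confirmed):
    """Return the required checks that are not explicitly confirmed as True."""
    return [name for name in REQUIRED_CHECKS if confirmed.get(name) is not True]


def require_commissioned(confirmed):
    """Raise RuntimeError unless every required check is confirmed."""
    missing = missing_checks(confirmed)
    if missing:
        raise RuntimeError("actuation blocked; unconfirmed: " + ", ".join(missing))
```

Only the literal value True counts as confirmation, so a missing key, None, or a placeholder string all block actuation.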

Current limitations

The current implementation does not include:

  • A trained deformable-manipulation checkpoint
  • General garment-folding capability
  • Production-certified dual-arm limits
  • A complete cloth simulator
  • Public performance benchmarks
  • Evidence of reliable real-world folding

Related documentation