Safe Play Pretraining: learning reusable manipulation abilities under explicit safety contracts.
We are exploring how robots can learn fundamental manipulation abilities through simulated play before being fine-tuned for precise industrial tasks.
Initial embodiment: Franka Panda research platform
Downstream benchmark: contact-rich peg insertion
Research boundary: contract-bound actions and evidence-backed evaluation
Precise robotic assembly depends on contact, alignment, recovery, and small corrections that are expensive to demonstrate manually. Learning each task from scratch is inefficient and can leave the policy's training history difficult to inspect.
Safe Play Pretraining takes a different path. A robot first develops reusable abilities through diverse simulated interaction, then transfers that manipulation prior to a task such as peg insertion.
Xolver's contribution is the boundary around learning: policy proposals are checked against an explicit robot contract, while interventions, constraint margins, configurations, checkpoints, evaluations, and operator decisions remain connected through an evidence record.
Can a robot learn precise contact-rich tasks faster after developing general manipulation abilities through play?
Reusable experience
Reaching, grasping, transport, orientation, and recovery may provide a stronger starting point than random initialisation.
Safety feedback
A policy can observe interventions and learn better proposals, without becoming the system that decides what is safe.
Persistent evidence
Configuration, contracts, simulator assumptions, evaluations, and approvals should remain attached to every artifact.
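One way to keep that evidence attached is to give every artifact a record whose digest covers its configuration, contract identity, simulator assumptions, evaluations, approvals, and parent. This is a minimal sketch under assumptions: `EvidenceRecord`, its field names, and SHA-256 over canonical JSON are hypothetical choices, not Xolver's record format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


def content_hash(payload: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical provenance record; field names are illustrative only."""
    artifact: str                  # e.g. a checkpoint name
    config: dict                   # training configuration
    contract_id: str               # identity of the robot contract in force
    simulator_assumptions: dict    # e.g. timestep, friction ranges
    evaluations: tuple = ()        # evaluation summaries (dicts)
    approvals: tuple = ()          # operator decisions
    parent: Optional[str] = None   # digest of the record this one derives from (lineage)

    def digest(self) -> str:
        # A change to any attached field yields a different digest;
        # key order inside dicts does not matter because keys are sorted.
        return content_hash(asdict(self))


base = EvidenceRecord(
    artifact="play-prior",
    config={"seed": 0, "lr": 3e-4},
    contract_id="panda-contract-v1",
    simulator_assumptions={"dt": 0.01},
)
finetuned = EvidenceRecord(
    artifact="peg-insertion-policy",
    config={"seed": 0, "lr": 1e-4},
    contract_id="panda-contract-v1",
    simulator_assumptions={"dt": 0.01},
    parent=base.digest(),
)
```

Chaining `parent` digests gives each fine-tuned artifact a verifiable link back to the play prior it came from.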
Why play?
A reward defined only on the final assembly objective provides a narrow learning signal. Play exposes a policy to useful behaviours across varied objects, goals, and initial conditions, all inside a defined simulation and safety envelope.
Object reaching
Stable grasp acquisition
Transport and reorientation
Disturbance recovery
Precision pose reaching
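A play episode can be thought of as a seeded draw over behaviour, object, start pose, goal, and physical parameters, with every sampled value bounded by the simulation envelope. A minimal sketch, assuming a box-shaped workspace and uniform ranges; `PlayEpisode`, `sample_episode`, the object names, and all numeric ranges are hypothetical.

```python
import random
from dataclasses import dataclass

# The five play behaviours listed above, as hypothetical identifiers.
BEHAVIOURS = (
    "object_reaching",
    "stable_grasp_acquisition",
    "transport_and_reorientation",
    "disturbance_recovery",
    "precision_pose_reaching",
)


@dataclass(frozen=True)
class PlayEpisode:
    behaviour: str
    object_id: str
    start_xyz: tuple
    goal_xyz: tuple
    friction: float
    mass_kg: float


def _point_in(rng: random.Random, box: tuple) -> tuple:
    """Uniform point in an axis-aligned box ((xmin, xmax), (ymin, ymax), (zmin, zmax))."""
    return tuple(rng.uniform(lo, hi) for lo, hi in box)


def sample_episode(rng, objects, workspace,
                   friction_range=(0.5, 1.2), mass_range_kg=(0.05, 0.5)):
    """Draw one play episode; every value stays inside the given bounds."""
    return PlayEpisode(
        behaviour=rng.choice(BEHAVIOURS),
        object_id=rng.choice(objects),
        start_xyz=_point_in(rng, workspace),
        goal_xyz=_point_in(rng, workspace),
        friction=rng.uniform(*friction_range),
        mass_kg=rng.uniform(*mass_range_kg),
    )


WORKSPACE = ((0.3, 0.7), (-0.3, 0.3), (0.02, 0.4))  # metres, hypothetical
OBJECTS = ("cube", "cylinder", "peg")                # hypothetical object set

# The same seed reproduces the same episode, so variation stays controlled.
episode = sample_episode(random.Random(7), OBJECTS, WORKSPACE)
```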
How the system is structured
1. Sample
Generate an object, initial state, goal, and controlled physical variation.
2. Propose
A compact motor policy observes simulated state and proposes the next action.
3. Constrain
Check the proposal against the action schema and project it into the allowed region when needed.
4. Learn
Combine task progress with intervention events and proximity to constraint boundaries.
5. Tighten
Increase pose precision through a curriculum with explicit advancement criteria.
6. Export
Package the prior with contract identities, configuration, lineage, evaluations, and hashes.
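Steps 4 and 5 can be made concrete with a shaped reward and a staged tolerance schedule. This is a minimal sketch under stated assumptions: the penalty weights, the linear margin term, the tolerance stages, and the advancement thresholds are all hypothetical, since the actual reward and criteria are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; would need tuning per task.
    intervention_penalty: float = 1.0
    margin_scale: float = 0.5
    margin_threshold: float = 0.05  # margin below which shaping starts to penalise


def shaped_reward(progress_delta: float, intervened: bool, min_margin: float,
                  w: RewardWeights = RewardWeights()) -> float:
    """Task progress minus penalties for interventions and near-boundary actions."""
    reward = progress_delta
    if intervened:
        reward -= w.intervention_penalty
    # Linear penalty in [0, margin_scale] as the margin shrinks toward zero.
    deficit = max(0.0, w.margin_threshold - max(min_margin, 0.0))
    reward -= w.margin_scale * deficit / w.margin_threshold
    return reward


@dataclass
class PrecisionCurriculum:
    """Tightens pose tolerance only when explicit criteria are met."""
    tolerances_mm: tuple = (10.0, 5.0, 2.0, 1.0)  # hypothetical stages
    min_success_rate: float = 0.8
    max_intervention_rate: float = 0.05
    min_episodes: int = 100
    stage: int = 0

    @property
    def tolerance_mm(self) -> float:
        return self.tolerances_mm[self.stage]

    def maybe_advance(self, successes: int, interventions: int, episodes: int) -> bool:
        """Advance one stage if the evaluation window satisfies every criterion."""
        if episodes < self.min_episodes or self.stage == len(self.tolerances_mm) - 1:
            return False
        if (successes / episodes >= self.min_success_rate
                and interventions / episodes <= self.max_intervention_rate):
            self.stage += 1
            return True
        return False
```

Gating advancement on intervention rate as well as success rate keeps the curriculum from tightening precision while the policy still leans on the contract boundary.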
From play to precise assembly.
The first study compares two otherwise matched peg-insertion training paths. Simulator configuration, task distribution, optimisation budget, evaluation set, and safety boundary remain fixed.
Prior-initialised
Begins from the manipulation ability learned during simulated play.
From scratch
Uses the same policy architecture with freshly initialised weights.
The initial target is a median improvement of at least 2× in sample efficiency, without increased critical violations or meaningful regression in safety-intervention rate. This is an evaluation target—not a published result.
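The target can be written as an explicit acceptance check over matched runs. A minimal sketch with several assumptions not stated in the source: sample efficiency is measured as environment steps needed to reach a fixed success threshold, runs are paired by seed, critical violations are summed across runs, and "meaningful regression" is a hypothetical absolute tolerance on median intervention rate. `RunResult` and `evaluate_target` are illustrative names.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RunResult:
    steps_to_threshold: int    # env steps until the success threshold was reached
    critical_violations: int
    intervention_rate: float   # interventions per executed action


def evaluate_target(prior, scratch, min_ratio=2.0, intervention_tolerance=0.01):
    """Check prior-initialised runs against matched from-scratch runs.

    prior[i] and scratch[i] are assumed to share a seed.
    Returns (passed, median_ratio).
    """
    if len(prior) != len(scratch) or not prior:
        raise ValueError("need the same, non-zero number of matched runs")
    # Ratio > 1 means the prior-initialised run needed fewer steps.
    ratios = [s.steps_to_threshold / p.steps_to_threshold for p, s in zip(prior, scratch)]
    median_ratio = median(ratios)
    no_new_critical = (sum(p.critical_violations for p in prior)
                       <= sum(s.critical_violations for s in scratch))
    no_regression = (median(p.intervention_rate for p in prior)
                     <= median(s.intervention_rate for s in scratch) + intervention_tolerance)
    return median_ratio >= min_ratio and no_new_critical and no_regression, median_ratio
```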
Safety is not learned behaviour.
A policy may learn better actions. It does not become the authority that decides whether an action is safe. Learning can reduce interventions; it cannot bypass enforcement.
MODEL → proposes
CONTRACT BOUNDARY → constrains
RUNTIME → executes or refuses
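This separation of authority can be sketched so that the runtime never executes a raw proposal: every proposal passes through the contract check, which either refuses it or projects it into the allowed region and records the intervention. A minimal sketch, assuming the allowed region is an axis-aligned box of per-dimension limits, for which Euclidean projection reduces to per-dimension clipping; `ActionContract`, `Runtime`, and `Refused` are hypothetical names.

```python
import math
from dataclasses import dataclass


class Refused(Exception):
    """Raised when a proposal cannot be made admissible."""


@dataclass(frozen=True)
class ActionContract:
    lower: tuple  # per-dimension lower limits (hypothetical units)
    upper: tuple  # per-dimension upper limits

    def __post_init__(self):
        if len(self.lower) != len(self.upper) or any(
                lo > hi for lo, hi in zip(self.lower, self.upper)):
            raise ValueError("invalid contract bounds")

    def constrain(self, proposal):
        """Schema check, then projection onto the box. Returns (action, intervened)."""
        if len(proposal) != len(self.lower):
            raise Refused("action has the wrong dimension")
        if not all(math.isfinite(x) for x in proposal):
            raise Refused("action contains non-finite values")
        action = tuple(min(max(x, lo), hi)
                       for x, lo, hi in zip(proposal, self.lower, self.upper))
        return action, action != tuple(proposal)


class Runtime:
    """Executes only what the contract admits; the policy only proposes."""

    def __init__(self, contract: ActionContract):
        self.contract = contract
        self.interventions = 0  # fed back to learning as a signal

    def step(self, proposal):
        action, intervened = self.contract.constrain(proposal)  # may raise Refused
        if intervened:
            self.interventions += 1
        return action  # stand-in for sending the command to the robot
```

The policy object never holds a reference to `Runtime.step` internals, so a better-trained policy can lower `interventions` but has no path around `constrain`.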
Governed through Xolver Console.
The Play Training Control Plane manages configuration and operator workflow. Canonical state remains in typed, persisted records, and an edited form cannot queue training directly; a change must pass validation, review, and approval before it reaches the queue.
1. Configure
2. Validate
3. Propose
4. Review
5. Approve or reject
6. Queue
7. Evaluate
8. Promote or retain
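The workflow reads naturally as a state machine in which queueing is reachable only from an approved state and any edit sends the request back to configuration. A minimal sketch; the `Stage` names, the transition table, and `TrainingRequest` are hypothetical, chosen only to mirror the eight steps above.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    CONFIGURED = auto()
    VALIDATED = auto()
    PROPOSED = auto()
    IN_REVIEW = auto()
    APPROVED = auto()
    REJECTED = auto()
    QUEUED = auto()
    EVALUATED = auto()
    PROMOTED = auto()
    RETAINED = auto()


# Allowed transitions; QUEUED is reachable only from APPROVED.
TRANSITIONS = {
    Stage.CONFIGURED: {Stage.VALIDATED},
    Stage.VALIDATED: {Stage.PROPOSED},
    Stage.PROPOSED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.APPROVED, Stage.REJECTED},
    Stage.APPROVED: {Stage.QUEUED},
    Stage.REJECTED: set(),
    Stage.QUEUED: {Stage.EVALUATED},
    Stage.EVALUATED: {Stage.PROMOTED, Stage.RETAINED},
    Stage.PROMOTED: set(),
    Stage.RETAINED: set(),
}
EDITABLE = {Stage.CONFIGURED, Stage.VALIDATED, Stage.PROPOSED,
            Stage.IN_REVIEW, Stage.APPROVED, Stage.REJECTED}


@dataclass
class TrainingRequest:
    config: dict
    stage: Stage = Stage.CONFIGURED
    history: list = field(default_factory=list)  # (from, to, actor) audit trail

    def advance(self, to: Stage, actor: str) -> None:
        if to not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage.name} -> {to.name} is not allowed")
        self.history.append((self.stage, to, actor))
        self.stage = to

    def edit(self, actor: str, **changes) -> None:
        """An edit invalidates earlier validation and approval."""
        if self.stage not in EDITABLE:
            raise ValueError(f"cannot edit a request in {self.stage.name}")
        self.config.update(changes)
        self.history.append((self.stage, Stage.CONFIGURED, actor))
        self.stage = Stage.CONFIGURED
```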
Research status.
Implemented software is separated from pending experimental claims.
| Capability | Status |
|---|---|
| Learning architecture | Implemented |
| Safety-aware training boundary | Implemented |
| Checkpoint and artifact provenance | Implemented |
| Console governance | Implemented |
| Deterministic end-to-end validation | Implemented |
| Isaac Lab integration boundary | Implemented |
| Large-scale Isaac training | Pending |
| Sample-efficiency result | Pending |
| Physical robot validation | Pending |
| X1-D integration | Future research |
Safe Play Pretraining is a research capability. It does not certify a deployment, prove real-world transfer, or guarantee performance on arbitrary assembly tasks. Simulator results remain simulator results until separately validated on physical hardware.
This work is inspired in part by Play2Perfect by Lum, Kedia, Liu, and Bohg. Xolver's independently owned implementation focuses on runtime contracts, deterministic enforcement, evidence provenance, artifact compatibility, and operator approval.
Read the Play2Perfect paper.
Interested in controlled evaluation or collaboration?
We welcome robotics researchers, industrial automation teams, simulator developers, and hardware partners.
hello@xolver.ai