Xolver Research

Safe Play Pretraining: learning reusable manipulation abilities under explicit safety contracts.

We are exploring how robots can learn fundamental manipulation abilities through simulated play before being fine-tuned for precise industrial tasks.

Research implementation · Software pipeline complete · Simulator validation in progress


Initial embodiment

Franka Panda research platform

Downstream benchmark

Contact-rich peg insertion

Research boundary

Contract-bound actions and evidence-backed evaluation

Abstract

Precise robotic assembly depends on contact, alignment, recovery, and small corrections that are expensive to demonstrate manually. Learning each task from scratch is inefficient and can leave the policy's training history difficult to inspect.

Safe Play Pretraining takes a different path. A robot first develops reusable abilities through diverse simulated interaction, then transfers that manipulation prior to a task such as peg insertion.

Xolver's contribution is the boundary around learning: policy proposals are checked against an explicit robot contract, while interventions, constraint margins, configurations, checkpoints, evaluations, and operator decisions remain connected through an evidence record.

Research question

Can a robot learn precise contact-rich tasks faster after developing general manipulation abilities through play?

Reusable experience

Reaching, grasping, transport, orientation, and recovery may provide a stronger starting point than random initialisation.

Safety feedback

A policy can observe interventions and learn better proposals, without becoming the system that decides what is safe.

Persistent evidence

Configuration, contracts, simulator assumptions, evaluations, and approvals should remain attached to every artifact.
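One way to picture evidence staying attached to an artifact is a manifest that hashes each input and then hashes itself. The sketch below is illustrative only: the function names, field names, and the choice of SHA-256 over canonical JSON are assumptions, not a description of Xolver's record format.

```python
import hashlib
import json


def canonical_hash(record) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_manifest(config, contract, simulator, evaluations, approvals) -> dict:
    """Bind hypothetical evidence fields to one artifact manifest."""
    body = {
        "config_hash": canonical_hash(config),
        "contract_hash": canonical_hash(contract),
        "simulator_hash": canonical_hash(simulator),
        "evaluations": evaluations,
        "approvals": approvals,
    }
    # The manifest hash covers every field above, so any later edit is detectable.
    return {**body, "manifest_hash": canonical_hash(body)}


def verify_manifest(manifest: dict) -> bool:
    """Recompute the manifest hash and compare it with the stored value."""
    body = {k: v for k, v in manifest.items() if k != "manifest_hash"}
    return manifest.get("manifest_hash") == canonical_hash(body)
```

Under these assumptions, changing a configuration value, an evaluation entry, or an approval after export would cause `verify_manifest` to return `False`.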

Why play?

A final assembly objective provides a narrow signal. Play exposes a policy to useful behaviours across varied objects, goals, and initial conditions—inside a defined simulation and safety envelope.

Object reaching

Stable grasp acquisition

Transport and reorientation

Disturbance recovery

Precision pose reaching

How the system is structured

1. Sample

Generate an object, initial state, goal, and controlled physical variation.

2. Propose

A compact motor policy observes simulated state and proposes the next action.

3. Constrain

Check the proposal against the action schema and project it into the allowed region when needed.

4. Learn

Combine task progress with intervention events and proximity to constraint boundaries.

5. Tighten

Increase pose precision through a curriculum with explicit advancement criteria.

6. Export

Package the prior with contract identities, configuration, lineage, evaluations, and hashes.
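The Constrain and Learn steps above can be sketched in a few lines. This is a minimal illustration, assuming the allowed region is a simple per-dimension box and that the learning signal is a weighted sum; `BoxContract`, `project`, `shaped_reward`, and every weight and threshold are hypothetical names and values, not the contract or reward used in the study.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BoxContract:
    """Hypothetical contract: per-dimension lower and upper action limits."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]


def project(proposal: Sequence[float], contract: BoxContract):
    """Clamp a proposal into the box.

    Returns (projected_action, intervened, margin), where margin is the signed
    distance from the proposal to the nearest bound (negative when outside).
    """
    if not (len(proposal) == len(contract.lower) == len(contract.upper)):
        raise ValueError("proposal does not match the action schema size")
    bounds = list(zip(proposal, contract.lower, contract.upper))
    projected = tuple(min(max(a, lo), hi) for a, lo, hi in bounds)
    margin = min(min(a - lo, hi - a) for a, lo, hi in bounds)
    return projected, margin < 0, margin


def shaped_reward(task_progress: float, intervened: bool, margin: float,
                  intervention_penalty: float = 1.0,
                  margin_weight: float = 0.1,
                  margin_threshold: float = 0.05) -> float:
    """Task progress minus an intervention penalty and a near-boundary penalty."""
    reward = task_progress
    if intervened:
        reward -= intervention_penalty
    # Proximity is 0 when the proposal is at least margin_threshold inside the
    # box and rises to 1 at (or beyond) the boundary.
    proximity = max(0.0, margin_threshold - max(margin, 0.0)) / margin_threshold
    return reward - margin_weight * proximity
```

The policy only sees the consequences of projection through the reward. The projection itself runs regardless of what the policy has learned.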

Prior vs. scratch

From play to precise assembly.

The first study compares two otherwise matched peg-insertion training paths. Simulator configuration, task distribution, optimisation budget, evaluation set, and safety boundary remain fixed.

Prior-initialised

Begins from the manipulation ability learned during simulated play.

From scratch

Uses the same policy architecture with fresh initialisation.

The initial target is a median improvement of at least 2× in sample efficiency, without increased critical violations or meaningful regression in safety-intervention rate. This is an evaluation target—not a published result.

Success rate

Steps to target success

Final insertion accuracy

Completion time

Recovery after misalignment

Safety-intervention rate

Minimum constraint margin

Invalid-action and physics violations

Training throughput and wall-clock cost
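The sample-efficiency target can be made concrete with a small calculation. The sketch below assumes runs are paired by seed and that the speedup is the ratio of steps to target success (scratch divided by prior-initialised); that pairing, the function names, and the handling of critical violations are assumptions for illustration. How to treat runs that never reach the target, and what counts as a meaningful regression in intervention rate, are left open here.

```python
from statistics import median
from typing import Optional, Sequence, Tuple


def steps_to_target(curve: Sequence[Tuple[int, float]], target: float) -> Optional[int]:
    """First environment-step count at which the success rate reaches the target.

    `curve` holds (env_steps, success_rate) pairs sorted by env_steps.
    Returns None if the target is never reached.
    """
    for steps, rate in curve:
        if rate >= target:
            return steps
    return None


def median_speedup(scratch_steps: Sequence[int], prior_steps: Sequence[int]) -> float:
    """Median over seeds of scratch steps divided by prior-initialised steps."""
    if not scratch_steps or len(scratch_steps) != len(prior_steps):
        raise ValueError("need one scratch and one prior run per seed")
    return median(s / p for s, p in zip(scratch_steps, prior_steps))


def meets_target(scratch_steps: Sequence[int], prior_steps: Sequence[int],
                 scratch_critical: int, prior_critical: int,
                 min_speedup: float = 2.0) -> bool:
    """At least min_speedup median improvement with no extra critical violations."""
    speedup_ok = median_speedup(scratch_steps, prior_steps) >= min_speedup
    return speedup_ok and prior_critical <= scratch_critical
```

A ratio above 1 means the prior-initialised run reached the target success rate in fewer steps than its matched from-scratch run.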

Safety is not learned behaviour.

A policy may learn better actions. It does not become the authority that decides whether an action is safe. Learning can reduce interventions; it cannot bypass enforcement.

MODEL → proposes

CONTRACT BOUNDARY → constrains

RUNTIME → executes or refuses
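The three roles above can be illustrated with a short sketch in which the policy only returns a proposal and a separate check decides what, if anything, is executed. Everything here is hypothetical: the `Verdict` values, the box limits, and the rule that malformed proposals are refused rather than repaired are assumptions chosen for the example, not Xolver's runtime behaviour.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Verdict(Enum):
    ALLOWED = "allowed"      # proposal already inside the limits
    PROJECTED = "projected"  # proposal moved into the limits
    REFUSED = "refused"      # proposal malformed; nothing is executed


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    action: Optional[Tuple[float, ...]]


def contract_check(proposal: Sequence[float], lower: Sequence[float],
                   upper: Sequence[float]) -> Decision:
    """Refuse malformed proposals; clamp well-formed ones into [lower, upper]."""
    if len(proposal) != len(lower) or not all(math.isfinite(a) for a in proposal):
        return Decision(Verdict.REFUSED, None)
    clamped = tuple(min(max(a, lo), hi) for a, lo, hi in zip(proposal, lower, upper))
    verdict = Verdict.ALLOWED if clamped == tuple(proposal) else Verdict.PROJECTED
    return Decision(verdict, clamped)


def runtime_step(policy: Callable, observation, lower: Sequence[float],
                 upper: Sequence[float], execute: Callable) -> Decision:
    """Model proposes, contract constrains, runtime executes or refuses."""
    decision = contract_check(policy(observation), lower, upper)
    if decision.verdict is not Verdict.REFUSED:
        execute(decision.action)  # only contract-checked actions reach the robot
    return decision
```

In this arrangement the policy has no code path to `execute`. A better policy shows up as more `ALLOWED` verdicts, not as fewer checks.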

Governed through Xolver Console.

The Play Training Control Plane manages configuration and operator workflow. Canonical state remains in typed, persisted records, and editing a form cannot queue training directly: a run is queued only after it has passed through the review and approval steps below.

1. Configure
2. Validate
3. Propose
4. Review
5. Approve or reject
6. Queue
7. Evaluate
8. Promote or retain
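The workflow above reads naturally as a small state machine in which each stage can only move to the stages listed after it. The sketch below is one possible encoding; the `Stage` names and the transition table are assumptions derived from the eight steps, not the Console's actual data model.

```python
from enum import Enum


class Stage(Enum):
    CONFIGURED = "configured"
    VALIDATED = "validated"
    PROPOSED = "proposed"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    QUEUED = "queued"
    EVALUATED = "evaluated"
    PROMOTED = "promoted"
    RETAINED = "retained"


# Each stage maps to the only stages it may move to next.
ALLOWED_TRANSITIONS = {
    Stage.CONFIGURED: {Stage.VALIDATED},
    Stage.VALIDATED: {Stage.PROPOSED},
    Stage.PROPOSED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.APPROVED, Stage.REJECTED},
    Stage.APPROVED: {Stage.QUEUED},
    Stage.REJECTED: set(),
    Stage.QUEUED: {Stage.EVALUATED},
    Stage.EVALUATED: {Stage.PROMOTED, Stage.RETAINED},
    Stage.PROMOTED: set(),
    Stage.RETAINED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the transition skips a required step."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

With this table, `advance(Stage.CONFIGURED, Stage.QUEUED)` raises `ValueError`, which mirrors the rule that an edited configuration cannot queue training on its own.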

Research status.

Implemented software is separated from pending experimental claims.

Capability	Status
Learning architecture	Implemented
Safety-aware training boundary	Implemented
Checkpoint and artifact provenance	Implemented
Console governance	Implemented
Deterministic end-to-end validation	Implemented
Isaac Lab integration boundary	Implemented
Large-scale Isaac training	Pending
Sample-efficiency result	Pending
Physical robot validation	Pending
X1-D integration	Future research

Scope and inspiration

Safe Play Pretraining is a research capability. It does not certify a deployment, prove real-world transfer, or guarantee performance on arbitrary assembly tasks. Simulator results remain simulator results until separately validated on physical hardware.

This work is inspired in part by Play2Perfect by Lum, Kedia, Liu, and Bohg. Xolver's independently owned implementation focuses on runtime contracts, deterministic enforcement, evidence provenance, artifact compatibility, and operator approval.

Read the Play2Perfect paper

Follow the research

Interested in controlled evaluation or collaboration?

We welcome robotics researchers, industrial automation teams, simulator developers, and hardware partners.

hello@xolver.ai