Xolver Research

Safe Play Pretraining: learning reusable manipulation abilities under explicit safety contracts.

We are exploring how robots can learn fundamental manipulation abilities through simulated play before being fine-tuned for precise industrial tasks.

Research implementation · Software pipeline complete · Simulator validation in progress

Initial embodiment

Franka Panda research platform

Downstream benchmark

Contact-rich peg insertion

Research boundary

Contract-bound actions and evidence-backed evaluation

Abstract

Precise robotic assembly depends on contact, alignment, recovery, and small corrections that are expensive to demonstrate manually. Learning each task from scratch is inefficient and can leave the policy's training history difficult to inspect.

Safe Play Pretraining takes a different path. A robot first develops reusable abilities through diverse simulated interaction, then transfers that manipulation prior to a task such as peg insertion.

Xolver's contribution is the boundary around learning: policy proposals are checked against an explicit robot contract, while interventions, constraint margins, configurations, checkpoints, evaluations, and operator decisions remain connected through an evidence record.

Research question

Can a robot learn precise contact-rich tasks faster after developing general manipulation abilities through play?

Reusable experience

Reaching, grasping, transport, orientation, and recovery may provide a stronger starting point than random initialisation.

Safety feedback

A policy can observe interventions and learn better proposals, without becoming the system that decides what is safe.

Persistent evidence

Configuration, contracts, simulator assumptions, evaluations, and approvals should remain attached to every artifact.
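One way to keep this evidence attached to an artifact is to store it in a single immutable record and hash the record itself. The sketch below is illustrative only: `EvidenceRecord`, its field names, and the canonical JSON encoding are assumptions made for this example, not Xolver's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


def sha256_of(obj) -> str:
    """Hash a JSON-serialisable object using a canonical, sorted-key encoding."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical record tying one checkpoint to the context that produced it."""
    artifact_sha256: str           # hash of the checkpoint bytes
    config_sha256: str             # hash of the training configuration
    contract_id: str               # identity of the robot contract in force
    simulator_assumptions: dict    # e.g. time step, friction ranges
    evaluation_ids: Tuple[str, ...]
    approved_by: Optional[str]     # None until an operator approves

    def record_hash(self) -> str:
        """Hash of the whole record; changes if any attached field changes."""
        return sha256_of(asdict(self))
```

Because the record hash covers every field, swapping the contract or editing a simulator assumption after the fact produces a different hash, which makes silent drift between an artifact and its context detectable.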

Why play?

A final assembly objective provides a narrow signal. Play exposes a policy to useful behaviours across varied objects, goals, and initial conditions—inside a defined simulation and safety envelope.

Object reaching

Stable grasp acquisition

Transport and reorientation

Disturbance recovery

Precision pose reaching

How the system is structured

1. Sample

Generate an object, initial state, goal, and controlled physical variation.

2. Propose

A compact motor policy observes simulated state and proposes the next action.

3. Constrain

Check the proposal against the action schema and project it into the allowed region when needed.

4. Learn

Combine task progress with intervention events and proximity to constraint boundaries.

5. Tighten

Increase pose precision through a curriculum with explicit advancement criteria.

6. Export

Package the prior with contract identities, configuration, lineage, evaluations, and hashes.
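The Constrain and Learn steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the contract is modelled as simple per-dimension action limits (`BoxContract`), projection is plain clipping, and the penalty weights in `shaped_reward` are hypothetical values. The actual contract format and reward terms are not specified here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class BoxContract:
    """Hypothetical contract: per-dimension lower and upper action limits."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]


def constrain(action: Sequence[float], contract: BoxContract) -> Tuple[List[float], bool, float]:
    """Check a proposal against the schema, then project it onto the box."""
    if len(action) != len(contract.lower) or not all(math.isfinite(a) for a in action):
        raise ValueError("proposal does not match the action schema")
    bounds = list(zip(contract.lower, contract.upper))
    # Clipping is the exact Euclidean projection onto an axis-aligned box.
    projected = [min(max(a, lo), hi) for a, (lo, hi) in zip(action, bounds)]
    intervened = any(p != a for p, a in zip(projected, action))
    # Margin: smallest distance from the executed action to any limit.
    margin = min(min(p - lo, hi - p) for p, (lo, hi) in zip(projected, bounds))
    return projected, intervened, margin


def shaped_reward(task_progress: float, intervened: bool, margin: float,
                  intervention_penalty: float = 1.0,
                  margin_weight: float = 0.1,
                  margin_threshold: float = 0.05) -> float:
    """Task progress minus penalties for interventions and near-boundary actions."""
    proximity = max(0.0, margin_threshold - margin)
    return task_progress - intervention_penalty * float(intervened) - margin_weight * proximity
```

In this sketch the policy only ever sees the consequences of `constrain` through the reward; it has no handle on the contract itself, so learning can lower the intervention count without changing what is allowed.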

Prior vs. scratch

From play to precise assembly.

The first study compares two otherwise matched peg-insertion training paths. Simulator configuration, task distribution, optimisation budget, evaluation set, and safety boundary remain fixed.

Prior-initialised

Begins from the manipulation ability learned during simulated play.

From scratch

Uses the same policy architecture with fresh initialisation.

The initial target is a median improvement of at least 2× in sample efficiency, without increased critical violations or meaningful regression in safety-intervention rate. This is an evaluation target—not a published result.

Success rate
Steps to target success
Final insertion accuracy
Completion time
Recovery after misalignment
Safety-intervention rate
Minimum constraint margin
Invalid-action and physics violations
Training throughput and wall-clock cost
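The sample-efficiency target stated above could be checked roughly as follows. This sketch assumes one particular reading of "median improvement": the ratio of median steps-to-target across matched seeds. The function names, the tolerance parameter, and the handling of runs that never reach the target (not covered here) are all assumptions for illustration.

```python
from statistics import median
from typing import Sequence


def sample_efficiency_ratio(scratch_steps: Sequence[int], prior_steps: Sequence[int]) -> float:
    """Median steps-to-target from scratch divided by the prior-initialised median."""
    if not scratch_steps or not prior_steps:
        raise ValueError("need at least one run per training path")
    return median(scratch_steps) / median(prior_steps)


def meets_target(ratio: float,
                 prior_critical: int, scratch_critical: int,
                 prior_intervention_rate: float, scratch_intervention_rate: float,
                 min_ratio: float = 2.0, rate_tolerance: float = 0.0) -> bool:
    """True only if efficiency improves without a safety regression."""
    no_new_violations = prior_critical <= scratch_critical
    no_rate_regression = prior_intervention_rate <= scratch_intervention_rate + rate_tolerance
    return ratio >= min_ratio and no_new_violations and no_rate_regression
```

For example, with steps-to-target of `[1000, 1200, 900]` from scratch and `[400, 500, 450]` with the prior, the medians are 1000 and 450, giving a ratio of about 2.2. That clears the 2× threshold, but the target is still missed if the prior-initialised path shows more critical violations.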

Safety is not learned behaviour.

A policy may learn better actions. It does not become the authority that decides whether an action is safe. Learning can reduce interventions; it cannot bypass enforcement.

MODEL → proposes

CONTRACT BOUNDARY → constrains

RUNTIME → executes or refuses
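The separation of roles above can be illustrated with a small sketch. Everything here is hypothetical: `contract_boundary` uses a symmetric scalar limit as a stand-in for a real contract, and `runtime_step` records executed actions in a list rather than driving a robot or simulator.

```python
import math
from typing import Callable, List, Sequence, Tuple


def contract_boundary(proposal: Sequence[float], limit: float = 1.0) -> List[float]:
    """Constrain a proposal, or raise if it cannot be made valid."""
    if not all(math.isfinite(x) for x in proposal):
        raise ValueError("non-finite proposal")
    return [max(-limit, min(limit, x)) for x in proposal]


def runtime_step(policy: Callable[[object], Sequence[float]],
                 observation: object,
                 executed_log: List[List[float]]) -> Tuple[str, object]:
    """Execute the constrained action, or refuse and execute nothing."""
    proposal = policy(observation)            # MODEL proposes
    try:
        action = contract_boundary(proposal)  # CONTRACT BOUNDARY constrains
    except ValueError as err:
        return ("refused", str(err))          # RUNTIME refuses
    executed_log.append(action)               # RUNTIME executes
    return ("executed", action)
```

The policy is just a callable passed into the runtime. It never receives a reference to `contract_boundary`, so a better policy can reduce how often clipping or refusal happens but cannot skip the check.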

Governed through Xolver Console.

The Play Training Control Plane manages configuration and operator workflow. Canonical state remains in typed, persisted records, and an edited form cannot directly queue training.

  1. Configure
  2. Validate
  3. Propose
  4. Review
  5. Approve or reject
  6. Queue
  7. Evaluate
  8. Promote or retain
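The workflow above behaves like a small state machine in which queueing is reachable only through approval. The sketch below is an assumption about how such gating might look; the `Stage` names and the `advance` function are invented for illustration and do not describe the Console's real data model.

```python
from enum import Enum


class Stage(Enum):
    CONFIGURED = "configured"
    VALIDATED = "validated"
    PROPOSED = "proposed"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    QUEUED = "queued"
    EVALUATED = "evaluated"
    PROMOTED = "promoted"
    RETAINED = "retained"


# Each stage lists the only stages it may move to next.
ALLOWED = {
    Stage.CONFIGURED: {Stage.VALIDATED},
    Stage.VALIDATED: {Stage.PROPOSED},
    Stage.PROPOSED: {Stage.IN_REVIEW},
    Stage.IN_REVIEW: {Stage.APPROVED, Stage.REJECTED},
    Stage.APPROVED: {Stage.QUEUED},
    Stage.QUEUED: {Stage.EVALUATED},
    Stage.EVALUATED: {Stage.PROMOTED, Stage.RETAINED},
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the transition skips a required step."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

With this table, `advance(Stage.CONFIGURED, Stage.QUEUED)` raises, which mirrors the rule that an edited form cannot directly queue training. Rejected, promoted, and retained are terminal in this sketch.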

Research status.

Implemented software is separated from pending experimental claims.

Capability | Status
Learning architecture | Implemented
Safety-aware training boundary | Implemented
Checkpoint and artifact provenance | Implemented
Console governance | Implemented
Deterministic end-to-end validation | Implemented
Isaac Lab integration boundary | Implemented
Large-scale Isaac training | Pending
Sample-efficiency result | Pending
Physical robot validation | Pending
X1-D integration | Future research

Scope and inspiration

Safe Play Pretraining is a research capability. It does not certify a deployment, prove real-world transfer, or guarantee performance on arbitrary assembly tasks. Simulator results remain simulator results until separately validated on physical hardware.

This work is inspired in part by Play2Perfect by Lum, Kedia, Liu, and Bohg. Xolver's independently owned implementation focuses on runtime contracts, deterministic enforcement, evidence provenance, artifact compatibility, and operator approval.

Interested in controlled evaluation or collaboration?

We welcome robotics researchers, industrial automation teams, simulator developers, and hardware partners.

hello@xolver.ai