<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" 
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
<channel>
  <title>Xolver Blog</title>
  <atom:link href="https://xolver.ai/feed.xml" rel="self" type="application/rss+xml" />
  <link>https://xolver.ai</link>
  <description>Physical intelligence platform for safe robotic operations.</description>
  <language>en-us</language>
  <lastBuildDate>Tue, 31 Mar 2026 12:12:01 GMT</lastBuildDate>
  
    <item>
      <title>The calculus of smooth motion. Solving for robotic snap</title>
      <link>https://xolver.ai/blog/calculus-of-smooth-motion</link>
      <guid isPermaLink="true">https://xolver.ai/blog/calculus-of-smooth-motion</guid>
      <pubDate>Tue, 31 Mar 2026 00:00:00 GMT</pubDate>
      <description>To avoid mechanical stress and rapid hardware degradation, an AI cannot simply connect the dots. We explore the higher-order derivatives of physical motion and why deterministic interpolation must sit between intent and actuation.</description>
      <content:encoded><![CDATA[Anyone who has ever deployed a heavy industrial manipulator knows the sound of 'robotic snap'. It is the violent, shuddering crack that occurs when a high-torque actuator is given an abrupt change in trajectory. In the digital realm, a command can change instantly. In the physical realm, it collides with inertia.<p></p>In a simulation, a foundation model can easily output a piecewise trajectory. The arm is at Point A, and in the next timestep, it is told to be at Point B. Many early Vision-Language-Action (VLA) implementations treated robot pathing like digital drawing, connecting dots with straight lines. When these discrete, angular paths are fed directly into the high-gain feedback loops of physical motor controllers, the result is crippling mechanical stress, jitter, and rapid hardware degradation.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>The Higher-Order Derivatives of Motion</h2><p></p>Moving a physical mass smoothly requires respecting the calculus of motion. It is not enough to guarantee continuous position ($C^0$ continuity) or even continuous velocity ($C^1$ continuity). To avoid snapping a gearbox or dropping a payload, the control system must ensure continuous acceleration ($C^2$ continuity) and manage the rate of change of acceleration, known as jerk.<p></p>This turns trajectory generation from a simple geometric problem into a non-linear optimal control problem involving higher-order derivatives of position with respect to time. Mathematically, true fluidity requires minimizing an objective function that penalizes abruptness, the most common being the integral of squared jerk:<p></p>$$ J = \frac{1}{2} \int_0^T \left\| \frac{d^3 \mathbf{x}(t)}{dt^3} \right\|^2 dt $$<p></p>subject to the boundary conditions of initial and final position, velocity, and acceleration. 
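<p></p>The objective above can be probed numerically before any theory is applied. A minimal pure-Python sketch (the grid size and the perturbation are our own illustrative choices) evaluates the discrete squared-jerk cost for two rest-to-rest trajectories with identical boundary conditions:<p></p>

```python
# Discrete check of the squared-jerk objective J = 0.5 * integral ||x'''(t)||^2 dt
# for 1-D rest-to-rest moves from x(0)=0 to x(1)=1.
# Grid size and profiles are illustrative choices, not fixed by the text.

def squared_jerk(x, dt):
    # Third forward differences approximate x'''; the sum approximates the integral.
    jerk = [(x[i + 3] - 3 * x[i + 2] + 3 * x[i + 1] - x[i]) / dt ** 3
            for i in range(len(x) - 3)]
    return 0.5 * sum(j * j for j in jerk) * dt

N = 1000
dt = 1.0 / N
ts = [i * dt for i in range(N + 1)]

# A smooth candidate, and a perturbation of it whose position, velocity,
# and acceleration still vanish at both endpoints (same boundary conditions).
smooth = [10 * t**3 - 15 * t**4 + 6 * t**5 for t in ts]
bumped = [x + 5.0 * t**3 * (1 - t)**3 for x, t in zip(smooth, ts)]

J_smooth = squared_jerk(smooth, dt)   # analytically 360 for this profile
J_bumped = squared_jerk(bumped, dt)   # strictly larger
```

<p></p>Any admissible perturbation of the smooth candidate raises the cost, which is exactly the optimality condition the calculus of variations formalizes.<p></p>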
Applying the calculus of variations and Pontryagin's Minimum Principle to this functional reveals that the unconstrained optimal trajectory in 1D space is a fifth-order (quintic) polynomial:<p></p>$$ x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + a_4 t^4 + a_5 t^5 $$<p></p>The coefficients $\{a_0, \dots, a_5\}$ strictly depend on the boundary states. However, in physical robotics, the problem is highly constrained. We must enforce hard limits on joint velocities $\dot{q}_{max}$, accelerations $\ddot{q}_{max}$, and torques $\tau_{max}$. Thus, the true constrained optimization problem becomes:<p></p>$$ \min_{\mathbf{x}(t)} \int_0^T \left\| \dddot{\mathbf{x}}(t) \right\|^2 dt \quad \text{subject to} \quad \mathbf{x}(t) \in \mathcal{C}_{free}, \; \left\| \dot{\mathbf{q}}(t) \right\| \le \dot{\mathbf{q}}_{max}, \; \left\| \boldsymbol{\tau}(t) \right\| \le \boldsymbol{\tau}_{max} $$<p></p>This requires solving inverse kinematics (mapping task-space $\mathbf{x}$ to joint-space $\mathbf{q}$ via the Jacobian $\mathbf{J}(\mathbf{q})$) and recursive Newton-Euler dynamics at high frequency.<p></p>When a neural network outputs discrete action chunks, it is fundamentally ignorant of these continuous-time constraints. It proposes where the arm should go, not the physics of how it gets there.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>The Algorithmic Shock Absorber</h2><p></p>This is exactly why a foundation model should never directly command a motor. At Xolver, we structure our control spine to respect this boundary.<p></p>In our architecture, the foundation model acts as an intent engine. It operates at a relatively low frequency (e.g., 10 Hz), outputting a sequence of semantic waypoints or latent action chunks based on its interpretation of the scene.<p></p>These discrete outputs are then intercepted by the Deterministic Enforcement Layer. This layer acts as an algorithmic shock absorber. 
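<p></p>Fitting that quintic to a given set of boundary states is nothing more than a small linear solve. A self-contained pure-Python sketch (function names are ours; this is a toy, not a production planner):<p></p>

```python
# Fit x(t) = a0 + a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5 to six boundary
# conditions: position, velocity, and acceleration at t=0 and at t=T.
# Illustrative sketch only; no hardware limits are enforced here.

def quintic_coeffs(x0, v0, a0, xT, vT, aT, T):
    c0, c1, c2 = x0, v0, a0 / 2.0          # conditions at t=0 fix a0..a2
    # Remaining 3x3 linear system for a3..a5 from the conditions at t=T.
    A = [[T**3,     T**4,      T**5],
         [3 * T**2, 4 * T**3,  5 * T**4],
         [6 * T,    12 * T**2, 20 * T**3]]
    b = [xT - (c0 + c1 * T + c2 * T**2),
         vT - (c1 + 2 * c2 * T),
         aT - 2 * c2]
    # Tiny Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (b[r] - sum(A[r][c] * sol[c] for c in range(r + 1, 3))) / A[r][r]
    return [c0, c1, c2] + sol

# A rest-to-rest unit move over one second recovers 10t^3 - 15t^4 + 6t^5.
coeffs = quintic_coeffs(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0)
```

<p></p>A fit of this kind, with the hardware's velocity, acceleration, and jerk envelopes added as hard constraints, is what the enforcement layer performs.<p></p>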
It takes the rough, probabilistic waypoints and fits a physically realizable, $C^2$-continuous spline that strictly respects the hardware's maximum torque, velocity, and jerk envelopes.<p></p>The enforcement layer effectively translates the VLA's step-functions into smooth, drivable trajectories. It then streams this optimized signal to the edge runtime and hardware controllers at high frequency (e.g., 500 Hz or 1000 Hz). The physical robot never feels the 'thought process' of the neural network; it only feels the mathematically verified momentum.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Intelligence Requires Grounding</h2><p></p>Intelligence without physical grounding is destructive. By architecturally separating the 'nervous system' that plans from the 'spinal cord' that executes, we ensure that the scale and complexity of modern foundation models do not destroy the machinery they are tasked to operate.<p></p><span class='font-semibold text-terracotta'>At Xolver, we believe that true physical AI must speak the language of continuous calculus, not just discrete representation.</span>]]></content:encoded>
    </item>
    <item>
      <title>The mathematics of spatial constraints. Why foundation models need bounded execution</title>
      <link>https://xolver.ai/blog/mathematics-of-spatial-constraints</link>
      <guid isPermaLink="true">https://xolver.ai/blog/mathematics-of-spatial-constraints</guid>
      <pubDate>Tue, 10 Mar 2026 00:00:00 GMT</pubDate>
      <description>To guarantee safe actuation, we must look beyond the weights of a neural network and return to the mathematics of spatial constraints. We explore why implicit learning is insufficient for physical autonomy and how differentiable optimization at the edge guarantees safe execution.</description>
      <content:encoded><![CDATA[The arrival of Vision-Language-Action (VLA) models represents a paradigm shift, seemingly solving two of the hardest problems in robotics, intent interpretation and visual perception. By scaling up foundation models, we can now issue a high-level semantic command and have a system parse the visual scene to propose a surprisingly coherent sequence of actions.<p></p>Yet, moving from digital token generation to physical actuation introduces a critical, often overlooked disconnect. The digital realm is forgiving. A hallucinated word merely requires a backspace. The physical world, however, is bound by hard, unforgiving geometry and classical mechanics. When building infrastructure for physical autonomy, we quickly confront a stark reality. Implicit learning, simply scaling a model until it "learns" not to crash, is fundamentally insufficient for safety-critical operations. The long tail of physical edge cases cannot be brute-forced by data alone.<p></p>To guarantee safe actuation, we must look beyond the weights of a neural network and return to the rigorous mathematics of spatial constraints.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>The Geometry of Reachability and Configuration Spaces</h2><p></p>When a robotics foundation model proposes an action, it typically operates in task space, a human-interpretable 3D Cartesian coordinate system. However, the robot itself moves in Configuration Space ($\mathcal{C}$-space), a high-dimensional manifold where every point represents a complete specification of every joint angle and actuator state. 
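<p></p>The structure of configuration space is easiest to see in a toy version. The sketch below (pure Python; the two-link geometry and the disc obstacle are invented for illustration) maps a joint configuration to link segments and tests it for collision:<p></p>

```python
# A planar 2-DOF arm: each configuration (q1, q2) maps to two line segments.
# A configuration is valid iff neither link intersects the disc obstacle.
# Link lengths, obstacle centre, and radius are invented for illustration.
import math

L1, L2 = 1.0, 1.0
OBS_C, OBS_R = (1.2, 0.8), 0.4

def seg_point_dist(p, q, c):
    """Euclidean distance from point c to the segment p-q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0,
        ((c[0] - p[0]) * dx + (c[1] - p[1]) * dy) / seg2))
    ex, ey = p[0] + t * dx, p[1] + t * dy
    return math.hypot(c[0] - ex, c[1] - ey)

def collision_free(q1, q2):
    """True iff neither link of the arm touches the obstacle."""
    elbow = (L1 * math.cos(q1), L1 * math.sin(q1))
    wrist = (elbow[0] + L2 * math.cos(q1 + q2),
             elbow[1] + L2 * math.sin(q1 + q2))
    return (seg_point_dist((0.0, 0.0), elbow, OBS_C) > OBS_R and
            seg_point_dist(elbow, wrist, OBS_C) > OBS_R)

# Reaching straight through the obstacle collides; reaching away does not.
toward = math.atan2(0.8, 1.2)
```

<p></p>Even with only two joints, the boundary between valid and colliding configurations is curved and non-convex; real manipulators add many more joints, self-collision, and moving obstacles.<p></p>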
For a standard 6-DOF or 7-DOF robotic manipulator, this space is incredibly complex.<p></p>Within this manifold, the space is partitioned into $\mathcal{C}_{free}$ (the set of all valid, collision-free configurations) and $\mathcal{C}_{obs}$ (the set of configurations that result in self-collisions, environmental collisions, or kinematic singularities). The boundary separating these two sets is highly non-linear, non-convex, and computationally intensive to map.<p></p>Foundation models, by their very nature, are probabilistic mapping functions. They output a distribution over possible next tokenized actions. Even a highly capable, massively scaled model might assign a 99.9% probability to a trajectory that exists entirely within $\mathcal{C}_{free}$, while assigning a 0.1% probability to a trajectory that barely intersects $\mathcal{C}_{obs}$.<p></p>In a text-generation task, a 0.1% error rate is an engineering triumph. In physical operations, whether navigating a cluttered manufacturing cell, orchestrating precision logistics, or operating in unstructured field environments, a 0.1% chance of executing a catastrophic trajectory is unacceptable. The foundation model cannot be trusted to independently self-enforce the rigid mathematical boundaries of $\mathcal{C}_{free}$.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Formulating the Deterministic Enforcement Layer</h2><p></p>To bridge the gap between probabilistic intent and physical safety, advanced AI architectures must strictly decouple proposal from execution. The neural model proposes an intent, but a deterministic enforcement layer must constrain it.<p></p>Mathematically, this translates to a constrained non-linear optimization problem. 
Given a proposed state $x_{prop}$ generated by the foundation model, we must find an executable state $x_{exec}$ that minimizes the deviation from the model's proposal, subject to the strict condition that $x_{exec}$ lies safely within the bounds of physical reality.<p></p>$$ \min_{x_{exec}} || x_{exec} - x_{prop} ||^2 $$<p></p>$$ \text{subject to } x_{exec} \in \mathcal{C}_{free} $$<p></p>This optimization acts as a mathematical projection, requiring the system to account for multiple intersecting constraints simultaneously.<p></p><ul class='list-disc list-inside space-y-2 mt-4 ml-4'><li><b>Kinematic Constraints.</b> Ensuring target poses are mathematically reachable given the manipulator's inverse kinematics and joint limits.</li><li><b>Spatial Constraints.</b> Preventing intersection with known geometric obstacles, dynamic actors, and defined keep-out zones using bounding volumes and spatial fields.</li><li><b>Dynamic Constraints.</b> Bounding velocity, acceleration, and jerk to maintain physical stability and protect hardware integrity.</li></ul><p></p>By treating the foundation model's output as an unconstrained structural prior and passing it through this deterministic projection, we guarantee that the final execution trace is mathematically bounded, strictly safe, and auditable.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Differentiable Optimization at the Edge</h2><p></p>Historically, solving these non-convex spatial optimization problems in real-time was a massive bottleneck. 
Traditional motion planners and sequential solvers require significant CPU overhead, making them poorly suited for the low-latency, high-frequency control loops that modern robotic systems demand.<p></p>However, the mathematics of modern AI provides a remarkably elegant solution to the very problem it created.<p></p>By formulating these spatial, kinematic, and dynamic constraints as entirely differentiable functions, we can harness the exact same hardware-accelerated tensor primitives (GPUs/TPUs) used to run the foundation models. Utilizing modern frameworks designed for high-performance numerical computing (such as JAX or PyTorch), we can evaluate massive batches of spatial constraints in parallel.<p></p>Instead of searching for a path through discrete sampling, differentiable optimization allows us to compute the gradients of the constraint violations. If a proposed trajectory intersects an obstacle, the gradient points precisely in the direction of the safest, most efficient route back into $\mathcal{C}_{free}$. This allows systems to optimize reachability, layout compliance, and collision avoidance simultaneously in a matter of milliseconds.<p></p>This is the profound architectural shift happening in physical AI. We are moving constraint solving out of the realm of slow, classical CPU loops and directly into the realm of hardware-accelerated, differentiable tensor operations.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Intelligence, Constrained</h2><p></p>The future of robotics AI will not be defined by a single, omniscient neural network that perfectly predicts every granular physical interaction. The real world contains too much sensor noise, environmental entropy, and geometric complexity for pure deep learning to operate safely in an unbounded manner.<p></p>The path to scalable, safe, and deployable autonomy lies in rigorous system architecture. 
It requires foundation models that intuitively propose state and intent, inextricably coupled with a deterministic enforcement layer that speaks the immutable language of spatial mathematics. By translating probabilistic inferences into guaranteed physical boundaries, we can finally leverage the power of modern AI in environments where failure is expensive.]]></content:encoded>
    </item>
    <item>
      <title>The geometry of non-uniqueness and why robotics is also a diffusion problem</title>
      <link>https://xolver.ai/blog/geometry-of-non-uniqueness</link>
      <guid isPermaLink="true">https://xolver.ai/blog/geometry-of-non-uniqueness</guid>
      <pubDate>Sat, 21 Feb 2026 00:00:00 GMT</pubDate>
      <description>Real-world robotics is multi-modal. We explore why traditional regression fails in the face of non-uniqueness and how diffusion models provide a mathematical bridge between noise and physical intent.</description>
      <content:encoded><![CDATA[As we move deeper into the era of general-purpose physical AI, we are forced to confront a mathematical reality that the symbolic world of LLMs rarely touches, the problem of non-uniqueness. In language, there may be many ways to say the same thing, but in robotics, there are many ways to *do* the same thing, and picking the wrong combination results in physical failure.<p></p>At Xolver, we believe that the transition from traditional robotics to truly agentic systems depends on moving away from deterministic "correctness" and toward the geometry of uncertainty. This is why we see robotics not as a regression problem, but as a diffusion problem.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>The Fallacy of the Average</h2><p></p>Traditional robotics models, including many early Vision-Language-Action (VLA) architectures, are built on a dangerous assumption, that for every observation $s$, there exists a single optimal action vector $a$. This is the mathematical framework of regression.<p></p>The problem arises when the world offers more than one valid path. Imagine a robot tasked with picking up a tool that can be approached from either the left or the right. A deterministic model, trained on both types of demonstrations, will attempt to minimize the mean squared error. Mathematically, it computes the expectation<p></p>$$a_{pred} = \int a \cdot p(a|s) \, da$$<p></p>This is the "regression to the mean." If the distribution $p(a|s)$ is bi-modal, the average of two valid paths is often a path that goes directly through the center, colliding with the table or grasping at thin air. In the physical world, the "average" of two right answers is almost always a wrong answer.<p></p>Intelligence is not about finding the one "right" answer. 
It is about acknowledging the infinite set of almost-right answers while strictly avoiding the sea of impossible ones.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Action Manifolds and Multi-modality</h2><p></p>To solve for non-uniqueness, we must redefine the "Action Space." It is not merely a 7-DOF vector of joint angles. It is a complex, non-Euclidean manifold $\mathcal{M}$ shaped by the constraints of physics and the requirements of the task.<p></p>Real-world tasks are inherently multi-modal. A single visual input does not map to a point, it maps to a distribution. When a human reaches for a cup, their nervous system isn't solving for a single coordinate. It is navigating a high-probability "valley" in a manifold where millions of trajectories are equally valid.<p></p>Deterministic models collapse this manifold into a single, fragile point. To build robust robots, we need models that can represent the entire distribution, preserving the "valleys" of intent while respecting the high-energy "ridges" of physical impossibility.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Diffusion as a Vector Field of Intent</h2><p></p>This is where diffusion models change the game. Instead of predicting a path directly, diffusion treats action generation as a process of refinement. It starts with noise and iteratively "pulls" it toward a valid state.<p></p>This is formalized through Stochastic Differential Equations (SDEs). The model learns the Score Function, which is the gradient of the log-density of the data<p></p>$$\mathbf{s}(x, t) = \nabla_x \log p_t(x)$$<p></p>Think of this score function as a vector field or a "current." No matter where you start in the action space, the score function points you toward the nearest valid manifold.<p></p>Diffusion doesn't just "generate" an action. It refines it against the implicit constraints of the environment. 
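<p></p>The pull of the score field is easy to reproduce in one dimension. Below, a toy equal-weight mixture with two valid action modes (all numbers invented); plain gradient ascent on $\log p$ stands in for the sampler:<p></p>

```python
# Score-following on a two-mode action distribution p(a): equal-weight
# Gaussians at a = -1 and a = +1, toy stand-ins for "approach from the left"
# and "approach from the right". All numbers are illustrative.
import math

SIGMA = 0.2
MODES = (-1.0, 1.0)

def score(a):
    """d/da log p(a): the 'current' that points toward the nearest mode."""
    w = [math.exp(-((a - mu) ** 2) / (2 * SIGMA ** 2)) for mu in MODES]
    dw = [wi * (mu - a) / SIGMA ** 2 for wi, mu in zip(w, MODES)]
    return sum(dw) / sum(w)

def follow_score(a, lr=0.01, steps=2000):
    # Deterministic gradient ascent on log p; a real sampler adds annealed
    # noise, but the pull toward the manifold is the same.
    for _ in range(steps):
        a += lr * score(a)
    return a

mean_action = sum(MODES) / len(MODES)   # what MSE regression converges to
left, right = follow_score(-0.3), follow_score(0.3)
```

<p></p>Regression lands at $0.0$, squarely between the two valid answers; score-following lands on whichever mode the noisy start is nearest. Diffusion policies industrialize exactly this behaviour.<p></p>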
It is the mathematical bridge between "noise" (uncertainty) and "score" (intent). By learning the vector field rather than the point, the robot gains the ability to recover from perturbations and handle multi-modal choices without ever "averaging" its way into a collision.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>From Brains to Nervous Systems</h2><p></p>The philosophical shift here is profound. LLMs deal with discrete tokens where "correctness" is a matter of sequence and probability over a finite vocabulary. Robotics deals with continuous flows where "correctness" is a matter of survival in a non-linear physical world.<p></p>Diffusion models represent the first time we've had a mathematical framework that respects the "messiness" of reality without attempting to simplify it. By embracing the stochastic nature of motion, we move closer to how biological nervous systems operate, not by executing a pre-computed script, but by continuous, score-driven refinement.<p></p>Traditional models are fragile because they expect the world to match their single prediction. Diffusion models are resilient because they are designed to walk through the probability of the world, constantly correcting their course toward the manifold of success.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Conclusion. Solving for the Infinitesimal</h2><p></p>The next era of robotics won't be defined by bigger models or more parameters, but by models that can navigate the geometry of uncertainty. We are moving away from the "Fallacy of the Average" and toward a physics-grounded understanding of non-uniqueness.<p></p>To touch the world, we must first learn to walk through the probability of it. 
At Xolver, we are building the mathematical scaffolding that allows machines to do exactly that, solving for the infinitesimal adjustments that turn a noisy intent into a certain action.<p></p><span class='font-semibold text-terracotta'>Intelligence is the ability to navigate the many ways to be right, while understanding exactly how to not be wrong.</span>]]></content:encoded>
    </item>
    <item>
      <title>The new mathematics of touch, solving for tactile intelligence</title>
      <link>https://xolver.ai/blog/tactile-intelligence-2026</link>
      <guid isPermaLink="true">https://xolver.ai/blog/tactile-intelligence-2026</guid>
      <pubDate>Fri, 16 Jan 2026 00:00:00 GMT</pubDate>
      <description>Touch is no longer an auxiliary sense. It is the central bottleneck of general-purpose physical intelligence. We explore how tactile intelligence is formalized through continuum mechanics, information theory, and control.</description>
      <content:encoded><![CDATA[As 2026 begins, a fundamental truth is becoming unavoidable in robotics. Touch is no longer an auxiliary sense. It is the central bottleneck of general-purpose physical intelligence.<p></p>Vision provides geometry. Language provides intent. Touch provides the closed-loop reality check. The moment a robot makes contact with the world, abstraction ends and physics begins.<p></p>Tactile intelligence is not a signal processing problem. It is a problem of non-smooth dynamics, stochastic control, and energy transfer across matter. Scaling models helps, but only up to the point where the laws of mechanics reassert themselves.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Touch as a physical tensor field, not a signal</h2><p></p>Modern tactile sensors do not observe a scalar pressure value. They sample a discretized version of the Cauchy stress tensor $\boldsymbol{\sigma}(x, t)$ at the contact interface.<p></p>Touch is best described as a time-varying mapping from the contact manifold $\mathcal{M}_c$ to force and torque space,<p></p>$$\mathbf{f}(t) = \int_{\mathcal{M}_c} \boldsymbol{\sigma}(x, t)\,\mathbf{n}(x)\,dA$$<p></p>where $\mathbf{n}$ is the surface normal.<p></p>Unlike vision, which passively observes photons, touch measures the transmission of energy through deformable matter. The so-called messiness of tactile data is not noise. It is the high-frequency structure of shear stress $\tau$ and normal pressure $p$ that determines whether an object remains stable or begins to slip.<p></p>This is why touch scales differently from vision. Increasing taxel density without understanding the physics simply produces more chaos.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>The discontinuity problem, contact is not smooth</h2><p></p>The hardest part of tactile intelligence is not dimensionality. 
It is discontinuity.<p></p>The transition from free space to contact is governed by the Signorini complementarity condition,<p></p>$$g_n \ge 0,\quad \lambda_n \ge 0,\quad g_n \lambda_n = 0$$<p></p>where $g_n$ is the contact gap and $\lambda_n$ is the normal force.<p></p>This is a true mathematical discontinuity. There is no smooth interpolation between touching and not touching. Classical approaches tried to smooth this transition away. In 2026, the shift is toward embracing it.<p></p>Differentiable contact models now allow gradients to flow through stick-slip transitions defined by the Coulomb friction cone,<p></p>$$\|\boldsymbol{\tau}\| \le \mu \lambda_n$$<p></p>This matters because manipulation lives at the boundary of stability. Slip, micro-vibration, and deformation are not edge cases. They are the task.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>From tactile fields to latent manifolds</h2><p></p>Raw tactile observations $x_t$ live in an extremely high-dimensional space. Thousands of taxels across time quickly become intractable unless structured.<p></p>The modern approach is to project these observations onto a lower-dimensional tactile manifold $\mathcal{Z}$, learned through interaction rather than geometry.<p></p>This is increasingly formalized through variational information bottlenecks. We seek a latent representation $z_t$ that preserves predictive power while discarding irrelevant variation,<p></p>$$\max I(z_t; x_{t+1}) - \beta I(z_t; x_t)$$<p></p>These latent variables function as tactile tokens. They are not symbolic labels like "slip" or "stable." They are coordinates in a physical interaction space where distance encodes risk. Moving closer to the boundary of a cluster corresponds to a rising probability of failure.<p></p>In this framing, tactile intelligence is not classification. 
It is navigation on a learned manifold shaped by physics.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Prediction, not reaction, the active inference view</h2><p></p>Humans do not respond to touch after the fact. We anticipate it.<p></p>This is best captured through predictive processing and active inference. The robot maintains an internal generative model that predicts expected tactile feedback $\hat{x}_t$ given vision $x_v$ and action $u_t$,<p></p>$\hat{x}_t = g(x_v, u_t)$<p></p>The key signal is not touch itself, but surprisal,<p></p>$\delta_t = x_t - \hat{x}_t$<p></p>This prediction error drives immediate belief updates in the internal state $b_t$, often bypassing higher-level reasoning. A spike in $\delta_t$ is what causes instant grip correction when an object turns out to be heavier, softer, or slipperier than expected.<p></p>In physical systems, prediction error is faster and more reliable than symbolic reasoning. This is why tactile control cannot wait for language-level planning.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Hierarchical control under physical constraints</h2><p></p>Tactile intelligence operates across time scales.<p></p>At the millisecond level, reflexive control loops stabilize contact. At the second level, higher policies reason about task completion. This structure is naturally modeled as a hierarchical stochastic optimal control problem.<p></p>At the low level, stability is governed by energy dissipation. At the high level, value functions encode task intent. The unifying object is the value function of the underlying Hamilton-Jacobi-Bellman equation,<p></p>$$V(x) = \min_u \mathbb{E} \left[ \int_0^T \left( \|\delta_t\|^2 + \lambda \|\tau(t)\|^2 \right) dt \right]$$<p></p>When tactile error $\delta_t$ exceeds a safety threshold, the value function collapses around stability. The policy shifts instantly from goal-seeking to damage prevention. 
This is not a heuristic. It is optimal behavior under physical risk.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Why touch is the ultimate test of truth</h2><p></p>In virtual domains, models can hallucinate. In the physical world, conservation of momentum and energy act as non-negotiable loss functions.<p></p>The shift in 2026 is the recognition that physics cannot be trained away. It must be embedded into representation, prediction, and control.<p></p>Tactile intelligence is where geometry meets dynamics, where probability meets friction, and where intelligence finally becomes accountable to reality.<p></p><span class='font-semibold text-terracotta'>At Xolver, we see touch not as another modality, but as the grounding layer of physical intelligence. It is at the contact patch, where bits meet atoms, that artificial intelligence stops being impressive and starts being real.</span>]]></content:encoded>
    </item>
    <item>
      <title>Predictions for the mathematics of robotics AI in 2026, from tokens to touch</title>
      <link>https://xolver.ai/blog/ai-robotics-2026</link>
      <guid isPermaLink="true">https://xolver.ai/blog/ai-robotics-2026</guid>
      <pubDate>Fri, 02 Jan 2026 00:00:00 GMT</pubDate>
      <description>If 2024 and 2025 were about giving robots a brain through LLMs and VLMs, 2026 feels like the year we finally give them a functioning nervous system. We explore how physics, control theory, and modern machine learning merge in earnest.</description>
      <content:encoded><![CDATA[Looking ahead to 2026, we see a clear shift in the industry's trajectory. If 2024 and 2025 were about giving robots a brain through LLMs and VLMs, 2026 feels like the year we finally give them a functioning nervous system.<p></p>We are moving past the novelty of robots that can see and speak, and into the much harder engineering reality of robots that can act with the subtle, contact-rich fidelity of a human. This transition is not cosmetic. It is rooted in changes to how we model physics, control, and learning itself.<p></p>Below is how we expect the mathematics and physics of robotics AI to evolve in 2026.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>1. The tokenization of continuous physics</h2><p></p>The most profound shift in 2026 is mathematical. We are dissolving the long-standing barrier between discrete reasoning and continuous control.<p></p>Until recently, these were treated as separate domains. Language models predicted the next text token, while control policies minimized a cost function $J$ over continuous state-space trajectories $x(t)$, $u(t)$. The coupling between the two was fragile and largely hand-engineered.<p></p>In 2026, Vision-Language-Action architectures mature into systems that treat physical force and motion as just another language. The core idea is the discretization of the manifold of useful actions. Instead of outputting a continuous voltage or torque command for each motor, the model predicts a sequence of action or motion tokens. These tokens are then decoded by a low-level diffusion or control policy into high-frequency actuation, often at 100 Hz or higher.<p></p>What emerges is a unified latent space $z$, where semantic intent, such as "twist the cap," aligns with dynamic affordances like torque vectors, friction cones, and compliance regions. We are also seeing the quantization of force itself. 
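<p></p>A minimal version of this discretization, uniform per-channel binning into 256 tokens, fits in a few lines (the bin count and normalized action range are illustrative; deployed systems often use learned or quantile bins):<p></p>

```python
# Uniform binning of one continuous action channel into discrete tokens and
# back. Bin count and action range are illustrative choices.

N_BINS = 256
A_MIN, A_MAX = -1.0, 1.0   # normalized action range

def encode(a):
    """Continuous action -> token id in [0, N_BINS - 1]."""
    a = min(max(a, A_MIN), A_MAX)
    t = int((a - A_MIN) / (A_MAX - A_MIN) * N_BINS)
    return min(t, N_BINS - 1)

def decode(token):
    """Token id -> bin-centre continuous action."""
    width = (A_MAX - A_MIN) / N_BINS
    return A_MIN + (token + 0.5) * width

a = 0.137
token = encode(a)          # 145
recovered = decode(token)  # 0.13671875, within half a bin width of a
```

<p></p>The round-trip error is bounded by half a bin width, which sets the resolution the low-frequency token stream hands to the high-frequency decoder.<p></p>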
By training on large-scale teleoperation datasets, models learn to predict action tokens $a_t$ that implicitly encode compliance and contact dynamics. The math shifts from explicit inverse kinematics toward probabilistic token prediction,<p></p>$$a_t \sim p(a \mid o_{\le t}, c)$$<p></p>where $a_t$ is the action token, $o_{\le t}$ is the history of observations, and $c$ is the high-level language command. Control becomes a sequence modeling problem, without sacrificing physical realism.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>2. Solving contact-rich manipulation through differentiable physics</h2><p></p>For decades, the sim-to-real gap has been the graveyard of robotics startups. Robots trained in rigid-body simulators failed in the real world because simulators could not model deformation, frictional slip, or micro-collisions. A rubber seal, a greasy bolt, or a slightly misaligned part was enough to cause failure.<p></p>In 2026, the key breakthrough is the practical adoption of differentiable soft-body simulation. Instead of using non-differentiable physics engines that block gradient flow, the physics equations themselves become part of the computational graph.<p></p>This enables end-to-end learning across perception, control, and physics. A physical failure, such as dropping a cup, generates an error signal that propagates backward through the simulator,<p></p>$$\frac{\partial \mathcal{L}}{\partial \theta} = \frac{\partial \mathcal{L}}{\partial x_T} \cdot \frac{\partial x_T}{\partial \theta}$$<p></p>where $\theta$ includes both policy parameters and physical simulation parameters. The robot can effectively learn by dreaming physics.<p></p>This reframing also changes how classic manipulation problems are solved. Peg-in-hole assembly is no longer treated as a purely geometric constraint. Instead, it is modeled as an energy minimization problem involving friction, deformation, and contact forces. 
The robot learns to introduce small corrective motions that reduce system energy, allowing parts to slide into place rather than jam.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>3. Visuotactile foundation models</h2><p></p>Vision alone is insufficient for manipulation. You cannot see the weight of a hammer or the slipperiness of a soap bar. Humans rely heavily on touch to regulate grip force and adapt instantly to unexpected changes.<p></p>We expect 2026 to be the year visuotactile multimodality becomes foundational. Robots move beyond RGB inputs toward dense tactile fields generated by high-resolution sensors. Many of these sensors use internal cameras to observe gel deformation, producing high-dimensional tactile signals.<p></p>Mathematically, the goal is to fuse visual observations $x_v$ and tactile observations $x_t$ into a shared representation $z$. The likely breakthrough lies in cross-modal prediction. Before contact occurs, the visual system predicts the expected tactile signal $\hat{x}_t$,<p></p>$$\hat{x}_t = f(x_v)$$<p></p>Once contact happens, the difference between expected and actual touch,<p></p>$$\delta_t = x_t - \hat{x}_t$$<p></p>becomes the primary learning signal. This surprisal drives rapid adaptation of grip and force. It mirrors how humans instantly adjust when an object turns out to be heavier or slipperier than it appears.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>4. Energy-optimal control as a first-class objective</h2><p></p>A critical but often overlooked dimension is energy. Generative models are computationally expensive, and humanoid actuators are power-hungry. In 2026, efficiency becomes a first-class term in the objective function.<p></p>Control policies are no longer optimized solely for task success. 
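As a hedged sketch of what such an objective looks like in code, the following toy cost combines a task term with a discretized energy integral. The function name, weighting, and discretization are illustrative assumptions, not any production formulation.

```python
import numpy as np

def trajectory_cost(positions, torques, target, dt=0.01, w_energy=0.1):
    # Task term: squared distance of the final state from the target.
    task_cost = np.sum((positions[-1] - target) ** 2)
    # Energy term: discrete approximation of the integral of squared torque.
    energy_cost = np.sum(torques ** 2) * dt
    return task_cost + w_energy * energy_cost
```

Raising `w_energy` biases the optimizer toward slower motions that exploit momentum and gravity rather than fighting them.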
Instead, they incorporate energy directly through a minimum-effort objective, in the spirit of the principle of least action,<p></p>$$\min_{\tau} \int_0^T \left\| \tau(t) \right\|^2 \, dt$$<p></p>where $\tau(t)$ is the vector of joint torques. By penalizing energy use during training, including through RLHF-style objectives for efficiency, robots learn to exploit momentum, gravity, and passive dynamics. The resulting motion looks fluid and almost lazy, and can extend battery life by 20 to 30 percent without hardware changes.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>The convergence that defines 2026</h2><p></p>What makes 2026 distinctive is not a single breakthrough paper, but the convergence of these mathematical streams.<p></p>Tokenized action allows the brain to communicate fluently with the body. Differentiable physics teaches the brain the body's constraints. Visuotactile sensing gives the body real feedback grounded in contact and force.<p></p><span class='font-semibold text-terracotta'>We believe the winners of 2026 will not be the teams with the largest language models, but the ones who successfully ground those models in the unforgiving, nonlinear mathematics of the physical world.</span>]]></content:encoded>
    </item>
    <item>
      <title>How to train an RFM (Robotics Foundation Model)</title>
      <link>https://xolver.ai/blog/how-to-train-rfm</link>
      <guid isPermaLink="true">https://xolver.ai/blog/how-to-train-rfm</guid>
      <pubDate>Tue, 16 Dec 2025 00:00:00 GMT</pubDate>
      <description>Training a robotics foundation model is not an exercise in scaling parameters. It is an exercise in deciding what kind of world you want a machine to survive in. Unlike language or vision models, an RFM lives in time, friction, latency, contact, failure, and recovery.</description>
      <content:encoded><![CDATA[Training a robotics foundation model is not an exercise in scaling parameters. It is an exercise in deciding what kind of world you want a machine to survive in. Unlike language or vision models, an RFM does not live in files, tokens, or frozen datasets. It lives in time, friction, latency, contact, failure, and recovery.<p></p>This is why most attempts at large scale robotics learning fail quietly. They begin with the wrong abstraction. They assume robotics is just another multimodal problem. It is not. Robotics is a closed loop system where perception, reasoning, and action continuously interfere with one another.<p></p>An RFM is best understood as a general policy that can span tasks, environments, and embodiments. Not a task specific controller. Not a demo trained for one arm, one table, one lighting condition. It is a system that can perceive intent, reason under uncertainty, and act in ways that remain stable when the world pushes back.<p></p>Today we share, from our experience, how such a model is actually trained, end to end, without pretending that physics can be abstracted away.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Define intelligence before you define models</h2><p></p>Before collecting data or choosing architectures, define what intelligence means in your system.<p></p>Is the robot expected to manipulate objects, navigate spaces, or collaborate with humans? Is it required to learn new tasks from language, or only execute known skills? Are failures acceptable if recoverable, or must the system be conservative by default? What latency budget does the system have? Does inference need to run at 5 Hz, 10 Hz, or 50 Hz?<p></p>These questions are not philosophical. They directly constrain architecture, training signals, and deployment.<p></p>Most RFM projects fail because teams gather data before defining the distribution shift the model must survive. 
As a result, the model performs well in controlled settings and collapses under mild perturbations.<p></p>If you cannot articulate the failure modes you are designing for, you are not training a foundation model. You are collecting demos.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Data is not volume, it is causality</h2><p></p>RFM training is often framed as a data scaling problem. That is only partially true. The real challenge is not how much data you have, but whether the data teaches causality rather than correlation.<p></p>Robotics data is expensive, biased, and shaped by embodiment. Sensors encode perspective. Actuators encode constraints. Human operators encode habits. If you simply aggregate trajectories, the model learns these biases instead of the task.<p></p>Robust RFM pipelines deliberately combine four classes of data.<p></p>Real world demonstrations anchor the model to physics. Teleoperation, kinesthetic teaching, or expert policies teach contact dynamics, friction, and feasibility.<p></p>Simulation rollouts provide breadth. They allow exploration of rare events, edge cases, and failures that are unsafe or slow to produce in reality. Domain randomization here is not about noise, but about uncertainty that mirrors reality.<p></p>Corrective and recovery data is the most valuable and the most ignored. Successful trajectories teach optimism. Near misses, aborts, and human interventions teach robustness. Without this data, models fail catastrophically instead of gracefully.<p></p>Language grounded annotations provide abstraction. Not just task names, but intent, constraints, and success conditions. This is what allows generalisation beyond memorised trajectories.<p></p>The objective is not balance. 
The objective is coverage of cause and effect.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Embodiment is a first class problem</h2><p></p>Generalising across embodiments is not a slogan. It is a technical challenge.<p></p>A 7 degree of freedom arm, a 4 degree of freedom arm, and a quadruped do not share an action space. If you ignore this, cross embodiment generalisation is impossible.<p></p>Most successful RFMs standardise actions and state through abstraction. Proprioception is normalised into relative joint states, velocities, or end effector frames. Actions are tokenised into latent representations rather than raw torques.<p></p>Common approaches include action chunking, trajectory prediction, or latent action codes learned via VQ style encoders. The model does not predict individual motor commands. It predicts short horizon behaviours that can be mapped onto different bodies through embodiment specific decoders.<p></p>This separation is what allows the same policy to control different hardware without retraining from scratch.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Architecture follows control, not fashion</h2><p></p>Most modern RFMs use a vision language action structure. Cameras provide state. Language provides goal conditioning. The model outputs actions or plans.<p></p>The critical architectural decision is not the transformer variant. It is how time and feedback are handled.<p></p>High capacity models are slow. Motors are fast. Physics does not wait for attention layers to converge.<p></p>For this reason, RFMs rarely operate at motor control frequencies. Instead, they act at a semantic rate. They predict short horizon trajectories, action chunks, or goal states. Classical controllers handle interpolation, stabilisation, and safety at high frequency.<p></p>This frequency separation is not a compromise. 
It is how biological systems work.<p></p>A typical stack looks like this. The RFM runs at low frequency and reasons about intent and strategy. A mid level controller translates these outputs into feasible motion plans. A low level controller enforces safety, smoothness, and constraints.<p></p>End to end purity is attractive in papers. Hybrid systems survive contact with reality.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Training happens in stages, not once</h2><p></p>RFMs are not trained end to end in a single pass. They are grown through stages.<p></p>First comes representation learning. Vision, proprioception, and language are aligned into a shared latent space. Masked prediction, contrastive objectives, and future state modelling are common here. Control is not yet involved.<p></p>Second comes imitation. The model learns to map states and goals to action representations using demonstrations. Losses are supervised. Stability matters more than optimality.<p></p>Third comes interaction. Reinforcement learning, online fine tuning, or human in the loop correction exposes the model to its own mistakes. This is where robustness is learned.<p></p>These stages are not linear. Teams loop between them continuously as new data exposes new failure modes.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Sim to real is a loop, not a bridge</h2><p></p>Sim to real transfer is often described as a milestone. In practice, it is a gradient.<p></p>Simulation enables scale. Reality provides truth.<p></p>Early training leans heavily on simulation to explore. As deployment begins, real world logs are fed back into the simulator. Physics parameters are recalibrated. Contact models are refined. Latency and sensor noise are updated.<p></p>This real to sim feedback creates a living digital twin. Simulation becomes less idealised and more predictive. 
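As a toy illustration of one recalibration step, the sketch below fits a Coulomb friction coefficient so that a simulated sliding block reproduces a logged real-world stopping distance. The sliding-block model, function names, and grid search are illustrative assumptions, not a description of any particular pipeline.

```python
import numpy as np

def stopping_distance(v0, mu, g=9.81):
    # Point mass sliding with Coulomb friction: d = v0^2 / (2 * mu * g).
    return v0**2 / (2.0 * mu * g)

def calibrate_friction(v0, observed_d, grid=np.linspace(0.05, 1.0, 2000)):
    # Pick the friction coefficient whose simulated stop best matches reality.
    errors = np.abs(stopping_distance(v0, grid) - observed_d)
    return grid[np.argmin(errors)]
```

Each deployment log tightens the fit, which is what turns the simulator from a guess into a measurement.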
It stops being a sandbox and starts becoming an instrument.<p></p>If real failures do not exist in your simulator, your simulator is lying to you.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Safety is trained and enforced</h2><p></p>Safety in RFMs is both architectural and learned.<p></p>Certain constraints must be hard. Joint limits. Collision boundaries. Emergency stops. These are enforced outside the model.<p></p>Other behaviours can and should be learned. When to slow down. When to abort. How to behave under uncertainty.<p></p>This requires explicit signals. Unsafe actions are penalised. Near misses are logged. Human overrides are treated as supervision, not noise.<p></p>Evaluation must reflect this. Success rate alone is meaningless. Intervention frequency, recovery time, and degradation under stress matter far more.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Evaluate how systems degrade, not how they peak</h2><p></p>RFMs are often judged by demos. This is misleading.<p></p>The real test is degradation. How performance changes as lighting shifts. How behaviour changes when latency increases. What happens after hours of continuous operation.<p></p>Foundation models are valuable not because they never fail, but because they fail predictably and recoverably.<p></p><hr class='my-8 border-subtle' /><p></p><h2 class='text-xl font-semibold text-charcoal mt-8 mb-4'>Why this approach matters</h2><p></p>Teams that make real progress in robotics do not treat it as a model problem. They treat it as a system problem. At Xolver, this methodology did not emerge from imitation. It emerged from first principles. From building systems that must operate in the physical world, under uncertainty, at scale. 
If intelligence is to move beyond screens and into environments, it must be trained with respect for physics, obsession with feedback loops, and humility about what models can and cannot do.<p></p><span class='font-semibold text-terracotta'>A robotics foundation model is not trained once. It is raised.</span>]]></content:encoded>
    </item>
    <item>
      <title>Why we chose to open source.</title>
      <link>https://xolver.ai/blog/opensource</link>
      <guid isPermaLink="true">https://xolver.ai/blog/opensource</guid>
      <pubDate>Mon, 15 Dec 2025 00:00:00 GMT</pubDate>
      <description>We chose to open source part of Xolver not as a marketing gesture, but as an architectural decision. Closed systems create the illusion of progress. Open systems reveal where reality pushes back.</description>
      <content:encoded><![CDATA[We chose to open source part of Xolver not as a marketing gesture, but as an architectural decision. Physical intelligence is not something that can be built in isolation. It sits at the intersection of perception, control, systems engineering, and real world messiness. No single company, no matter how well funded or well staffed, has a monopoly on insight in this space.<p></p><span class='font-semibold text-terracotta'>Closed systems create the illusion of progress. Open systems reveal where reality pushes back.</span><p></p>When intelligence leaves the screen and enters the physical world, assumptions break quickly. Latency matters. Sensors drift. Edge cases dominate. Open sourcing core components forces us to confront these truths early. It exposes our ideas to environments we did not anticipate and to scrutiny we cannot control. That is uncomfortable, but it is also how systems mature.<p></p>We also believe that trust in physical intelligence cannot be earned through claims alone. When software is responsible for actions in the real world, seeing how it works matters. Operators, partners, and developers need to understand behavior, failure modes, and limits. Open code makes this possible. It turns black boxes into inspectable systems and fear into informed judgment.<p></p>Another reason is ecosystem health. Physical intelligence is still early. Tools, data formats, simulators, and evaluation methods are fragmented. By open sourcing parts of our stack, especially runtimes, interfaces, and tooling, we reduce friction for others building adjacent systems. A healthier ecosystem benefits everyone, including us. Standards emerge faster when they are built in the open.<p></p>Open source also keeps us honest. It creates a forcing function against brittle design and hidden shortcuts. If something only works under perfect conditions, it will be discovered quickly. 
That pressure improves quality far more effectively than internal reviews alone.<p></p>Importantly, open source does not mean giving away the business. We are deliberate about what we open and what we keep proprietary. Core research ideas, production hardened intelligence, safety layers, and customer specific systems remain closed. What we open are the foundations that should be shared, the scaffolding that helps the field move forward together.<p></p>Many of the most important infrastructure layers in technology followed this path. Operating systems. Databases. Cloud primitives. They succeeded not because they were closed, but because they were trusted, extensible, and shaped by real use. Physical intelligence will follow a similar trajectory.<p></p>Finally, this is about alignment with our long term vision. Xolver is not trying to win by secrecy. We are trying to win by building systems that work, systems that last, and systems others rely on. Open sourcing part of our work is a signal of confidence in our direction and respect for the community building alongside us.<p></p><span class='font-semibold text-charcoal'>Physical intelligence will define how machines coexist with people in the real world. That responsibility is too large to keep entirely behind closed doors.</span>]]></content:encoded>
    </item>
    <item>
      <title>How we think and what we do.</title>
      <link>https://xolver.ai/blog/manifesto</link>
      <guid isPermaLink="true">https://xolver.ai/blog/manifesto</guid>
      <pubDate>Sun, 14 Dec 2025 00:00:00 GMT</pubDate>
      <description>Xolver starts from a simple belief: Intelligence only matters when it survives contact with the real world. Our work begins where clean data ends and uncertainty begins.</description>
      <content:encoded><![CDATA[Xolver starts from a simple belief. Intelligence only matters when it survives contact with the real world. We are not interested in models that look impressive in isolation but fail under noise, delay, and unpredictability. Our work begins where clean data ends and uncertainty begins.<p></p>We see the physical world as a continuous stream, not a sequence of snapshots. Objects move, environments drift, and intent changes over time. Any system that treats perception as a one-time act and decision making as a static output will eventually fail. Xolver is built around closed loops where seeing, reasoning, acting, and verifying happen continuously.<p></p>We do not separate intelligence from responsibility. Every decision a system makes in the physical world has consequences. Safety, explainability, and failure handling are not add-ons. They are part of the core architecture. If a system cannot explain why it acted or recognize when it is wrong, it does not belong in production.<p></p>Xolver is software first but not software only. We design our systems to run on existing cameras, drones, machines, and edge devices. At the same time, we accept that some problems demand full stack ownership. We will let product market fit decide when hardware becomes necessary. Until then, we remain hardware flexible and architecture disciplined.<p></p>We believe platforms matter more than point solutions. Features solve today’s problem. Platforms survive tomorrow’s variability. Xolver builds reusable intelligence systems that can be adapted across environments rather than rebuilt for each use case. This allows our customers to scale capability without multiplying complexity.<p></p>We are not a services company disguised as a product. Integration and deployment exist to unlock value, not to become the business. 
Our goal is to build systems that work out of the box, improve over time, and reduce operational burden rather than add to it.<p></p>We measure success differently. Not by demo accuracy, but by uptime. Not by benchmark scores, but by trust earned in real environments. When operators stop watching dashboards and start relying on the system, we know we are doing our job.<p></p>Xolver is being built for the long term. Physical intelligence is infrastructure, not a trend. It will quietly sit beneath cities, factories, vehicles, and public spaces, making them safer, more efficient, and more aware. We intend to build that layer with humility, rigor, and respect for the real world it serves.]]></content:encoded>
    </item>
</channel>
</rss>