We've spent years investing in deep tech — from autonomous vehicle software (Applied Intuition) to AI foundation models (Poolside, Runway) to semis (Auradine, Upscale). For most of that time, robotics sat at the bottom of our list: too slow to grow, too low-margin, too operationally complex. But the biggest issue was data. Every approach to building intelligent robots required collecting enormous volumes of demonstration data — and it was never clear how much would be enough to reach commercial viability, or whether convergence would ever come. So we stood back and watched.
The industry's answer to the data problem is wrong.
Waymo spent 15 years and tens of billions of dollars painstakingly collecting driving data, mile by mile, city by city, hoping that convergence would eventually come. And it did — but the first 90% of driving capability came relatively quickly, while the final 10% took dramatically longer and cost dramatically more. This is the long-tail problem: the closer you get to full capability, the harder and more expensive each incremental gain becomes.
The dominant approach in robotics today — Vision-Language-Action models, or VLAs — is walking the same path. The way state-of-the-art robotics foundation models are trained is by taking an existing pre-trained image model — a Vision Language Model (VLM) — and post-training it on robot manipulation demonstrations. The VLM provides the visual understanding; the demonstrations teach the model what to do given a particular image, and in doing so, impart an explicit understanding of physics.
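To make that recipe concrete, here is a deliberately simplified sketch of the VLA fine-tuning loop. It is our own illustration rather than any lab's actual code: the pretrained_vlm backbone, the feature dimension, and the demos iterator are placeholders, and production VLAs use far more sophisticated action heads. The structural point is what matters: every gradient step consumes a human-collected (image, instruction, action) demonstration.

```python
# Simplified sketch of the VLA recipe: bolt an action head onto a pre-trained
# vision-language model and behavior-clone it on robot demonstrations.
# All names and dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class VLAPolicy(nn.Module):
    def __init__(self, pretrained_vlm: nn.Module, action_dim: int, feature_dim: int = 768):
        super().__init__()
        self.vlm = pretrained_vlm                               # pre-trained visual + language backbone
        self.action_head = nn.Linear(feature_dim, action_dim)   # maps VLM features to joint/gripper commands

    def forward(self, image, instruction):
        features = self.vlm(image, instruction)   # visual understanding comes from the VLM
        return self.action_head(features)         # what to do is learned from the demonstrations

def post_train(policy, demos, epochs=10, lr=1e-4):
    """Behavior cloning on teleoperated demonstrations. Every (image,
    instruction, action) tuple here was collected by a human operator;
    that collection step is exactly the bottleneck discussed below."""
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for image, instruction, expert_action in demos:
            loss = loss_fn(policy(image, instruction), expert_action)
            opt.zero_grad()
            loss.backward()
            opt.step()
```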
The problem is that this architecture is fundamentally bottlenecked by the very data it depends on. VLAs need massive volumes of robot-specific demonstrations — collected through expensive teleoperation or noisy simulation plagued by the well-known sim-to-real gap — and even with both approaches, a handful of tasks requires hundreds of thousands of hours of demonstrations. This is analogous to asking OpenAI's engineers to hand-write Shakespeare to generate training data. It is slow, expensive, and fundamentally incomplete — because no matter how much data you collect, you will never cover all the edge cases a robot will encounter in production.
But here's what makes it worse than Waymo: at least Waymo was collecting data for one embodiment (a car) in one environment (roads). General-purpose robotics demands data across hundreds of different robot form factors, in unstructured environments, performing thousands of different manipulation tasks. The combinatorial space explodes.
This isn't just an efficiency problem — it's a scaling ceiling. VLAs are architecturally constrained: they will always be bottlenecked by the availability of expensive, embodiment-specific demonstration data. The billions being poured into brute-force data collection are betting that scale will eventually solve this. We believe that bet is wrong. The industry needs an approach that starts from data abundance, not data scarcity.
The VLA needs 100,000 hours of demonstrations. Rhoda's model needs 10.
Rhoda's Direct Video Action (DVA) model — a new class of robotics foundation model — pre-trains on internet-scale video to learn motion priors and physical interaction patterns. It then post-trains only lightly on embodiment-specific data — requiring roughly four orders of magnitude less data than the teleoperation-heavy VLA paradigm. Rather than treating data scarcity as a constraint to work around, Rhoda reframes the problem entirely: replacing the need for exhaustive demonstrations with high-fidelity video predictions generated in real time for closed-loop control.
This is the key insight: we don't need to spend hundreds of millions of dollars collecting demonstration data and hope that the robot's situational awareness eventually becomes good enough. The DVA pre-trains on the abundant data that already exists: internet video. As a generative model, it predicts the future by sampling from the distribution it has learned in training. The post-training required is minimal. And once deployed, the robot can continue learning new skills on the job, expanding its task repertoire in months, not years.
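For intuition on what "video predictions for closed-loop control" can mean in practice, here is a rough sketch of one way such a loop could be wired up. This is our own illustration under assumed interfaces, not Rhoda's implementation: video_model.predict, robot.observe, robot.sample_actions, and the goal-similarity metric are all hypothetical. The shape of the loop is the point: imagine candidate futures, pick the action whose imagined future lands closest to the goal, act, re-observe, and repeat.

```python
# Illustrative sketch of closed-loop control driven by a generative video
# model. Every interface here (video_model, robot, goal image) is a
# hypothetical placeholder, not an actual API.
import numpy as np

def goal_similarity(frame, goal_frame):
    # Placeholder scoring: negative pixel distance to a goal image.
    return -float(np.linalg.norm(frame - goal_frame))

def control_step(video_model, obs_history, goal_frame, candidate_actions):
    """Pick the action whose sampled future rollout ends closest to the goal."""
    best_action, best_score = None, -np.inf
    for action in candidate_actions:
        predicted_frames = video_model.predict(obs_history, action)  # sample an imagined future
        score = goal_similarity(predicted_frames[-1], goal_frame)
        if score > best_score:
            best_action, best_score = action, score
    return best_action

def run_closed_loop(robot, video_model, goal_frame, horizon=200):
    obs_history = [robot.observe()]
    for _ in range(horizon):
        action = control_step(video_model, obs_history, goal_frame,
                              candidate_actions=robot.sample_actions(n=16))
        robot.execute(action)
        obs_history.append(robot.observe())  # re-observe and re-plan: the loop is closed
```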
This new paradigm has compounding commercial implications. Faster implementation cycles (driven by minimal post-training data requirements) and stronger robustness (from physics intuition learned implicitly across millions of videos) translate directly into greater pricing power, accelerated revenue growth, and structurally higher gross margins.
And the conditions to capitalize on this have never been better.
Rhoda's DVA architecture solves the data problem — the issue that kept us on the sidelines for years. But the structural challenges we flagged earlier — slow growth, low margins, operational complexity — are also being addressed. Several conditions have come together simultaneously to make the next generation of robotics and physical AI companies far more attractive than anything we had seen previously.
Those conditions are:
(1) Better — and cheaper — hardware. Actuators, sensors, and compute have improved steadily over the past decade while simultaneously dropping in cost, enabling a broader range of physical tasks to emerge — from locomotion to, more interestingly, dexterous manipulation.
(2) A maturing robotics ecosystem. Perhaps the most underappreciated factor is the quiet maturation of the surrounding infrastructure — operating systems, simulation environments, and developer tooling — that allows hardware and software to be leveraged together far more efficiently than before.
(3) The rise of Robotics-as-a-Service (RaaS). The shift from one-time hardware sales to recurring, subscription-like revenue is fundamentally rewriting the margin story for robotics companies — bringing them structurally closer to the software business models investors have always preferred.
(4) A deepening manufacturing labor shortage, with nearly 2 million jobs projected to go unfilled by 2030, creating genuine urgency.
Team, Tech, TAM, and Syndicate — the four pillars of how we evaluate companies still early on the maturity curve
Despite a substantial raise, Rhoda is still early in its journey — barely 18 months old as a company. At this stage of maturity, our investment thesis revolves around four things: team, tech, TAM, and syndicate.
We've covered the technology on the AI side earlier in this post, and the company has meaningful innovations on the hardware side as well — including building their own actuators. More on that in a future piece.
On TAM, robotics companies broadly have two markets to sell into, each requiring very different go-to-market muscles: consumer and B2B. We believe consumer robotics will eventually be a large market, but the near-term serviceable opportunity sits firmly in industrial. Industrial buyers — like Rhoda's existing partners — are sophisticated in their robotics procurement, deeply ROI-driven, and remarkably sticky once a robot is deployed in production. The TAM to augment human labor in industrial automation is so large that it almost doesn't need to be modeled out. It speaks for itself.
On syndicate, we believe the best companies need a village to reach their full potential. That's why we're thrilled to be partnering alongside Khosla Ventures and Mayfield — close friends and frequent collaborators across many of our investments.
And lastly — and most importantly — is our conviction in the team. Building a robotics company is hard. There are unknown unknowns at every turn, from hardware innovation to manufacturing, supply chain, change management, and robot financing. That's precisely why Jagdeep Singh is, in our view, the ideal person to take this on. Jagdeep is a Silicon Valley legend — someone who has spent his career tackling extraordinarily hard technical problems and driving them all the way to commercialization. Having built Infinera and QuantumScape before this, he may find robotics to be the easiest technical challenge he's ever set out to solve.
Alongside Jagdeep is an exceptional team: Chief Scientist Eric Chan, who already made his mark in computer vision and generative modeling while at Stanford; Chief Research Officer Changan Chen, who was previously a post-doctoral researcher with Fei-Fei Li's computer vision lab at Stanford; Andrew Wooten, who runs operations with military-grade precision; VP of Hardware Engineering Vincent Clerc, who's built and shipped multiple generations of humanoids; Chief Data Officer Alex Bergman, who owns the entire data pipeline; and Steve Tirado, who brings a wealth of company-building experience.
Closing
For the first time, robotics companies are beginning to exhibit the characteristics that have long defined the best technology investments — large, serviceable TAMs unlocked by genuine technology breakthroughs, high revenue growth enabled by compressed training and implementation cycles, and a margin structure that finally rewards long-term holders.
We're thrilled to have led Rhoda's Series A, and even more thrilled to be partnering with a world-class team to build it. Onwards.