We are excited to announce our investment in Baseten as part of a $150M Series D. We're thrilled to partner with the amazing team - Tuhin, Amir, and Phil - and the rest of the Baseten crew. Baseten helps developers and enterprises run their AI applications at the lowest possible cost and latency while maintaining incredibly high reliability.
The ghosts of MLOps—and why we waited
The precursors to a product like Baseten were products that promised to make classical ML (not the genAI we know today) faster, cheaper, and more reliable. They belonged to a class of products loosely classified as "MLOps". The bulk of spend for these products, however, sat in a handful of use-case-constrained, internal deployments (fraud, recommendations, risk). Tooling flourished, but profit pools didn't; the market never compounded because the surface area was too narrow. We needed to see the potential for a large company, driven by a broader set of mission-critical use cases, and waited it out.
When foundation models exploded, a similar pitch reappeared under a new banner: AI inference optimization. By January 2024, we tracked ~25–30 credible companies, almost all sub-$10M ARR with overlapping offerings (quantization, caching, routing, pruning, compilers, schedulers, etc.). It was obvious the category was early and noisy.
Internally, we use a Horizon 1-2-3 framework to characterize markets and leaders. Horizon 1 means neither the market nor the leader is clear; Horizon 2 means there is a clear market and need for the product, but it's hard to pick the leaders; Horizon 3 means the leaders in the market have been established. It felt like the market wasn't just Horizon 1 - it was Horizon (-1). We made a deliberate call to observe but not allocate capital, waiting to see true enterprise adoption and a path to stickiness. No FOMO bets. We wanted the dust to settle, to learn from customers, and to revisit with a prepared mind.
What changed from 2024 to 2025
21 months later, the market has consolidated. Many vendors shut down, got acqui-hired, or pivoted to pure GPU resale. A small set of leaders emerged that: (a) serve production at scale, (b) sell to enterprises without alignment to model or cloud, and (c) package optimization inside an operationally reliable system.
Equally important, demand matured. Enterprises began embedding LLMs into customer-facing workflows and back-office automations; agentic and long-running jobs appeared; and expectations shifted from “cheap tokens” to SLA-backed services. That shift created room for systems companies, not single tricks.
The hard questions we still had to answer
It felt like we could finally make a bet in the category, but it still wasn't a straight path to the finish line. We first had to answer a few questions ourselves:
- Why does inference make money in the long term? If you only sell a point optimization (e.g., KV-cache hygiene), you invite churn and price erosion. The buyer eventually in-sources or swaps you out because switching costs are low. How do you prevent that?
- Why wouldn't hyperscalers just do this? If your primitives are very similar to a hyperscaler's core competency, they can eventually build it and crush you with distribution. Are the primitives different?
- Why Baseten, of the inference providers that remain? Yes, the number of companies in the category has consolidated, but we still need to choose carefully. We don't make competing investments and typically like to size up massively once we're in a company. Getting it wrong is not an option.
- How do you get access when the winners are apparent and what is a fair entry valuation? If a category consolidates to 2-3 winners, you can be sure competition only heats up and entry valuations will rise. Why do we win the deal?
Let's dive into each:
[1] Why does inference make money in the long term?
Our underwriting focused on one core belief: inference makes money when it’s delivered as a reliable system. Baseten puts reliability at the center. Built across multiple regions and clouds, it keeps customers online even when an individual provider has issues. For enterprises with mission-critical AI, every second of downtime can mean lost revenue and reputational risk. Baseten targets four to five nines of availability (99.99–99.999%) not only by optimizing the model (quantization, pruning, speculative decoding) but by providing state-of-the-art infrastructure to scale instances up or down and manage burst capacity. As AI apps scale, traffic peaks get higher with more concurrent users; managing that load is no small feat.
Reliability alone isn’t enough to drive long-term stickiness. The other pillar is developer workflow. Baseten is where AI engineers—not the data-scientist persona MLOps once targeted—deploy, stage, roll back, test, and evaluate models. As we’ve seen with our investments in Writer and EvenUp, embedding in day-to-day workflows creates durable attachment.
Finally, customers repeatedly described Baseten as a strategic partner, not just another vendor—a team they call to improve overall service behavior. That advisory posture deepens trust and increases expansion potential.
[2] Why wouldn't hyperscalers just do this?
There’s growing evidence that, at this layer, hyperscalers are reduced to compute and storage. Building a seamless developer-first inference product is a different muscle. Inference is less about one fast model and more about choosing which model to run, where, and when; how to roll back versions; and how to define autoscaling and service-level objectives—all with great ergonomics.
Crucially, inference is an incredibly high-priority workload; putting all your eggs in one basket (one cloud) doesn’t cut it. Recent cloud outages (e.g. Google Cloud outage in June 2025) underscored the risk. Apps on Baseten can stay online thanks to instant failover across providers.
[3] Why Baseten, of the inference providers that remain?
We bias toward founders who understand how the enterprise buys—see our investments in Poolside, Galileo, Anaplan, Coupa, Zuora, MuleSoft, and others. Baseten’s product experience endears it to enterprise buyers: a full system that lets AI developers get to day zero quickly (not just an optimized model the buyer must run themselves) and the ability to deliver strong product value and margins without owning the compute. Enterprises prefer to use pre-purchased capacity across multiple hyperscalers, not be forced to buy captive hardware.
Beyond product, the GTM discipline is clear: efficient direct + channel motions, outsized revenue per ramped rep, and recent leadership upgrades—including the hire of Dannie Herzberg, who led Slack’s enterprise motion and is among the most highly regarded sales leaders in the Valley.
[4] How do you get access when the winners are known and what is a fair entry valuation?
Earning our way into this round was a village effort plus serendipity. We were introduced by DJ Patil (ex-CTO at Devoted Health, a Premji Invest portfolio company) via a tip from our friend Matt Carbonara. From there, we kept meeting Tuhin, Amir, and Phil at conferences and AI dinners, invited them to Fortune 500 pitches, and built mutual respect. Adam Bain at 01A was another lynchpin—vouching for both sides during what became a competitive process.
On price, we recognize there are fewer massive outcomes in infrastructure than in applications; past a certain entry price, it’s often IPO-or-bust. We believe Baseten can be one of those generational outliers—thanks to a first-rate team, a product that compounds, and a secular tailwind that’s only accelerating. AI will be baked into modern applications; agents and long-running workflows will drive background jobs and tool-use that make tens of model calls per user flow. That explosion in inference demand is the wave Baseten will ride. Through our Horizon 1–2–3 lens, the market has moved from Horizon (−1) to Horizon 2 → 3: the category is clear and leaders are breaking out.
Closing
Yes, inference can be a race to the bottom—but only if you sell a point solution that’s easy to swap out. Baseten sells a system with reliability at its core, aiming for four to five nines of availability, built for enterprise buyers and the developers who serve them. That combination earns pricing power, stickiness, and scale. We couldn’t be more excited to partner with the team.
PS - Many months ago, we used to think the inference market was basically this Dilbert strip come to life: endless price-cutting with no moats in sight. Baseten flipped the script—reminding us reliability is far more valuable than racing to the bottom.