AI & Tech · May 31, 2026 · By Valentin Boulaire · 5 min read

NVIDIA Paid $20B for Groq. Cerebras Still Got $5.5B.

NVIDIA spent $20B on Groq in December 2025. Five months later, Cerebras raised $5.5B in an IPO that was 20x oversubscribed, then fell 24%. Inference quietly won.

Two checks, both in the billions. The first was NVIDIA writing twenty billion dollars in cash for Groq in December 2025, the largest deal in NVIDIA's history. The second was Cerebras pricing its IPO on May 14, 2026, raising $5.5 billion at a $56 billion fully-diluted valuation. The panda has been reading AI press for two years, and the script keeps getting rewritten. In 2026 the question is no longer who builds the smartest model. It is who runs it for the lowest dollar per token.

$20 Billion for Groq, $5.5 Billion for Cerebras

According to CNBC, NVIDIA closed the Groq deal on December 24, 2025, paying roughly $20 billion in cash for a perpetual, non-exclusive license to Groq's patent portfolio plus an acqui-hire of the leadership team. NVIDIA's previous largest deal was the Mellanox purchase, announced in 2019 at roughly $7 billion. The Groq cloud business itself stayed independent, which is why it kept raising funding rounds from existing investors weeks after the sale closed. AI in 2026 has stopped surprising people.

Five months later, on May 14, Cerebras Systems went public on the Nasdaq. The order book closed 20x oversubscribed, the price range was raised twice, and shares opened at $350 against a $185 IPO price, briefly pushing the market cap above $95 billion intraday. Then the discipline phase started. By the May 30 close, CBRS was at $236.99, down roughly 24% from the opening peak, per Motley Fool's post-IPO recap. The bid for inference is real. The bid for whoever happens to be selling inference is, apparently, less stable.
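The arithmetic behind those prices can be checked in a few lines. This is a minimal sketch that uses only the figures quoted above; the variable names are illustrative, and the share counts assume the whole raise and the whole valuation were struck at the $185 IPO price. One thing it surfaces: measured from the $350 open, the fall to $236.99 is closer to 32%, so the quoted 24% presumably uses a different reference high, near $312. The recap's exact reference point is not stated here, so treat that as an inference.

```python
# Figures quoted in the article (USD).
IPO_PRICE = 185.00
OPEN_PRICE = 350.00
LATEST_CLOSE = 236.99
GROSS_PROCEEDS = 5.5e9
FULLY_DILUTED_VALUATION = 56e9
QUOTED_DRAWDOWN = 0.24  # "roughly 24%"


def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


first_trade_pop = pct_change(IPO_PRICE, OPEN_PRICE)
move_from_open = pct_change(OPEN_PRICE, LATEST_CLOSE)
move_from_ipo = pct_change(IPO_PRICE, LATEST_CLOSE)

# Reference price that would make the latest close a 24% decline.
implied_reference_high = LATEST_CLOSE / (1.0 - QUOTED_DRAWDOWN)

# Assumption: dollar totals divided by the IPO price give share counts.
shares_sold = GROSS_PROCEEDS / IPO_PRICE
fully_diluted_shares = FULLY_DILUTED_VALUATION / IPO_PRICE

print(f"Open vs IPO price:        {first_trade_pop:+.1f}%")
print(f"Latest close vs open:     {move_from_open:+.1f}%")
print(f"Latest close vs IPO:      {move_from_ipo:+.1f}%")
print(f"Reference high for -24%:  ${implied_reference_high:.2f}")
print(f"Shares sold (implied):    {shares_sold / 1e6:.1f}M")
print(f"Fully diluted (implied):  {fully_diluted_shares / 1e6:.1f}M")
```

On these numbers the stock still sits about 28% above its IPO price, which is the comparison that matters for anyone who actually got an allocation.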

What is the inference economy, and why does it matter?

Training got the headlines for three years. Inference, which is what happens every time a model actually answers a prompt, is bigger and growing faster. It is the recurring tax on every chatbot, every code completion, every AI agent doing one tiny task on someone's behalf.

A frontier model is trained once. It is then run billions of times. As AI agents become routine, the inference-to-training ratio gets worse for the model labs and better for whoever owns the silicon. According to Cerebras' own S-1 disclosures cited by Motley Fool, the company posted roughly $510 million in 2025 revenue at a 47% net margin, with a $24.6 billion backlog driven mostly by a 750 megawatt compute deal with OpenAI. Inference is a business. The pricing question is whether it stays one.
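To put those disclosures in proportion, here is a rough back-of-envelope sketch. It uses only the figures quoted in this article and assumes the 47% net margin applies to the full $510 million of 2025 revenue; the ratios are illustrative, not numbers taken from the S-1.

```python
# Figures quoted in the article (USD).
REVENUE_2025 = 510e6
NET_MARGIN = 0.47
BACKLOG = 24.6e9
FULLY_DILUTED_VALUATION = 56e9

# Assumption: the margin applies to all of 2025 revenue.
implied_net_income = REVENUE_2025 * NET_MARGIN

# How many years of 2025-level revenue the backlog represents.
backlog_in_revenue_years = BACKLOG / REVENUE_2025

# IPO valuation as a multiple of trailing revenue.
valuation_to_revenue = FULLY_DILUTED_VALUATION / REVENUE_2025

print(f"Implied 2025 net income:  ${implied_net_income / 1e6:.1f}M")
print(f"Backlog / 2025 revenue:   {backlog_in_revenue_years:.1f}x")
print(f"Valuation / 2025 revenue: {valuation_to_revenue:.1f}x")
```

The backlog is what carries the valuation: at roughly 48 times 2025 revenue, it dwarfs the income statement the company has actually reported.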

The Token Price Collapse Nobody Wanted

The cost of running a token through a frontier-class model has collapsed faster than almost any commodity in recent memory. Wholesale rates on open-weight gpt-oss-120B and Llama 4 endpoints now sit well under a dollar per million tokens at most serverless providers, with custom-silicon offerings pricing aggressively below incumbent GPU clouds.

The good news: AI applications that needed cheap inference to be viable are now viable. The bad news: margins on the inference layer are getting eaten by the same competition that built it. The panda notes that a 20x oversubscribed IPO followed by a 24% drawdown in 16 days is not what a market without nerves does.

Two structural forces drive the collapse. First, open-weight frontier models trained by the labs themselves are now hostable by anyone with the silicon. Second, custom accelerators (Groq LPUs, Cerebras WSE, SambaNova RDU) bypass NVIDIA's gross margin on a per-token basis. NVIDIA's response was to write the largest check in its history and absorb one of them. The second just went public at a $56 billion valuation. The third sells through Oracle. The market is choosing more than one winner, which usually means it has not chosen at all.

What This Means for Decentralized Compute

The crypto sub-genre that bet on this exact moment was DePIN compute. Render, Akash, io.net, Bittensor: each pitched a thesis where decentralized GPU networks would undercut centralized clouds and let demand find the cheapest spare silicon. The thesis was not wrong. The window just got tighter.

CoinGecko reports total crypto market capitalization at $2.59 trillion as of May 31, 2026, and DefiLlama tracks total DeFi TVL at $80.94 billion across all chains. Both are dwarfed by the capital flowing into inference silicon. NVIDIA, Cerebras, and Groq alone moved more dollars in six months than most DePIN networks have processed in lifetime fees. We covered the supply-side pressure on these networks in our analysis of DePIN GPU networks and the AI squeeze, and the demand-side mirror image in the AI grid bet on crypto miners. The broader thread sits at our AI agents topic hub.

The honest read: decentralized inference still has a real role for sovereignty, censorship-resistance, and very-long-tail use cases. It does not get to win on price alone anymore. Wholesale rates at fractions of a dollar per million tokens leave very little room for a tokenized middle layer to insert a margin and still feel decentralized.
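A quick sketch shows how thin that middle-layer margin gets at sub-dollar rates. Every input below is a hypothetical placeholder chosen for illustration, not a quoted market price: a $0.50 per million token wholesale rate, 2,000 tokens per request, and a 20% markup taken by the network.

```python
def cost_per_request(price_per_million_usd: float, tokens: int) -> float:
    """Cost in USD of one request at a given per-million-token price."""
    return price_per_million_usd * tokens / 1_000_000


# Hypothetical inputs, for illustration only.
WHOLESALE_PRICE_PER_MILLION = 0.50  # USD per million tokens
TOKENS_PER_REQUEST = 2_000          # prompt plus completion
MIDDLE_LAYER_MARKUP = 0.20          # share added on top of wholesale

base_cost = cost_per_request(WHOLESALE_PRICE_PER_MILLION, TOKENS_PER_REQUEST)
fee_per_request = base_cost * MIDDLE_LAYER_MARKUP

# Requests the network must route to collect $1M in fees.
requests_per_million_usd = 1_000_000 / fee_per_request

print(f"Wholesale cost per request: ${base_cost:.4f}")
print(f"Network fee per request:    ${fee_per_request:.4f}")
print(f"Requests per $1M in fees:   {requests_per_million_usd / 1e9:.1f} billion")
```

Under these assumptions the network earns two hundredths of a cent per request and needs about five billion requests to collect a million dollars in fees. Change the inputs and the scale shifts, but the shape of the problem does not.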

What to Watch Next

Three things over the next ninety days. One: whether Cerebras can hold above its $185 IPO price as the post-lockup conversation begins. Two: whether NVIDIA actually integrates Groq's LPU into its production stack or quietly mothballs the architecture (the second outcome would tell you what NVIDIA really bought, which is a competitor, not a technology). Three: whether any of the DePIN compute networks pivot from a "cheaper inference" pitch to a "sovereign inference" pitch, which is the only narrative the math still supports.

Zentrix sits in the part of this story nobody is pricing yet. AI-native game logic eats inference the same way a chatbot does, except it has to be cheaper, faster, and offline-tolerant. Whoever wins the cheap-token race wins the right to put smart NPCs on phones. The panda will keep counting. The dollars-per-token line keeps going down. The question is who is still standing when it stops.

#ai #compute #ai-infrastructure #depin #nvidia
