AI & Tech · June 8, 2026 · By Valentin Boulaire · 5 min read

AI Compute 2026: The TPU Push, Nvidia, and DePIN's Wedge

Nvidia's quiet $4T cap, Google's TPU push, and DePIN's compute wedge. Three numbers tell you who actually controls AI inference in 2026. Bill not included.

Nvidia is worth more than every crypto asset on earth combined this morning. The panda checked twice. Meanwhile Google quietly moved more Gemini training onto TPUv6, AI inference cost is doing things to model labs' budgets nobody priced in, and a handful of DePIN compute tokens are stacking GPU hours in the corner. Some of it even matters.

What changed in AI compute in the last 90 days?

Three things, all underplayed by the cycle.

First, Google moved more of Gemini's training onto its own TPU silicon. According to Google Cloud's AI infrastructure blog, TPUv6 (codename Trillium) is the default for new internal workloads, with Pathways scheduling across pods. Second, Nvidia's Blackwell B200 finally hit volume shipment after a difficult Q4, and the next-gen Rubin architecture slipped from "imminent" to a 2027 roadmap window. Third, AI inference, not training, became the new bottleneck. Reasoning models burn an order of magnitude more tokens per query than the chat models they replaced, and per The Verge's coverage of AI cost curves, inference cost per task has roughly doubled for the most capable models since 2024.

That last point is the one nobody priced in.

The Nvidia $4T problem

Nvidia's market cap, at roughly $4T, sits north of the entire crypto market today. The total crypto market cap is $2.24T per CoinGecko Global Charts. One chipmaker is worth almost two entire crypto markets stacked on top of each other, about 1.8 times by that math. That is not a flex by Nvidia. It is a problem for everyone else.

When 80%+ of frontier AI compute runs on a single vendor's silicon, the platform tax is whatever Jensen says it is. Cloud providers know this. So do model labs. So does every CFO signing a multi-year GPU commit. Hence the TPU push at Google, the Trainium ramp at AWS, the rumors that Meta is taping out its own MTIA v3, and the persistent murmurs about Microsoft's Athena chip having finally left the lab.

The interesting tell is concentration: Nvidia's datacenter segment now accounts for more than 85% of its quarterly revenue, up from 60% in 2023. Every customer that can afford to diversify is doing the work. Every customer that cannot is locking in 18-month commits at whatever price clears. The Nvidia roadshow is no longer a hardware sale; it is a capacity allocation auction.

This is not a story about Nvidia losing. It is a story about everyone else trying to stop paying full retail.

Google's TPU bet, in numbers

Google's case is the most concrete. TPU pods scale through optical interconnects rather than NVLink. They are cheaper per FLOP for the workloads Google runs internally, and the DeepMind discovery blog on Gemini infrastructure lays out the rationale without spin. Training a frontier model on internal silicon avoids the Nvidia premium and the queue at the colos.

But here's the catch. TPU is not generally rentable the way an H100 is. Outside Google Cloud customers using specific managed services, the rest of the AI economy cannot just spin up TPU pods. So the savings are Google-internal. For everyone else, the choice remains: pay Nvidia, ramp Trainium with AWS's still-thin tooling, or wait for Rubin.

That asymmetry shapes the next two years of the market more than any new model release will.

Why is AI inference cost the new bottleneck?

Because the workload mix flipped. Training a frontier model is a one-off expense amortized across millions of queries. Inference is what users pay for, every single tap. Anthropic and OpenAI both ship reasoning models that think before they answer, which means tokens per task is up by an order of magnitude on the harder questions. According to Anthropic's research page, the extended-thinking modes are deliberately costly per query because the per-task quality justifies it. Maybe. The unit economics still hurt.
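To put rough numbers on that, here is a back-of-envelope sketch of per-task cost. The price and the token counts are illustrative assumptions, not figures from any lab's price list; only the tenfold multiplier reflects the order-of-magnitude claim above.

```python
def cost_per_task(tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """Return the inference cost of one task in USD."""
    return tokens_per_task * usd_per_million_tokens / 1_000_000


# All three figures below are hypothetical, chosen only to show the arithmetic.
USD_PER_MILLION_TOKENS = 10.0        # assumed price, not a real price list
CHAT_TOKENS = 500                    # assumed tokens for a plain chat answer
REASONING_TOKENS = CHAT_TOKENS * 10  # "an order of magnitude" more tokens

chat_cost = cost_per_task(CHAT_TOKENS, USD_PER_MILLION_TOKENS)
reasoning_cost = cost_per_task(REASONING_TOKENS, USD_PER_MILLION_TOKENS)

print(f"chat task:      ${chat_cost:.4f}")
print(f"reasoning task: ${reasoning_cost:.4f}")
print(f"multiplier:     {reasoning_cost / chat_cost:.0f}x")
```

At a fixed per-token price the cost scales linearly with tokens, so ten times the tokens means ten times the bill, and it is paid on every query rather than once.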

Layer in the agent workloads that loop, retry, and self-correct, and the bill goes from "annoying" to "existential" for any product whose margin assumes cheap inference. Ars Technica's chip coverage has called this the silent crisis of the 2026 AI cycle. The panda agrees, dryly. Closed-model labs cannot subsidize the loss forever. Someone has to find cheaper hardware, or charge a lot more per query, or both.

This is where decentralized compute markets actually have a wedge.

The DePIN, Dadacoin, and Zentrix angle

Three protocols try to absorb the spill: Akash, Render, and Bittensor's compute subnet. None of them has Nvidia's economics. None of them needs to.

The pitch is not "replace Nvidia". The pitch is "absorb the workloads the hyperscalers do not bother to serve cheaply": fine-tuning on consumer GPUs, batch image generation, agent test loops, the inference jobs where 200ms of extra latency is fine if the bill drops 40%. For context on the broader cluster, our AI agents pillar is the entry point, and the original DePIN GPU networks thesis from May still holds up. The reasoning-model cost wall we flagged on June 5 is exactly the bottleneck DePIN compute is built to undercut.
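The "extra latency is fine if the bill drops" trade can be written as a simple routing rule. This is a minimal sketch, not how any of these networks actually schedules work: the `Quote` type, the function name, and the sample quotes are all hypothetical, and the 200 ms and 40% defaults are taken from the example in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """A price and latency quote for one inference job (hypothetical shape)."""
    usd: float
    latency_ms: float


def prefer_decentralized(hyperscaler: Quote, depin: Quote,
                         latency_slack_ms: float = 200.0,
                         min_saving: float = 0.40) -> bool:
    """True when the job tolerates the added latency and the bill drops enough."""
    extra_latency = depin.latency_ms - hyperscaler.latency_ms
    saving = 1.0 - depin.usd / hyperscaler.usd
    return extra_latency <= latency_slack_ms and saving >= min_saving


# Hypothetical quotes for a batch image job: 150 ms slower, 50% cheaper.
cloud = Quote(usd=1.00, latency_ms=300)
market = Quote(usd=0.50, latency_ms=450)
print(prefer_decentralized(cloud, market))
```

Both conditions have to hold. A job that is cheap but far too slow, or fast but barely cheaper, stays with the hyperscaler.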

Three forward catalysts to watch through Q3. First, the spot price of H100 hours on DePIN marketplaces. If it drops below $1.50 per hour, the spill thesis has teeth. If it stays above $2.20, hyperscalers are still cheap enough to ignore. Second, TAO subnet 27 emissions and the share of compute paid in tokens versus stablecoins. Third, Akash active leases as a percentage of registered capacity. Token price is a downstream proxy for these, never the other way around.
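For anyone tracking these by hand, the first and third catalysts reduce to a threshold check and a ratio. The sketch below uses the $1.50 and $2.20 levels from the paragraph above; the "inconclusive" label for the band in between is an assumption, as are the function names, the sample inputs, and the premise that leases and capacity are counted in the same unit.

```python
def spot_price_signal(usd_per_h100_hour: float) -> str:
    """Classify a DePIN H100 spot price against the two thresholds."""
    if usd_per_h100_hour < 1.50:
        return "spill thesis has teeth"
    if usd_per_h100_hour > 2.20:
        return "hyperscalers still cheap enough to ignore"
    # Assumption: no verdict is given for the band between the thresholds.
    return "inconclusive"


def lease_utilization_pct(active_leases: int, registered_capacity: int) -> float:
    """Active leases as a percentage of registered capacity (same unit assumed)."""
    if registered_capacity <= 0:
        raise ValueError("registered_capacity must be positive")
    return 100.0 * active_leases / registered_capacity


# Hypothetical readings, not real market data.
print(spot_price_signal(1.80))
print(f"{lease_utilization_pct(30, 120):.1f}%")
```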

For Dadacoin on BSC, the connection is downstream. Zentrix-style AI gaming runs on inference, not training. When that inference can route to cheap, distributed GPU rather than full-retail OpenAI credits, the unit economics of an AI-generated game session collapse from "venture-backed only" to "memecoin treasury can afford it". That shift takes years. The plumbing is being laid now, mostly without the press releases.

The panda would rather pay $0.40 per inference than $4.
