For over a decade, Nvidia GPUs have underpinned almost every major advance in state-of-the-art artificial intelligence. That dominance is now being challenged.
Frontier models like Google's Gemini 3 and Anthropic's Claude 4.5 Opus were trained not on Nvidia hardware but on Google's latest Tensor Processing Units, the Ironwood-based TPUv7. This signals that a viable alternative to the GPU-centric AI stack has emerged, with real implications for the economics and architecture of frontier-scale training.
Nvidia CUDA (Compute Unified Device Architecture), the platform that exposes the GPU's massively parallel architecture along with its surrounding tooling, has created what many call a "CUDA moat": once a team builds its pipelines on CUDA, switching to another platform becomes prohibitively costly because of dependencies on the Nvidia software stack. This, combined with Nvidia's first-mover advantage, helped the company achieve an astonishing 75% gross margin.
Unlike GPUs, TPUs were designed from the ground up as silicon purpose-built for machine learning. Each generation has pushed Google further toward large-scale AI acceleration, and now, as the hardware behind two of the most capable AI models ever trained, TPUv7 signals a broader strategy to challenge Nvidia's dominance.
Both GPUs and TPUs accelerate machine learning, but they reflect different design philosophies: GPUs are general-purpose parallel processors, while TPUs are purpose-built systems optimized almost exclusively for large-scale matrix multiplication. With TPUv7, Google has taken this specialization further by tightly integrating high-speed links directly onto the chip, enabling TPU modules to scale like a single supercomputer, and reducing the costs and latency penalties typically associated with GPU-based clusters.
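The specialization argument can be made concrete: a transformer's dense layers reduce to matrix multiplications, and a single matmul costs roughly 2 x M x K x N floating-point operations. A minimal sketch, using illustrative (hypothetical) layer sizes rather than any figure from the article:

```python
# Minimal sketch: why matrix multiplication dominates AI workloads. A dense
# transformer layer is essentially one matmul; its cost in floating-point
# operations is 2 * M * K * N (one multiply and one add per output element).
def matmul_flops(m: int, k: int, n: int) -> int:
    return 2 * m * k * n

# Hypothetical numbers: a batch of 1024 tokens through a 4096 -> 4096
# projection, as found in a mid-sized transformer layer.
flops = matmul_flops(1024, 4096, 4096)
print(flops)  # 34359738368 ops for a single projection
```

At these scales, hardware that accelerates only this one operation, as TPUs do, captures most of the work a model actually performs.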
TPUs are "designed as a complete system, not just a chip," Val Bercovici, director of artificial intelligence at WEKA, told VentureBeat.
Google’s commercial shift from internal to industry-wide
Historically, Google offered access to TPUs only via cloud rentals on the Google Cloud platform. In recent months, however, Google has begun offering the hardware directly to third-party customers, effectively decoupling the chip from the cloud service. Customers can now choose between treating compute as an operational expense (renting through the cloud) or as a capital expense (purchasing the hardware outright), eliminating a major sticking point for large AI labs that prefer to own their hardware and effectively bypassing the "cloud rent" fee on core hardware.
Central to Google's strategic shift is a landmark deal with Anthropic that gives the maker of Claude 4.5 Opus access to up to 1 million TPUv7 chips, representing more than a gigawatt of computing power. Roughly 400,000 chips are being sold directly to Anthropic through Broadcom, Google's longtime physical-design partner; the remaining 600,000 are leased through standard Google Cloud contracts. The deal brings Google billions of dollars and locks one of OpenAI's key competitors into the Google ecosystem.
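A quick back-of-envelope check, using only the article's round figures (up to 1 million chips, more than a gigawatt) and treating them as exact for the sake of the arithmetic:

```python
# Back-of-envelope check on the Anthropic deal, using the article's figures.
total_chips = 1_000_000
direct_via_broadcom = 400_000   # sold directly to Anthropic through Broadcom
leased_via_cloud = 600_000      # leased through Google Cloud contracts
assert direct_via_broadcom + leased_via_cloud == total_chips

total_watts = 1e9  # 1 GW, the article's lower bound
watts_per_chip = total_watts / total_chips
print(watts_per_chip)  # 1000.0 W per chip, including its share of overhead
```

A kilowatt-class power budget per accelerator is consistent with modern datacenter AI silicon once cooling and networking overhead are folded in.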
Erosion of the “CUDA moat”
For years, Nvidia GPUs have led the AI infrastructure market. Beyond powerful hardware, Nvidia's CUDA ecosystem includes an extensive library of optimized kernels and frameworks. Combined with deep developer familiarity and a huge installed base, this gradually locked enterprises into the "CUDA moat," a structural barrier that made abandoning GPU-based infrastructure impractically costly.
One of the key obstacles to wider TPU adoption has been ecosystem friction. Historically, TPUs have worked best with JAX, Google's own numerical computing library for AI/ML research. Mainstream AI development, however, relies heavily on PyTorch, an open-source ML framework that is deeply optimized for CUDA.
Google is now addressing this weakness directly. TPUv7 supports native PyTorch integration, including eager execution, full distributed API support, torch.compile, and custom TPU kernels within the PyTorch toolkit. The goal is for PyTorch to run as easily on TPUs as it does on Nvidia GPUs.
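As a rough sketch of what hardware-portable PyTorch code looks like, the snippet below prefers an XLA (TPU) device when the torch_xla package is available and falls back to GPU or CPU otherwise. The xla_device() call and the "openxla" compile backend are standard PyTorch/XLA conventions, but the surrounding structure is illustrative, not Google's official migration path:

```python
import torch
import torch.nn as nn

# Illustrative helper: prefer an XLA (TPU) device when torch_xla is
# installed, otherwise fall back to CUDA or CPU. xla_device() is the
# standard PyTorch/XLA entry point; the rest of this sketch is generic.
def pick_device() -> torch.device:
    try:
        import torch_xla.core.xla_model as xm
        return xm.xla_device()
    except ImportError:
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")

device = pick_device()
model = nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)

try:
    # On TPU, torch_xla supplies an "openxla" backend for torch.compile.
    y = torch.compile(model)(x)
except Exception:
    y = model(x)  # eager execution also works; compilation is an optimization
print(tuple(y.shape))
```

The point of Google's integration work is precisely that code like this, written once, runs unchanged across device types.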
Google is also contributing heavily to vLLM and SGLang, two popular open-source inference frameworks. By optimizing these widely used tools for TPUs, Google lets developers change hardware without rewriting their entire code base.
Advantages and disadvantages of TPU compared to GPU
For enterprises comparing TPUs and GPUs for large-scale ML workloads, the trade-offs center on cost, performance, and scalability. SemiAnalysis recently published a deep dive weighing the advantages and disadvantages of both technologies, measuring cost-effectiveness as well as technical efficiency.
Thanks to its specialized architecture and greater energy efficiency, TPUv7 offers significantly better throughput per dollar for large-scale training and inference, allowing enterprises to cut the operational costs of power, cooling, and data center resources. SemiAnalysis estimates that for Google's internal systems, the total cost of ownership (TCO) of an Ironwood-based server is approximately 44% lower than that of an equivalent Nvidia GB200 Blackwell server. Even after accounting for Google's and Broadcom's margins, third-party customers like Anthropic see a roughly 30% cost reduction compared to Nvidia. "When cost is a concern, TPUs make sense for massive-scale AI projects. With TPUs, hyperscalers and AI labs can achieve a 30-50% reduction in total cost of ownership, which can translate into billions in savings," Bercovici said.
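The arithmetic behind those figures is simple to lay out, here against a normalized (hypothetical) baseline cost of 100 for the Nvidia server:

```python
# The SemiAnalysis TCO figures from the article, applied to a normalized
# (hypothetical) baseline of 100 for an Nvidia GB200 Blackwell server.
gb200_tco = 100.0
internal_saving_pct = 44      # Google's internal Ironwood systems: ~44% lower
third_party_saving_pct = 30   # third parties such as Anthropic: ~30% lower

ironwood_internal = gb200_tco * (100 - internal_saving_pct) / 100
ironwood_third_party = gb200_tco * (100 - third_party_saving_pct) / 100
print(ironwood_internal)     # 56.0
print(ironwood_third_party)  # 70.0
```

The gap between the two Ironwood figures (56 vs. 70 on this scale) is, in effect, the margin Google and Broadcom capture when selling to third parties.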
This economic leverage is already reshaping the market. The mere existence of a viable alternative reportedly allowed OpenAI to negotiate a roughly 30% discount on its own Nvidia hardware. OpenAI remains one of the largest buyers of Nvidia GPUs, but earlier this year the company added Google TPUs via Google Cloud to meet growing compute demands. Meta is also reportedly in advanced talks to acquire Google TPUs for its data centers.
At this stage, Ironwood may sound like the ideal choice for enterprise architecture, but it comes with real trade-offs. While TPUs excel at specific deep learning workloads, they are far less flexible than GPUs, which can run a wide range of algorithms, including non-AI tasks. If a new AI technique is invented tomorrow, GPUs can run it immediately. This makes GPUs better suited to organizations that run a broad range of computational workloads beyond standard deep learning.
Migrating from a GPU-centric environment can also be costly and time-consuming, especially for teams using existing CUDA-based pipelines, custom GPU kernels, or using frameworks not yet optimized for TPU.
Bercovici recommends that companies "choose GPUs when they need to get to market quickly. GPUs leverage standard infrastructure and the largest developer ecosystem, handle dynamic and complex workloads for which TPUs are not optimized, and can be deployed in existing on-premises, standards-based data centers without custom power and network rebuilds."
Additionally, the ubiquity of GPUs means more engineering talent is available; TPUs demand a rarer skill set. "Leveraging the capabilities of TPUs requires engineering expertise on the part of organizations, which means being able to recruit and retain rare engineering talent who can write custom kernels and optimize compilers," Bercovici said.
In practice, Ironwood's advantages apply mainly to enterprises with massive tensor workloads. Organizations that need greater hardware flexibility, a hybrid cloud strategy, or HPC-like versatility may find GPUs a better fit. In many cases, a hybrid approach combining the two can provide the best balance of specialization and flexibility.
The future of AI architecture
The competition for AI hardware supremacy is heating up, but it is far too early to predict a winner, or whether there will be one at all. With Nvidia and Google innovating at such a rapid pace and Amazon joining the fray, the most effective AI systems of the future may be hybrid, integrating both TPUs and GPUs.
“Demand is growing on Google Cloud for both our custom TPUs and Nvidia GPUs,” a Google spokesperson told VentureBeat. “As a result, we are significantly expanding our Nvidia GPU portfolio to meet significant customer demand. The reality is that the majority of our Google Cloud customers use both GPUs and TPUs. With our broad selection of the latest Nvidia GPUs and seven generations of custom TPUs, we offer customers the flexibility to choose optimizations for their specific needs.”
