April 10, 2025

Google Cloud Unveils Ironwood TPU for Large-Scale AI Inference

LAS VEGAS, April 10, 2025 — At Google Cloud Next 25, Google introduced Ironwood, its seventh-generation Tensor Processing Unit (TPU). Ironwood is the company’s most performant and scalable custom AI accelerator to date—and the first designed specifically for inference.

For more than a decade, TPUs have powered Google’s most demanding AI training and serving workloads, and have enabled Cloud customers to do the same. Ironwood is Google’s most powerful, capable, and energy-efficient TPU yet, purpose-built to support large-scale inferential AI models.

Ironwood also marks a broader shift in AI infrastructure development—from responsive models that deliver real-time information for human interpretation, to AI systems that proactively generate insights and interpret data. This shift, which Google describes as the “age of inference,” will allow AI agents to retrieve and generate data collaboratively to deliver actionable insights and answers.

Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements. It scales up to 9,216 liquid-cooled chips linked with Inter-Chip Interconnect (ICI) networking spanning nearly 10 MW. It is one of several new components of Google Cloud's AI Hypercomputer architecture, which optimizes hardware and software together for the most demanding AI workloads. With Ironwood, developers can also leverage Google's own Pathways software stack to reliably and easily harness the combined computing power of tens of thousands of Ironwood TPUs.
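The article does not include code, but the public route to the kind of multi-chip scaling described above is JAX's sharding API, which Pathways-backed TPU runtimes expose. The sketch below is a hedged illustration, not Google's implementation: the mesh axis name, array shapes, and the matrix multiply are all assumptions for demonstration, and the same program runs on however many accelerators (or CPU devices) are attached.

```python
# Illustrative sketch only: sharding a matrix multiply across all attached
# accelerators with JAX. On a TPU pod, jax.devices() would list every chip;
# the "batch" axis name and the array shapes here are assumptions.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())            # all attached chips (1 on CPU)
mesh = Mesh(devices, axis_names=("batch",))

# Shard the batch dimension of the activations across the device mesh.
x = jnp.ones((8 * devices.size, 128))
x = jax.device_put(x, NamedSharding(mesh, P("batch", None)))
w = jnp.ones((128, 64))                      # replicated weights

@jax.jit
def forward(x, w):
    return x @ w                             # XLA partitions this automatically

y = forward(x, w)
print(y.shape)                               # (8 * num_devices, 64)
```

The point of the design is that the program is written once against a logical mesh; the runtime, not the model code, decides how work lands on 1, 256, or 9,216 chips.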

Ironwood is designed to gracefully manage the complex computation and communication demands of “thinking models,” which encompass Large Language Models (LLMs), Mixture of Experts (MoEs) and advanced reasoning tasks. These models require massive parallel processing and efficient memory access. In particular, Ironwood is designed to minimize data movement and latency on chip while carrying out massive tensor manipulations. At the frontier, the computation demands of thinking models extend well beyond the capacity of any single chip.

For Google Cloud customers, Ironwood comes in two sizes based on AI workload demands: a 256-chip configuration and a 9,216-chip configuration.

Ironwood’s Key Features

Google Cloud is the only hyperscaler with more than a decade of experience delivering AI compute to support cutting-edge research, seamlessly integrated into planetary-scale services that reach billions of users every day through Gmail, Search, and more. All of this expertise is at the heart of Ironwood's capabilities.

Key features include:

  • Significant performance gains paired with a focus on power efficiency, allowing AI workloads to run more cost-effectively. According to Google, Ironwood delivers 2x the performance per watt of Trillium, the company’s sixth-generation TPU, announced last year.
  • Substantial increase in High Bandwidth Memory (HBM) capacity. Ironwood offers 192 GB per chip, 6x that of Trillium, which enables processing of larger models and datasets, reducing the need for frequent data transfers and improving performance.
  • Dramatically improved HBM bandwidth, reaching 7.2 TBps per chip, 4.5x Trillium’s. This high bandwidth ensures rapid data access, crucial for the memory-intensive workloads common in modern AI.
  • Enhanced ICI bandwidth, increased to 1.2 Tbps bidirectional, 1.5x Trillium’s, enabling faster communication between chips and facilitating efficient distributed training and inference at scale.
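The headline multiples also imply the previous generation's per-chip figures, and the two pod sizes imply aggregate memory capacity. The back-of-envelope check below uses only the numbers quoted above; the derived Trillium values and pod aggregates are estimates computed from those ratios, not published specifications.

```python
# Back-of-envelope estimates derived from the Ironwood figures quoted above.
IRONWOOD_HBM_GB = 192        # HBM capacity per chip
IRONWOOD_HBM_TBPS = 7.2      # HBM bandwidth per chip
IRONWOOD_ICI_TBPS = 1.2      # ICI bandwidth, bidirectional

# Implied previous-generation (Trillium) figures from the stated multiples:
trillium_hbm_gb = IRONWOOD_HBM_GB / 6        # 32 GB per chip
trillium_hbm_tbps = IRONWOOD_HBM_TBPS / 4.5  # 1.6 TBps per chip
trillium_ici_tbps = IRONWOOD_ICI_TBPS / 1.5  # 0.8 Tbps

# Aggregate HBM across the two pod configurations Google offers:
small_pod_hbm_tb = 256 * IRONWOOD_HBM_GB / 1000    # ~49 TB
large_pod_hbm_tb = 9216 * IRONWOOD_HBM_GB / 1000   # ~1,769 TB (~1.77 PB)
```

At the full 9,216-chip scale, the pod holds well over a petabyte of HBM in aggregate, which is what makes serving very large models without constant off-chip data shuffling plausible.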

With major improvements in compute performance, memory capacity, interconnect bandwidth, and power efficiency—nearly doubling efficiency over the previous generation—Ironwood is built to meet the growing demands of both AI training and inference. From powering state-of-the-art models like Gemini 2.5 to supporting scientific breakthroughs such as AlphaFold, Google TPUs continue to serve some of the most advanced workloads in AI.

Ironwood will be available to Google Cloud customers later this year.


Source: Amin Vahdat, Google

BigDATAwire