April 5, 2023

Google Claims Its TPU v4 Outperforms Nvidia A100

Jaime Hampton

A new scientific paper from Google details the performance of its Cloud TPU v4 supercomputing platform, claiming it provides exascale performance for machine learning with boosted efficiency.

The authors of the research paper claim the TPU v4 is 1.2x–1.7x faster and uses 1.3x–1.9x less power than the Nvidia A100 in similarly sized systems. The paper notes that Google has not compared TPU v4 to the newer Nvidia H100 GPUs because of their limited availability and their 4nm process technology (vs. TPU v4’s 7nm).
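As a rough illustration of what the two quoted ranges could imply for performance per watt, the Python sketch below multiplies the speedup by the power-reduction factor. It assumes the low ends and the high ends of the two ranges pair up, which the article does not state, so the result is only an outer bound. The function name `perf_per_watt_gain` is hypothetical.

```python
def perf_per_watt_gain(speedup: float, power_reduction: float) -> float:
    """Relative performance per watt versus a baseline system.

    speedup: how many times faster than the baseline (e.g. 1.2).
    power_reduction: baseline power divided by the new system's power (e.g. 1.3).
    """
    if speedup <= 0 or power_reduction <= 0:
        raise ValueError("both factors must be positive")
    return speedup * power_reduction


# Ranges quoted in the article (TPU v4 vs. Nvidia A100).
SPEEDUP_RANGE = (1.2, 1.7)
POWER_REDUCTION_RANGE = (1.3, 1.9)

# Assumption: pairing low with low and high with high gives outer bounds only;
# the article does not say which workloads sit at which end of each range.
low = perf_per_watt_gain(SPEEDUP_RANGE[0], POWER_REDUCTION_RANGE[0])
high = perf_per_watt_gain(SPEEDUP_RANGE[1], POWER_REDUCTION_RANGE[1])

print(f"Implied perf/W gain: roughly {low:.2f}x to {high:.2f}x")
```

Under that pairing assumption, the implied gain falls between about 1.56x and 3.23x; the true per-workload figures may sit anywhere inside that band.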

As machine learning models have grown larger and more complex, so have their compute resource needs. Google’s Tensor Processing Units (TPUs) are specialized hardware accelerators used for building machine learning models, specifically deep neural networks. They are optimized for tensor operations and can significantly boost efficiency in the training and inference of large-scale ML models. Google says the performance, scalability, and availability make TPU supercomputers the workhorses of its large language models like LaMDA, MUM, and PaLM.

Google CEO Sundar Pichai announcing TPU v4 at Google I/O 2021. (Source: Google)

The TPU v4 supercomputer contains 4,096 chips interconnected via proprietary optical circuit switches (OCS), which Google claims are faster, cheaper, and use less power than InfiniBand, another popular interconnect technology. Google claims its OCS technology accounts for less than 5% of the TPU v4’s system cost and power, and says it dynamically reconfigures the supercomputer interconnect topology to improve scale, availability, utilization, modularity, deployment, security, power, and performance.

Google engineers and paper authors Norm Jouppi and David Patterson explained in a blog post that, thanks to key innovations in interconnect technologies and domain-specific accelerators (DSAs), Google Cloud TPU v4 enabled a nearly 10x leap in scaling ML system performance over TPU v3. It also boosted energy efficiency by approximately 2x–3x compared to contemporary ML DSAs and reduced CO2e emissions by approximately 20x compared to DSAs in what the company calls typical on-premises datacenters.

The TPU v4 system has been operational at Google since 2020. The TPU v4 chip was unveiled at the company’s 2021 I/O developer conference. Google says the supercomputers are actively used by leading AI teams for ML research and production across language models, recommender systems, and other generative AI.

Regarding recommender systems, Google says its TPU supercomputers are also the first with hardware support for embeddings, a key component of Deep Learning Recommendation Models (DLRMs) used in advertising, search ranking, YouTube, and Google Play. This is because each TPU v4 is equipped with SparseCores, which are dataflow processors that accelerate models that rely on embeddings by 5x–7x but use only 5% of die area and power.

One-eighth of a TPU v4 pod from Google’s ML cluster located in Oklahoma, which the company claims runs on ~90% carbon-free energy. (Source: Google)

Midjourney, a text-to-image AI startup, recently selected TPU v4 to train the fourth version of its image-generating model: “We’re proud to work with Google Cloud to deliver a seamless experience for our creative community powered by Google’s globally scalable infrastructure,” said David Holz, founder and CEO of Midjourney, in a Google blog post. “From training the fourth version of our algorithm on the latest v4 TPUs with JAX, to running inference on GPUs, we have been impressed by the speed at which TPU v4 allows our users to bring their vibrant ideas to life.”

TPU v4 supercomputers are available to AI researchers and developers at Google Cloud’s ML cluster in Oklahoma, which opened last year. At nine exaflops of peak aggregate performance, Google believes the cluster is the largest publicly available ML hub that operates with 90% carbon-free energy.
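The nine-exaflop figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below combines the 4,096-chip system size quoted earlier with a peak of 275 teraflops per TPU v4 chip. That per-chip number does not appear in this article and should be treated as an assumption, and the constant and function names are hypothetical.

```python
CHIPS_PER_SYSTEM = 4096          # from the article
CLUSTER_PEAK_EXAFLOPS = 9.0      # from the article
PEAK_TFLOPS_PER_CHIP = 275.0     # assumption: not stated in the article

TFLOPS_PER_EXAFLOP = 1_000_000   # 1 exaflop = 10^18 FLOPS = 10^6 teraflops


def system_peak_exaflops(chips: int, tflops_per_chip: float) -> float:
    """Peak aggregate throughput of one system, in exaflops."""
    return chips * tflops_per_chip / TFLOPS_PER_EXAFLOP


# Peak of a single 4,096-chip system under the assumed per-chip figure.
system_peak = system_peak_exaflops(CHIPS_PER_SYSTEM, PEAK_TFLOPS_PER_CHIP)

# Number of such systems the cluster-wide figure would imply.
implied_systems = CLUSTER_PEAK_EXAFLOPS / system_peak

print(f"One system: about {system_peak:.2f} exaflops peak")
print(f"Implied systems in the cluster: about {implied_systems:.1f}")
```

Under these assumptions, a single 4,096-chip system lands just above one exaflop of peak performance, which is consistent with the "exascale" description, and the cluster-wide figure corresponds to roughly eight such systems. Peak numbers like these describe a theoretical ceiling, not sustained training throughput.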



