BigDATAwire

December 9, 2024

NVIDIA’s Blackwell Shows That the Future of AI Is Water-Cooled – For Now

Rob Enderle

(Lucas Koenig/Shutterstock)

NVIDIA’s Blackwell processor is a game changer. It is also incredibly dense, and it runs hot. Apparently, this heat doesn’t become a serious problem until you put a whopping 72 of the processors in a single rack, but at that density air cooling just can’t keep up, so NVIDIA has released a water-cooled rack design. Vendors like Dell are rapidly bringing out Blackwell servers that use this cooling method, which is new to them.

Lenovo, by contrast, has argued for some time that data centers need to shift to water cooling, so it is in the lead, particularly with Blackwell, thanks to its unique Neptune water-cooling system. When Lenovo bought IBM’s x86 server business, it also gained access to IBM’s advanced water-cooling technology, and it has leveraged that competitive edge to become the current leader in water cooling for this class of server. That matters, because when it comes to mixing electronics and water, you don’t want a novice. Water leaks in high-amperage electronics can not only damage the equipment but also be deadly to people.

Blackwell’s Massive Popularity

Blackwell is incredibly popular as a way to rapidly scale AI performance, so much so that NVIDIA is having trouble keeping up with demand (once again pointing to the need for more chip fabrication plants, known as fabs or foundries).

Blackwell is popular because it is a uniquely designed part from the hardware company that led the charge into generative AI. NVIDIA reached that leadership position by recognizing the potential of AI at about the same time IBM did and then, unlike IBM, essentially betting the farm on advancing the technology with no idea when or how it might become viable.

Jensen Huang, NVIDIA’s CEO, has admitted that had he been leading any company other than the one he founded, he would likely have been fired, because it looked like he was throwing massive amounts of money into a black hole. Well, that black hole became a money fountain last year and turned NVIDIA into the most valuable company in the world, surpassing Apple.

However, Blackwell is just an early step into our AI future, and as processors advance, they tend to become even denser and hotter.

Why Future Data Centers Will Need to Be Water Cooled

Yes, it takes 72 processors in a rack before you have to water-cool the result, but each Blackwell throws off a lot of heat that can degrade server components over time. In addition, with air cooling you have to increase the air velocity as the components you are trying to cool get hotter. This tends to turn data centers into loud, hot rooms that no one really wants to work in, and at these temperatures there is a risk of injury to anyone working on running servers.

Nvidia is using water cooling in its latest-generation Blackwell gear (Image courtesy Nvidia)

As Blackwell’s successor comes to market, along with competing parts from vendors like AMD and Intel, the density of these new parts will only increase the cooling demands of the servers built on them, suggesting that air-cooled servers will very soon become obsolete.

The good news is that current best practice for water-cooling systems like Lenovo’s Neptune is to use warm water rather than chilled water, which substantially reduces the cost of installing and maintaining the resulting servers. It also reduces water waste, making the approach more environmentally friendly, and it uses less power because the water doesn’t have to be refrigerated first.

While early water-cooled systems focused on processors and memory, they increasingly cover other parts of the server, such as the power supplies. This is gradually turning these once-hostile environments for employees into far more livable ones while potentially extending the service life of the more effectively cooled components.

Wrapping Up: Warm Water-Cooled Data Centers

This brings me to my conclusion: as we aggressively deploy AI in our companies, the need for warm-water cooling will only increase, and planning for it in advance with vendors that understand the technology and have a long history of bringing water-cooled solutions to market becomes increasingly important.

As I mentioned above, when mixing water and electronics, you don’t want the install team learning from their mistakes; you want them to already be experienced. Otherwise, they might leave out a critical part needed to keep your servers operating and your increasingly critical AI applications running.

So I’d advise planning to implement warm-water-cooled data centers in the second half of this decade, because that is likely what you will need to do unless you plan to fully outsource AI to a cloud service. While that is a popular option, it may not provide the intellectual property protection a CIO needs to see. And because smaller businesses are likely to move exclusively to the cloud, I doubt these massive cloud data centers can also keep up with enterprise demand, which suggests enterprises will likely need to keep their most critical AI systems on premises.

Thus, the future of your data center is likely warm-water-cooled.

About the author: As President and Principal Analyst of the Enderle Group, Rob Enderle provides regional and global companies with guidance on how to create credible dialogue with the market, target customer needs, create new business opportunities, anticipate technology changes, select vendors and products, and practice zero dollar marketing. For over 20 years Rob has worked for and with companies like Microsoft, HP, IBM, Dell, Toshiba, Gateway, Sony, USAA, Texas Instruments, AMD, Intel, Credit Suisse First Boston, ROLM, and Siemens.
