AWS Goes Big on AI with Project Rainier Super and Nova FMs
At AWS re:Invent 2024 in Las Vegas, Amazon unveiled a series of transformative AI initiatives: the development of one of the world’s largest AI supercomputers in partnership with Anthropic, the introduction of the Nova family of AI foundation models, and the general availability of its Trainium2 AI chip. Together, the announcements position the company as a formidable competitor in the artificial intelligence landscape.
Amazon CEO Andy Jassy emphasized the critical role of cost efficiency in generative AI development, highlighting the industry’s growing demand for alternative AI infrastructure solutions that deliver better price performance.
“One of the big lessons that we’ve learned from having about 1,000 generative AI applications that we’re either in the process of building or have launched at Amazon, is that the cost of compute in these generative AI applications really matters, and is often the difference maker of whether you can do it or you can’t,” Jassy said in a recap video. “And to date, all of us have used just one chip in the compute for generative AI. And people are hungry for better price performance.”
Project Rainier
AWS announced Project Rainier, a groundbreaking “Ultracluster” supercomputer powered by its Trainium chips. This massive cluster will contain hundreds of thousands of Trainium2 chips, delivering more than five times the exaflops used to train Anthropic’s current generation of AI models.
AWS Trainium chips are positioned as a direct competitor to the Nvidia GPUs currently dominating the market. Project Rainier, set to be completed in 2025, could set new records for size and performance.
The announcement has already excited investors, with Amazon’s stock price rising more than 1% to nearly $213 following the news. A key partner in this venture is AI startup Anthropic, valued at $18 billion. AWS has invested $8 billion in the company, and Anthropic plans to leverage Project Rainier to train its AI models. The two firms are also working together to enhance the capabilities of Amazon’s Trainium chips, signaling a deep integration of R&D efforts.
At the same time, AWS is advancing Project Ceiba, another supercomputer initiative developed in collaboration with Nvidia. Project Ceiba will feature over 20,000 Nvidia Blackwell GPUs, emphasizing AWS’s strategy to diversify its AI infrastructure offerings. While Rainier focuses on Trainium chip adoption, Ceiba highlights AWS’s ability to work with other industry leaders to support diverse AI workloads.
Amazon Nova: A New Generation of Foundation Models
The company introduced its Nova family of foundation models, spanning from lightweight text-only models to larger and more advanced language models, as well as models designed to generate images and videos.
The new Nova models will be available in Amazon Bedrock, the company’s platform for building generative AI apps.
The new models include:
- Amazon Nova Micro (a very fast, text-to-text model)
- Amazon Nova Lite, Amazon Nova Pro, and Amazon Nova Premier (multi-modal models that can process text, images, and videos to generate text)
- Amazon Nova Canvas (which generates studio-quality images)
- Amazon Nova Reel (which generates studio-quality videos)
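For builders, the models above are reached through Bedrock’s standard runtime API rather than a Nova-specific endpoint. The sketch below shows one way to call a Nova text model with the Bedrock Converse API via boto3; the model ID, region, and helper names are assumptions based on AWS’s usual conventions, not details from this article, so check the Bedrock console for the identifiers available in your account.

```python
# Minimal sketch of calling an Amazon Nova model through the Amazon
# Bedrock Converse API. The model ID and region are assumptions; verify
# them in the Bedrock console before use.
NOVA_LITE_MODEL_ID = "amazon.nova-lite-v1:0"  # assumed model ID


def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build the keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": NOVA_LITE_MODEL_ID,
        "messages": [
            {"role": "user", "content": [{"text": prompt}]},
        ],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.3},
    }


def ask_nova(prompt: str) -> str:
    """Send a prompt to a Nova model and return the generated text.

    Requires AWS credentials with Bedrock access; nothing here runs at
    import time.
    """
    import boto3  # imported here so the request builder stays dependency-free

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(**build_converse_request(prompt))
    return response["output"]["message"]["content"][0]["text"]


# Example (requires credentials):
#   print(ask_nova("Summarize AWS re:Invent 2024 in one sentence."))
```

Because all Nova text models share the Converse interface, swapping Micro, Lite, Pro, or Premier should only require changing the model ID.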
“Our new Amazon Nova models are intended to help with these challenges for internal and external builders, and provide compelling intelligence and content generation while also delivering meaningful progress on latency, cost-effectiveness, customization, retrieval augmented generation (RAG), and agentic capabilities,” said Rohit Prasad, SVP of Amazon Artificial General Intelligence.
Jassy said the company has made “tremendous” progress on its new frontier models, noting how “they benchmark very competitively” and are cost-effective and fast: “They’re 75% less expensive than the other leading models in Bedrock. They are laser fast. They’re the fastest models you’re going to find there,” he said. “Nova models allow you to do fine tuning, and increasingly, our application builders for generative AI want to fine-tune the models with their own label data and examples. It allows you to do model distillation, which means taking a big model and infusing that intelligence in a smaller model, so that you get lower latency and lower cost.”
Addressing the fight against hallucinations and inaccuracy, AWS says Amazon Nova models are integrated with Amazon Bedrock Knowledge Bases and excel at Retrieval Augmented Generation (RAG), enabling customers to ensure the best accuracy by grounding responses in an organization’s own data.
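In practice, this grounding flows through Bedrock’s RetrieveAndGenerate API, which queries a knowledge base and hands the retrieved passages to the model in one call. The sketch below is a hedged illustration of that pattern; the knowledge base ID, model ARN, and function names are placeholders for this example, not values from the article.

```python
# Hedged sketch: grounding a Nova model's answers in an organization's
# own data with Amazon Bedrock Knowledge Bases (RAG). The knowledge base
# ID and model ARN below are placeholders, not real resources.


def build_rag_request(question: str, kb_id: str, model_arn: str) -> dict:
    """Build keyword arguments for bedrock-agent-runtime's
    retrieve_and_generate() call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def grounded_answer(question: str, kb_id: str, model_arn: str) -> str:
    """Answer a question using documents retrieved from the knowledge base.

    Requires AWS credentials and a provisioned knowledge base; nothing
    here runs at import time.
    """
    import boto3

    client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
    response = client.retrieve_and_generate(
        **build_rag_request(question, kb_id, model_arn)
    )
    return response["output"]["text"]


# Example (placeholder identifiers):
#   grounded_answer(
#       "What is our refund policy?",
#       "KB12345678",
#       "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0",
#   )
```

The appeal of this design is that retrieval and generation are bundled server-side, so the application never has to assemble the prompt context from search results itself.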
Trainium Gets an Upgrade
Powering these exciting developments are AWS’s Trainium2 chips, now available through two new cloud services. The company announced the general availability of AWS Trainium2-powered Amazon Elastic Compute Cloud (Amazon EC2) instances, as well as new Trn2 UltraServers.
The company says these instances deliver 30–40% better price performance compared to the current generation of GPU-based EC2 P5e and P5en instances. Equipped with 16 Trainium2 chips, Trn2 instances offer 20.8 peak petaflops of compute, making them ready for training and deploying billion-parameter LLMs.
The new EC2 Trn2 UltraServers feature 64 Trainium2 chips connected via the NeuronLink interconnect. With up to 83.2 peak petaflops of compute, the UltraServers quadruple the compute, memory, and networking of a single Trn2 instance.
Looking ahead, AWS unveiled its next-generation AI chip, Trainium3. This chip is designed to accelerate the development of even larger models and enhance real-time performance during deployment. Trainium3 will be available next year and will be up to twice as fast as the existing Trainium2 while being 40% more energy-efficient, AWS CEO Matt Garman revealed during his keynote on Tuesday.
The growing adoption of Trainium chips by major players, including Apple, adds to the company’s momentum. Benoit Dupin, Apple’s senior director of machine learning and AI, revealed plans to incorporate Trainium into Apple Intelligence, Apple’s AI technology platform.
These latest developments underscore AWS’s dual approach to its AI plans: innovating through proprietary technologies like Trainium while partnering with established players like Nvidia to provide comprehensive AI offerings. As AWS continues to expand its influence in AI computing, its investments and collaborations look to be setting the stage for significant industry disruption.
Related Items:
Amazon Taps Automated Reasoning to Safeguard Critical AI Systems
AWS Expands Sagemaker To Combine Data, Analytics, and AI Capabilities
Five Things to Look For at AWS re:Invent 2024
Editor’s note: This article first appeared in BigDATAwire‘s sister publication, AIwire.