
NVIDIA Pushes Boundaries of Apache Spark With RAPIDS and Project Aether

Apache Spark is one of the most widely used tools in the big data space. It excels at processing massive datasets for predictive modeling, fraud detection, and real-time analytics. As the demand for processing and understanding data continues to grow, enterprises are seeking more efficient ways to handle ever-increasing workloads.
Some of the largest companies in the world have turned to the NVIDIA RAPIDS Accelerator for Apache Spark to address the growing challenge of processing massive datasets efficiently. The open-source plug-in, built on NVIDIA's accelerated computing platform, is designed to make data science and analytics faster and more effective. Nvidia claims the tool lets users run complete data pipelines without any modifications to their existing Spark code.
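Because the plug-in is enabled through Spark configuration rather than code changes, turning it on typically comes down to a handful of settings. The Python sketch below assembles them into a spark-submit command for an otherwise unchanged job. The `spark.plugins` and `spark.rapids.sql.enabled` keys are documented RAPIDS Accelerator settings and the GPU resource keys are standard Spark settings. The jar path, GPU fraction, and application name are hypothetical placeholders, and some cluster managers need additional GPU-discovery settings.

```python
# Minimal sketch, not an official setup script: the RAPIDS Accelerator is switched
# on via Spark configuration, so the application's DataFrame/SQL code is untouched.
# RAPIDS_JAR is a hypothetical placeholder; the real jar version must match your
# Spark and Scala build.

RAPIDS_JAR = "/path/to/rapids-4-spark_2.12-VERSION.jar"  # hypothetical path


def rapids_conf(task_gpu_fraction: float = 0.25) -> dict[str, str]:
    """Return Spark settings that load and enable the RAPIDS Accelerator plug-in."""
    return {
        "spark.plugins": "com.nvidia.spark.SQLPlugin",  # load the RAPIDS SQL plug-in
        "spark.rapids.sql.enabled": "true",             # allow GPU execution of SQL/DataFrame ops
        "spark.executor.resource.gpu.amount": "1",      # request one GPU per executor
        # Fraction of a GPU each task claims, so several tasks can share one GPU
        # (the 0.25 default here is an illustrative assumption).
        "spark.task.resource.gpu.amount": str(task_gpu_fraction),
        "spark.jars": RAPIDS_JAR,                       # put the plug-in jar on the classpath
    }


def spark_submit_args(app: str, conf: dict[str, str]) -> list[str]:
    """Render the settings as spark-submit arguments for an existing application."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # the application itself is passed unchanged
    return args


if __name__ == "__main__":
    # "existing_job.py" is a hypothetical, unmodified PySpark application.
    print(" ".join(spark_submit_args("existing_job.py", rapids_conf())))
```

Keeping the settings in a plain dictionary makes it easy to toggle the plug-in per job, which mirrors the no-code-change claim: the same application can be submitted with or without these entries.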
This week at GTC 2025, Nvidia introduced Project Aether to make it even easier for companies to get value out of NVIDIA-accelerated Spark. Project Aether is a set of tools and processes created by the chipmaker to streamline data processing, offering substantial time and cost savings, according to the company.
In a blog post introducing Project Aether, Nvidia wrote, "Project Aether automates the myriad steps that companies previously have done manually, including analyzing all of their Spark jobs to identify the best candidates for GPU acceleration, as well as staging and performing test runs of each job. It uses AI to fine-tune the configuration of each job to obtain the maximum performance."
Project Aether streamlines what was once a tedious, manual migration from CPU-based systems to GPU-powered computing, using AI to analyze and tune Spark job configurations for maximum performance. Nvidia claims the tool lets users complete a "year's worth of work in less than a week."
Migrating Apache Spark workloads to GPUs has traditionally been a highly manual process. Users often had to analyze Spark jobs individually, decide which workloads would benefit from GPU acceleration, and then configure and run tests to optimize performance. Staging the selected workloads and adjusting their configurations added further complexity.
With Project Aether, users can automate several of these steps. According to Nvidia, if 100 Spark jobs would require an engineer to work for an entire year, Project Aether can complete each of those jobs within four days, including fine-tuning each job's configuration for maximum acceleration on Nvidia GPUs.
To show what GPU acceleration can deliver, Nvidia shared a case study in which Australia's largest financial institution, the Commonwealth Bank of Australia (CBA), benefited significantly from NVIDIA-accelerated Apache Spark.
CBA, which processes 60% of the continent's financial transactions, faced latency and cost challenges running its Spark workloads. On its CPU-only computing clusters, the bank had a training backlog that would have taken almost nine years to process, not counting its daily data load, estimated at around 40 million transactions.
By running the RAPIDS Accelerator for Apache Spark on GPU-powered systems, CBA achieved a 640x performance improvement. Nvidia said the bank processed 6.3 billion transactions for training in only five days. CBA can now also run inference in as little as 46 minutes and can reduce its costs by 80%. These results could be even more impressive with Project Aether in play.
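As a rough consistency check, assuming the 640x figure compares the estimated CPU-only backlog time with the five-day GPU run, the two numbers line up with the "almost nine years" estimate:

$$
640 \times 5\ \text{days} = 3{,}200\ \text{days} \approx \frac{3{,}200}{365}\ \text{years} \approx 8.8\ \text{years}
$$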
According to CBA's McMullan, a key advantage of NVIDIA-accelerated Apache Spark is shorter computation time, which lets his team build models faster and at lower cost. That, in turn, helps CBA improve customer service by predicting when customers may need help with its products and services.
The bank plans to take this further by analyzing customers' digital journeys to determine where they tend to abandon digital processes.
Several other companies are also leveraging NVIDIA RAPIDS Accelerator for Apache Spark to enhance data processing efficiency and reduce costs. Dell Technologies has announced that it is incorporating the RAPIDS Accelerator for Apache Spark into its Dell Data Lakehouse platform.
According to Dell, the core benefits of the NVIDIA RAPIDS Accelerator for Apache Spark include much faster processing, cost savings, scalability, and unified acceleration across CPU and GPU processing.
“The integration of NVIDIA RAPIDS Accelerator for Apache Spark into Dell Data Lakehouse isn’t just an incremental improvement — it’s a forward-looking advancement for businesses ready to meet today’s demands and tomorrow’s scale,” shared Dell. “By reducing data complexity and accelerating AI workflows, companies can fuel growth and drive success in increasingly data-driven markets.”