
People to Watch 2024 – Josh Patterson

Josh Patterson

CEO & Co-founder of Voltron Data

Voltron Data states that its Theseus product is for “petabyte-scale ETL.” Why have we not been able to move beyond ETL after all these years?

No single system can handle every task today. As analytics and ML become more complex, specialized systems are optimized for specific workloads; we see this in the rise of GPUs for AI. Given this continual evolution and complexity, ETL becomes a crucial service for managing these divergent systems, and it’s now the bottleneck.

When AI/ML training adopted hardware accelerators like GPUs, it improved AI system performance by 100,000x. However, data preprocessing still runs on CPUs, where performance has grown only 10x in the last decade. Organizations at the forefront of AI are constrained by data processing because they cannot afford to build out big data CPU clusters fast enough. The performance divergence between GPUs and CPUs is getting exponentially worse. Only Theseus, Voltron Data’s accelerator-native data analytics engine, is achieving a 60x performance increase with 50x cost savings by leveraging the same accelerators used in AI. Until we find one singular way to draw intelligence from data, we’ll always have ETL, which will continually need to get faster and more efficient.

How did your experience working on RAPIDS at Nvidia help prepare you for Voltron Data?

My time at NVIDIA, where I launched RAPIDS (an open source suite of data processing and ML libraries designed to enable data science workflows on GPUs), was like working at a massive startup. It moved faster than most enterprises, focused on cutting-edge technology, pioneered new use cases and tapped into previously non-existent industries. We were relentlessly innovating.

With RAPIDS, we constantly thought of ways to accelerate adoption and maturity. Leveraging the open standards ecosystem, such as Apache Arrow, allowed us to accelerate our development and truly focus on innovation instead of redoing things that already existed – a philosophy that continues at Voltron Data today.

What role do you see Voltron Data filling in the Python data ecosystem in the years to come?

With projects like Ibis, PyArrow, and ADBC, we expect the open standards we build, promote, and maintain will underpin the Python data ecosystem. In addition, standards like Arrow and Substrait exist to support a multitude of languages beyond the Python ecosystem.

Bridging these language divides so enterprises can scale out and integrate their myriad data ecosystems is central to Voltron Data’s mission to bring a new way to design and build data systems.

Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?

Most people don’t know that I come from a long line of builders. Early in my career, I was a licensed general contractor, and I still enjoy building things around the house or with my family.


BigDATAwire