Scaling to Great Heights at the Ray Summit
If you haven’t yet heard about Ray, the open source Python framework for building distributed applications, then next week’s Ray Summit will provide a compelling introduction to what might be one of the cornerstone technologies of the next decade.
Ray emerged several years ago from UC Berkeley’s RISELab with a goal of radically simplifying the process of developing distributed applications. The software was designed to support any application written in any language, but in practice it’s been used mostly with machine learning applications written in Python.
What it does sounds almost too good to be true. Instead of hiring a large team of engineers or Kubernetes experts to get an application running in a distributed manner on a large cluster, a single developer can enable their application to run in a parallel manner with the addition of a few lines of code and about 30 minutes of work.
“We want to give developers the experience of developing on their laptop, but with the power of the cloud,” says Robert Nishihara, the co-creator of Ray and the co-founder and CEO of Anyscale. “You don’t have to think hard about how to scale things. You don’t have to think a lot about how to configure a cluster or anything like that. We’re really trying to just make developing a distributed application and running them as easy as programming on your laptop.”
Nishihara completed his PhD at UC Berkeley about a year ago, and co-founded Anyscale with RISELab director Ion Stoica late last year. Since then, the Ray project and its ecosystem have started to flourish. The number of contributors to the Ray project has roughly doubled since December.
It’s also gaining traction at some of the biggest companies in the world. AWS has adopted Ray to help customers scale the models they create in SageMaker, and Microsoft has done the same thing with Azure. Goldman Sachs and J.P. Morgan, two of the biggest banks in the world, are also Ray users. So is Intel.
Representatives from some of these companies will speak about their use of the technology during next week’s Ray Summit, which Nishihara says will provide “an incredible blend of experts in machine learning systems, Python, and industry leaders.”
That may be an understatement. Sure, there might only be one Turing Award winner (UC Berkeley professor David Patterson) scheduled to speak during the three-day virtual event that starts Wednesday, September 30. But for sheer concentration of brainpower, it’s tough to beat this lineup:
- Azalia Mirhoseini, a senior research scientist at Google Brain;
- Wes McKinney, creator of the Pandas project;
- Michael Jordan of UC Berkeley;
- Gaël Varoquaux, one of the leaders of the Scikit-learn project;
- Ion Stoica, leader of UC Berkeley’s RISELab and co-founder of Anyscale;
- Zoubin Ghahramani, chief scientist and VP of AI for Uber;
- Manuela Veloso, head of J.P. Morgan AI Research;
- Adrian Cockcroft, VP of cloud architecture strategy for AWS;
- Oriol Vinyals, principal scientist for Google DeepMind;
- Charles He, chief architect of Ant Group;
- and Raluca Ada Popa, UC Berkeley professor and co-director of RISELab.
“We’re really excited to showcase a lot of these use cases and how different companies are scaling up their application with Ray,” Nishihara says. “We’ll be announcing Ray 1.0, which is a big step forward in terms of the maturity and stability of the project. So that’s something we’re releasing at the Summit and are really excited about that. The other thing we’re showcasing is really this growing ecosystem around Ray.”
Ray is being adopted by a number of open source projects in the Python ecosystem, including natural language processing (NLP) libraries like Hugging Face and spaCy, which can now run in a distributed manner as a result of incorporating Ray. PyTorch is another Ray beneficiary, as are hyperparameter optimization libraries Hyperopt and Optuna. Horovod, Dask, Modin, and Mars are also using Ray.
Nishihara will also present a demo that gives viewers a peek at the future of distributed application development. The demo, scheduled for Wednesday morning, will show how combining Ray for developing distributed applications with the Anyscale platform for running them could dramatically simplify developers’ work.
“Say you want to do some data processing, and maybe have a system for that. And then you want to use the data to train a model, and you have some other systems for that. And then you want to take the model and deploy it in production, you have another system for that,” Nishihara says.
“Instead of having to use separate distributed systems for all of those things, to be able to just import different libraries that are part of the same ecosystem, to do that all potentially in the same application–that would be really compelling,” he continues. “That’s something we are building, or working toward, with this ecosystem.”
There is no cost to attend Ray Summit. You can register at https://events.linuxfoundation.org/ray-summit/.