New PyTorch 2.0 Compiler Promises Big Speedup for AI Developers
Machine learning and AI developers are eager to get their hands on PyTorch 2.0, which was unveiled in late 2022 and is due to become available this month. Among the features awaiting ML developers are a compiler and new optimizations for CPUs.
PyTorch is a popular machine learning library developed by Facebook’s AI Research lab (FAIR) and released to open source in 2016. The Python-based library, which was developed atop the Torch scientific computing framework, is used to build and train neural networks for computer vision applications and for large language models (LLMs) such as GPT-4.
The first experimental release of PyTorch 2.0 was unveiled in December by the PyTorch Foundation, which was set up under the Linux Foundation just three months earlier. Now the PyTorch Foundation is gearing up to launch the first stable release of PyTorch 2.0 this month.
Among the biggest enhancements in PyTorch 2.0 is torch.compile. According to the PyTorch Foundation, the new compiler is designed to be much faster than the previous on-the-fly generation of code offered in the default “eager mode” in PyTorch 1.0.
The new compiler wraps a number of technologies into the library, including TorchDynamo, AOTAutograd, PrimTorch and TorchInductor. All of these were written in Python rather than C++ (with which Python interoperates). The launch of 2.0 “starts the move” back to Python from C++, the PyTorch Foundation says, adding “this is a substantial new direction for PyTorch.”
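As a rough illustration of how these pieces surface to a user, the sketch below assumes a PyTorch 2.x install. TorchDynamo captures graphs from Python code and hands them to a backend, with TorchInductor ("inductor") as the default. The function `double_plus_one` is a hypothetical stand-in, and the list of backends will vary by version and installation.

```python
import torch
import torch._dynamo as dynamo

# Backends that TorchDynamo can hand captured graphs to in this install.
# The exact contents depend on the PyTorch version and optional packages.
backends = dynamo.list_backends()
print(backends)


def double_plus_one(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical toy function used only for illustration."""
    return t * 2 + 1


# Naming the backend explicitly; "inductor" (TorchInductor) is the default,
# so this is equivalent to torch.compile(double_plus_one).
compiled_fn = torch.compile(double_plus_one, backend="inductor")
```

Nothing is compiled at this point. Compilation is deferred until `compiled_fn` is first called with real inputs.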
“From day one, we knew the performance limits of eager execution,” the PyTorch Foundation writes. “In July 2017, we started our first research project into developing a Compiler for PyTorch. The compiler needed to make a PyTorch program fast, but not at the cost of the PyTorch experience. Our key criteria was to preserve certain kinds of flexibility–support for dynamic shapes and dynamic programs which researchers use in various stages of exploration.”
The PyTorch Foundation expects users to start in the non-compiled “eager mode,” which uses a dynamic on-the-fly code generator and is still available in 2.0. But it expects developers to quickly move up to the compiled mode using the torch.compile command, which can be done with the addition of a single line of code, it says.
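A minimal sketch of that one-line change might look like the following, assuming PyTorch 2.x is installed. The model, its layer sizes, and the helper `run_compiled` are arbitrary stand-ins chosen for illustration, not anything taken from the PyTorch Foundation's benchmarks.

```python
import torch
import torch.nn as nn

# A small stand-in model; the layer sizes are arbitrary and purely illustrative.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

# Eager mode: each operation runs immediately as Python executes it.
x = torch.randn(8, 16)
eager_out = model(x)

# The single added line: wrap the model to opt in to compiled mode.
compiled_model = torch.compile(model)


def run_compiled(inputs: torch.Tensor) -> torch.Tensor:
    """Call the compiled model; the first call triggers the actual compilation."""
    return compiled_model(inputs)
```

Calling `run_compiled(x)` should return a tensor of the same shape as `eager_out`. The first call is typically slower than later ones, because that is when the compilation work happens.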
Users can expect models to train 43% faster on average with 2.0 over 1.0, according to the PyTorch Foundation. This number comes from benchmark tests that the PyTorch Foundation ran using PyTorch 2.0 on an Nvidia A100 GPU against 163 open source models drawn from HuggingFace Transformers, TIMM, and TorchBench.
According to the PyTorch Foundation, compiled models ran 21% faster on average when using Float32 precision and 51% faster when using Automatic Mixed Precision (AMP). The new torch.compile mode worked on 93% of the models tested, the foundation said.
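For readers unfamiliar with the two precision modes, the sketch below shows the difference in plain PyTorch. It is not the benchmark setup: the single linear layer is a stand-in, and the choice of float16 on a GPU and bfloat16 on a CPU is an assumption made so the example runs on either.

```python
import torch
import torch.nn as nn

# Use a GPU if one is present; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption: float16 on GPU and bfloat16 on CPU as the lower-precision dtype.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Linear(16, 4).to(device)
x = torch.randn(8, 16, device=device)

# Float32 mode: the layer runs in full 32-bit precision.
out_fp32 = model(x)

# AMP mode: inside the autocast region, eligible ops such as the linear
# layer run in the lower-precision dtype, while the weights stay in float32.
with torch.autocast(device_type=device, dtype=amp_dtype):
    out_amp = model(x)

print(out_fp32.dtype, out_amp.dtype)
```

The speed difference reported for AMP comes from running eligible operations in the narrower dtype; this example only shows how the mode is switched on, not how much faster it is.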
“In the roadmap of PyTorch 2.x we hope to push the compiled mode further and further in terms of performance and scalability. Some of this work is in-flight,” the PyTorch Foundation said. “Some of this work has not started yet. Some of this work is what we hope to see, but don’t have the bandwidth to do ourselves.”
One of the companies helping to develop PyTorch 2.0 is Intel. The chipmaker contributed to various parts of the new compiler stack, including TorchInductor, GNN, INT8 inference optimization, and the oneDNN Graph API.
Intel’s Susan Kahler, who works on AI/ML products and solutions, described the contributions to the new compiler in a blog post.
“The TorchInductor CPU backend is sped up by leveraging the technologies from the Intel Extension for PyTorch for Conv/GEMM ops with post-op fusion and weight prepacking, and PyTorch ATen CPU kernels for memory-bound ops with explicit vectorization on top of OpenMP-based thread parallelization,” she wrote.
PyTorch and Google’s TensorFlow are the two most popular deep learning frameworks. Thousands of organizations around the world are developing deep learning applications using PyTorch, and its use is growing.
The launch of PyTorch 2.0 will help to accelerate development of deep learning and AI applications, says Luca Antiga, the CTO of Lightning AI and one of the primary maintainers of PyTorch Lightning.
“PyTorch 2.0 embodies the future of deep learning frameworks,” Antiga says. “The possibility to capture a PyTorch program with effectively no user intervention and get massive on-device speedups and program manipulation out of the box unlocks a whole new dimension for AI developers.”