It’s Time for MLOps Standards, Cloudera Says
Just as operational standards have been established for data management via DataOps, the industry needs to create open standards for machine learning operations, or MLOps, according to Cloudera, which today unveiled a call to action to the community to begin having that discussion.
The lack of open MLOps standards threatens to hamper the ability of organizations to effectively use machine learning models to improve their operations, says Santiago Giraldo Anduaga, a product marketing manager for data engineering at Cloudera.
“There are a lot of organizations that are putting some models into production effectively. You can do this effectively with five to 10 models,” Anduaga said. “But what happens when an organization wants to ramp it up to 1,000 or 2,000 models? They need to make sure they’re operating accurately on an ongoing basis.”
To solve the complex set of MLOps-related problems that arise when operating machine learning at scale, large firms like LinkedIn, Airbnb, and Uber have invested millions of dollars into building their own internal MLOps systems, Anduaga said. Many other firms, including Cloudera’s client Regions Financial Corporation, have also attempted to roll their own ML systems.
Instead of each company spending small fortunes to build their own internal MLOps system to operationalize machine learning and AI, the entire world can benefit from defining the standards and building product out in the open, Anduaga said.
“What we’re trying to do is to bring this to the masses in a better way, saying you don’t have to invest millions or billions of dollars into defining these standards just to solve these type of problems,” he told Datanami. “We can do this as a community out in the open and actually bring that level of maturity into the industry, into our software, into our products, and into our workflows, regardless of who you are.”
The intent is twofold, Anduaga continued. “One is to get people involved. So people can come and contact us and learn about what we’re doing. We’re happy to share what we’re doing and have people contribute to it today,” he said. “The second is the creation of the actual community. We want to stand up a website that makes it very easy for people to pull these and contribute to it and see how they work and open up the hood and collaborate on them.”
Joining Cloudera in the call to action is Anaconda, which helped to standardize the Python data science ecosystem and develops an enterprise data science platform. According to Anaconda CEO Peter Wang, creating open MLOps standards would benefit customers.
“Open source and open APIs have powered the growth of data science in business. But deploying and managing models in production is often difficult because of technology sprawl and siloing,” Wang said in a press release. “Open standards for ML operations can reduce the clutter of proprietary technologies and give businesses the agility to focus on innovation. We are very pleased to see Cloudera lead the charge for this important next step.”
Cloudera, which launched a cloud-based machine learning product in September, recently committed to making all of its software open source. This month, the company released a preview of an MLOps product, which is expected to launch in 2020. According to Anduaga, Cloudera is aiming to fill gaps left by other solutions with its MLOps offering.
“What this product essentially does is bring together the aspects that we feel have been mismatched or not completed inside the machine learning world and open source community,” he said. “What we’re building into the product today are things like the ability to normalize machine learning metadata and monitoring capabilities to take a look at not just how the software is operating, but also mathematical factors to predict things like skew, drift, accuracy, or the need to retrain models.”
Cloudera would like to incorporate open standards defined by the community into its forthcoming MLOps product, or at least use them as a starting point for building its own solution (which would also be open source), Anduaga said. But Cloudera is under no illusion that its forthcoming product would address each and every MLOps challenge.
DataOps provides a good model for how the community can move forward with MLOps, according to Cloudera. In fact, the company is going further than that and positioning the Apache Atlas tool, which was built for data governance, to also play a role in governing models in an MLOps project.
“Apache Atlas up to this point has been a good governance tool for data and defining things such as metadata standards and governance standards for data operations and data management,” Anduaga said. “We use data as a launching off point to begin to define machine learning metadata standards that are representative of the unique challenges of actually building and deploying them.”
Specifically, the Atlas model of data governance can play a role in how an MLOps tool tracks things like model lineages and serves as a catalog for models, Anduaga said. It’s all about “normalizing some of these things we’ve seen repeated out in the wild by data scientists in a way that makes sense, that doesn’t have to walk the line between the standards that exist for data and this Wild West that exists for machine learning operations,” he said. “It’s about bridging that gap and saying, the same way we have these standards for data, let’s create those same standards for how we talk about and how we exercise machine learning models inside of software.”
Atlas can provide a good repository for governing machine learning models, said Doug Cutting, chief architect at Cloudera and the co-creator of Apache Hadoop and Apache Lucene.
“At Cloudera, we don’t want to solve the challenge of deploying and governing machine learning models at scale only for our customers,” Cutting said in a press release. “We agree it needs to be addressed at the industry level. Apache Atlas is the best positioned framework to integrate data management and explainable, interoperable, and reproducible MLOps workflows.”
Cloudera encourages customers to join the conversation by sending an email to [email protected].
Related Items:
How Cloudera Is Battling Shadow IT with CDP
Back to Basics: Governance, Quality, Security Grab the Spotlight at Strata Data Conference
ParallelM Aims to Close the Gap in ML Operationalization