Data Catalogs Take Center Stage in Eckerson CDO TechVent
If you’re in the market for a data catalog, join the party. By one estimate, nearly 93% of organizations have either deployed a data catalog or plan to. But as your buying decision gets closer, you might find yourself overwhelmed by the number of catalogs on the market and the capabilities they offer. That’s when you turn to a recent Eckerson CDO TechVent that focused on the popular product category.
The CDO TechVent is a new program that Eckerson Group launched late last year to dig deeply into specific products in the data and analytics space, compare their features, and provide intelligence and guidance on purchasing decisions. The first event, held December 15 with Datanami as a media partner, focused on data catalogs, a category of tool that we have covered extensively in these digital pages.
“We created CDO TechVent because we know how overwhelming data and analytics technology has become,” Eckerson Group president and founder Wayne Eckerson said during the event. “Our goal is to make it easy for you to compare leading data catalog products and perhaps more importantly, give you a sense of what’s possible with data catalog technology.”
Eckerson Group pulled out all the stops for its inaugural CDO TechVent. In addition to keynote addresses by Eckerson Group analyst Sanjeev Mohan, the event featured a panel discussion with four data catalog providers: Alation, BigID, Data.world, and erwin. There were also virtual breakout rooms staffed by representatives of these vendors and even a virtual product bake-off that allowed attendees to compare their own needs with shrink-wrapped solutions. More than 100 people attended the sessions, a recording of which can be found here.
An Eckerson survey completed just before the virtual event found that just 10% of organizations have “fully deployed” a data catalog, but only 7% have zero interest in data catalogs, according to Mohan, a former analyst with Gartner.
“So I think the consensus is there: Data catalogs are red hot,” Mohan says. “There is no doubt about it.”
An Evolving Category
However, that does not mean that everything is hunky-dory in the data catalog world. Because of their heritage as IT-focused tools, there are still some legacy holdovers afflicting some products in the space, such as difficulty in deployment, which was a commonly cited drawback to data catalogs in Eckerson’s research.
As a whole, however, the category is moving away from that IT-focused legacy. SaaS deployments of data catalog tools are taking the hassle out of complex deployments, and many vendors offer services to help their clients get up and running quickly, Mohan says.
And instead of focusing on defensive use cases such as risk and compliance, which were the primary reasons for deploying a data catalog in the past, today’s products are being used to open up more offensive use cases for analytics and AI. That, in turn, has introduced new people to the tools, Mohan says.
“It used to be that there were only a few people who actually used the data catalog,” he says, “but now we are starting to see the data analysts, data scientists, even data engineers. We see that there is wider adoption in terms of number and types of users who are using it.”
The catalogs of old were static, and were primarily rules-based, whereas machine learning and AI are widely used in today’s data catalogs. Mohan noted that, in a few years, having algorithms do the first pass on data classification won’t be a big deal anymore. “This is the only way we can handle the vast amount of data that we will need,” he says.
Data catalogs are also increasingly becoming a place where data teams collaborate. Users benefit from being able to see metrics about their company’s data, such as data quality scores. Integration with other data tools, such as master data management (MDM) and extract, transform, and load (ETL) tools–and increasingly with real-time streaming data–also helps to raise the profile of data catalogs in the enterprise.
What’s more, in many of the tools, users can get a preview of dashboards and other types of content that would normally be found in full-blown BI tools. “So for very quick analysis of data, you can visualize data from a source,” Mohan says. “Now, to do any major business intelligence reports and dashboard, you still use a BI tool. But we see that there is an expansion of the scope of data catalogs.”
Cataloging the Catalogs
As a Gartner analyst, Mohan immersed himself in Magic Quadrants, which are ubiquitous at the Connecticut firm. At Eckerson, Mohan devised a similar–but different–metric to rank the various data catalog products.
“This one is, if I may say so, a bit controversial because we have tried to put the vast number of data catalogs into some sort of framework,” he says. “It’s very similar to a Gartner Magic Quadrant, but it had nothing to do with it.”
Mohan’s quadrant divides the data catalogs by the degree of integration of functionality (standalone/pureplay or fully integrated) on the Y axis and the adoption level (emerging or established) on the X axis.
The quadrant with the fully integrated and well-established data catalogs has some familiar names, such as Alation, Collibra, Informatica, and IBM. Some of the emerging integrated data catalog vendors include Quest Erwin (Quest Software bought Erwin in January 2021), OneTrust, Ovaledge, Alex Solutions, Precisely, Zaloni, and Hitachi Vantara.
“Because they’re emerging, doesn’t mean they’re new,” Mohan adds. “Quest Erwin have been around for very long time, but they are now starting to expand their market presence.”
On the pureplay side of the aisle, Mohan lists some well-established catalog vendors, such as the aforementioned Data.world and BigID, along with catalogs from a host of tech giants, such as Microsoft Azure, Google Cloud, AWS, Oracle, SAP, and SAS. Emerging pureplay data catalog vendors include Promethium, Atlan (which announced a $50 million Series B today), Boomi, Global IDs, MANTA Software, and Octopai.
Finally, Mohan brought out his bonus list, which is a collection of open source data catalogs such as Amundsen (backed by the commercial outfit Stemma), DataHub (backed by the commercial outfit Acryl Data), LF Egeria, and Apache Hive/Apache Atlas. These products were placed in the emerging pureplay quadrant.
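For readers who want to keep the two-axis framework straight, here is a minimal sketch of it as a lookup table. The vendor placements come from Mohan's lists as reported above; the names `Integration`, `Adoption`, `Quadrant`, `PLACEMENTS`, and `quadrant_of` are hypothetical, invented for this illustration, and are not part of any Eckerson Group tool.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Integration(Enum):
    """Y axis: degree of integration of functionality."""
    PUREPLAY = "standalone/pureplay"
    INTEGRATED = "fully integrated"


class Adoption(Enum):
    """X axis: adoption level."""
    EMERGING = "emerging"
    ESTABLISHED = "established"


class Quadrant(NamedTuple):
    integration: Integration
    adoption: Adoption


# Placements as reported in this article (illustrative, not exhaustive).
PLACEMENTS = {
    Quadrant(Integration.INTEGRATED, Adoption.ESTABLISHED): [
        "Alation", "Collibra", "Informatica", "IBM",
    ],
    Quadrant(Integration.INTEGRATED, Adoption.EMERGING): [
        "Quest Erwin", "OneTrust", "Ovaledge", "Alex Solutions",
        "Precisely", "Zaloni", "Hitachi Vantara",
    ],
    Quadrant(Integration.PUREPLAY, Adoption.ESTABLISHED): [
        "Data.world", "BigID", "Microsoft Azure", "Google Cloud",
        "AWS", "Oracle", "SAP", "SAS",
    ],
    Quadrant(Integration.PUREPLAY, Adoption.EMERGING): [
        "Promethium", "Atlan", "Boomi", "Global IDs", "MANTA Software",
        "Octopai",
        # Open source "bonus list", also placed in this quadrant.
        "Amundsen", "DataHub", "LF Egeria", "Apache Hive/Apache Atlas",
    ],
}


def quadrant_of(vendor: str) -> Optional[Quadrant]:
    """Return the quadrant a vendor was placed in, or None if not listed."""
    for quadrant, vendors in PLACEMENTS.items():
        if vendor in vendors:
            return quadrant
    return None


if __name__ == "__main__":
    q = quadrant_of("Atlan")
    print(q.integration.value, "/", q.adoption.value)
```

Looking up a vendor that does not appear in the lists above returns `None` rather than guessing a placement.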
That’s quite a list, but wait–Mohan has more! The analyst says he tracks 75 data catalog products, and so he decided to share some of the embedded data catalogs that are in the market. Not all of them are truly embedded, because they can be used standalone. But it’s worth getting their names out there, if only to demonstrate the breadth of development taking place in this very active space.
The vendor panel also had many good tips on how to select and deploy a data catalog–and some gotchas to watch out for. Stay tuned to Datanami for a future story on that. In the meantime, check out Eckerson Group’s next CDO TechVent, which is taking place April 26 on the related topic of data governance tools. You can register for that free event here.
Editor’s note: This article has been corrected. Mohan tracks 75 data catalog products, not 7,500. Datanami regrets the error.