Wanted: An Integration Layer for Data Management That Actually Works
As data silos continue to expand and data management difficulties continue to compound, a consensus is starting to build that what the industry needs now is a standard, repeatable, and efficient way of managing all that data. The question now is whether that layer will emerge at the top of the stack, at the bottom, or somewhere in between.
Since the early days of vacuum tubes and punch cards, integration has been a thorn in the side of enterprise IT. Whether it’s integrating applications or getting data formats to line up, success with integration invariably comes down to hours of manual labor.
Much of the perpetual integration gap can be traced to the natural evolution of IT. Creative folks are always one step ahead of the pack. They’re building groundbreaking, bespoke applications, and they’re using data in new and creative ways to solve problems and gain a competitive edge. Ensuring that all that new stuff fits in nicely with all the old stuff isn’t really conducive to the task of reimagining what’s possible, disrupting the status quo, and making a billion bucks. In fact, it’s diametrically opposed to it, by design.
So what’s the modern enterprise to do? The traditional path has been to pay system integrators (SIs) millions of dollars to manually stitch the one-off systems together. This was the case when customized ERP systems needed to talk to customized CRM systems, and it’s still the case today when custom-developed streaming data systems need to integrate with custom-built cloud data warehouses.
The sharp end of the development stick will always be one step ahead of the folks whose job is to force round pegs into square holes. As new stuff becomes more widely adopted, the sharp edges wear off and it plays more nicely with all the stuff that came before it. When it comes to big data, advanced analytics, and AI, we’re currently in that phase where there’s been a lot of development over the past 10 years, and now companies are looking for ways to make it work with everything else they have.
Today, there are two architectural patterns emerging that have the potential to shake up this status quo and deliver more repeatable automation to the data management issue: data fabrics and data meshes. While there are some similarities between data fabrics and data meshes, there are important differences between the two, which will likely influence the eventual success of each approach.
Data Fabrics
Data fabric addresses the data management issue by logically linking the various data management tools together. The data catalog, security, governance, quality, lineage, master data management (MDM), and extract, transform and load (ETL/ELT) products are physically connected together at the metadata layer, so that each tool knows what data the other tools have. This integration work is either done by the data integration tool developer or by the SI.
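To make the metadata-layer idea concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the MetadataLayer class, its methods, and the tool and dataset names are all hypothetical. It shows several tools publishing what they know about a dataset to one shared layer, so that any tool can read what the others have recorded.

```python
from collections import defaultdict


class MetadataLayer:
    """Hypothetical shared metadata layer that tools publish to and read from."""

    def __init__(self):
        # dataset name -> {tool name: metadata published by that tool}
        self._entries = defaultdict(dict)

    def publish(self, tool, dataset, metadata):
        """Record what one tool knows about one dataset."""
        self._entries[dataset][tool] = dict(metadata)

    def view(self, dataset):
        """Return everything every tool has published about a dataset."""
        published = self._entries.get(dataset, {})
        return {tool: dict(meta) for tool, meta in published.items()}

    def datasets_known_to(self, tool):
        """List the datasets a given tool has published metadata for."""
        return sorted(d for d, tools in self._entries.items() if tool in tools)


layer = MetadataLayer()

# Each tool (all names here are made up) publishes its own slice of metadata.
layer.publish("catalog", "crm.customers", {"owner": "sales", "location": "warehouse_a"})
layer.publish("quality", "crm.customers", {"null_rate": 0.02})
layer.publish("lineage", "crm.customers", {"upstream": ["erp.accounts"]})
layer.publish("catalog", "erp.accounts", {"owner": "finance", "location": "warehouse_b"})

# Any tool can now see what the others know, even though the data stays in its silo.
customer_view = layer.view("crm.customers")
print(customer_view)
```

The data itself never moves in this sketch; only descriptions of it are shared, which is the sense in which the fabric links tools at the metadata layer.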
Companies are getting real value out of data fabrics and are able to manage their data with more efficiency and centralization, even if the data itself is spread across many silos, according to Forrester Vice President and Principal Analyst Noel Yuhanna, who has been instrumental in defining the product category. We’re now on the cusp of the second major wave of data fabric adoption, which will be marked by more use of graph engines and knowledge graphs to help manage information, he says.
Currently, about 65% of the data fabric deployments are done by SIs, Yuhanna says. That number is down from about 80% just a few years ago, and soon will drop below 50%, he says. Data fabric vendors like Informatica, IBM, and Talend (among others) are doing the work to integrate the various tools that compose a data fabric, thereby eliminating the need for customers to do that integration work themselves.
As the data fabric pattern becomes more commonplace, Yuhanna says, it has the potential to become just another feature available to an organization when it signs into clouds to manage data. However, unless an organization has all of its data with just one cloud and only uses the data management tools from that cloud vendor, there will be a need to ensure that the various tools work together.
Perhaps what the market needs is a data fabric standard, a protocol that every vendor in the space adheres to that guarantees (or at least increases the odds) that the data management tools will play together nicely. This would allow organizations to pick and choose which data management tools they want to use, thereby eliminating the need to buy a complete data fabric suite from a single vendor.
This is where data fabrics still have some work to do, Yuhanna says. “We can make it semantically driven through some sort of data fabric or marketplaces or data services, which are still kind of evolving,” he says. “Standards play a big role in this equation. Obviously, people are using JSON standards and SQL accesses towards doing it, and ODBC and JDBC connectivity. We’re still improving some of these things. But I think it’s a good start.”
Data Meshes
The data mesh approach provides some of the same benefits as data fabrics. With a data mesh, an organization can enable independent teams of data product developers to access enterprise data in a governed and self-service manner, thereby helping to unleash the promise of data while avoiding data chaos by abiding by some ground rules.
Zhamak Dehghani, who spearheaded the data mesh concept several years ago while working at Thoughtworks North America, recently launched a new company that aims to help organizations get their data meshes off the ground with a packaged offering. Nextdata, as the company is called, is developing containerized middleware that enables developers to build and deploy data products in a simple yet governed manner while automating some management tasks.
The idea with Nextdata is to create a higher abstraction level that simplifies management and governance tasks for data product developers, who are currently struggling to stitch everything together. “We’re going through this Cambrian explosion of so many tools and features,” Dehghani says. “The universe around them is just so disorienting.”
Just as the advent of microservices and REST APIs helped to simplify integration for enterprise application developers, the data mesh will provide a dial tone that data scientists and AI developers can rely on to keep all the required pieces from falling apart for lack of expertise at each layer of the stack.
“The world requires a set of technology that makes it feasible for an average tech group within a business domain to be able to share analytics and analytical data or data for AI and ML,” Dehghani tells Datanami. “There has to be some sort of technology that allows me interoperability, because naturally these data products will be built on different technology stacks. So if it’s interoperable, I can still use that data for AI and analytics. And most importantly, there is some form of governance and policy as code built into this, so that we don’t end up with data that nobody can trust because there is no governance over it.”
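The "governance and policy as code" idea in that quote can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not Nextdata's design: the DataProduct fields, the three rules, and the list of interoperable formats are all hypothetical. Each policy is an ordinary function, so the same checks can run automatically against every data product regardless of the stack it was built on.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical set of formats every consumer is assumed to be able to read.
INTEROPERABLE_FORMATS = {"parquet", "json", "csv"}


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes with its data."""
    name: str
    owner: str
    output_format: str
    contains_pii: bool
    pii_masked: bool = False


# A policy returns None when satisfied, or a message describing the violation.
Policy = Callable[[DataProduct], Optional[str]]


def require_owner(product: DataProduct) -> Optional[str]:
    return None if product.owner else "no accountable owner"


def require_interoperable_format(product: DataProduct) -> Optional[str]:
    if product.output_format in INTEROPERABLE_FORMATS:
        return None
    return f"format '{product.output_format}' is not interoperable"


def require_pii_masking(product: DataProduct) -> Optional[str]:
    if product.contains_pii and not product.pii_masked:
        return "PII is not masked"
    return None


POLICIES: List[Policy] = [
    require_owner,
    require_interoperable_format,
    require_pii_masking,
]


def violations(product: DataProduct) -> List[str]:
    """Run every policy and collect the messages of those that fail."""
    found = []
    for policy in POLICIES:
        message = policy(product)
        if message is not None:
            found.append(message)
    return found


orders = DataProduct("orders", "sales", "parquet", contains_pii=True, pii_masked=True)
clicks = DataProduct("clicks", "", "proprietary_blob", contains_pii=True)

orders_violations = violations(orders)
clicks_violations = violations(clicks)
print(orders_violations)
print(clicks_violations)
```

Because the rules live in code rather than in a document, a deployment pipeline could refuse to publish any product whose list of violations is not empty.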
Top Down or Bottoms Up?
Data meshes and data fabrics tackle some of the same issues, and so they are often conflated. However, they represent two fundamentally different approaches.
The data fabric is more of a top-down approach that leans heavily on centralization. While physical centralization of data is no longer feasible, many enterprises demand a centralization and a standardization of data management and governance policies. The data fabric, then, becomes the metadata-driven expression of those policies, as defined by the specific processes controlled in the constituent pieces of the fabric.
On the other hand, the data mesh is more of a bottoms-up approach that leans more heavily on decentralization. By giving disparate groups of analysts and AI/ML product developers access to data that has management and governance built-in, so to speak, the productivity of the developers can be unleashed without the management and governance pain that would usually come along with this approach.
The two approaches are not incompatible, according to Yuhanna. Some companies are building domain-specific data fabrics and ending up with a data mesh, he says. “If you build one data fabric with customer domains, and then build another fabric with another domain, like product domain and supplier domain–once you have all those domains being built, you have a mesh architecture,” the Forrester analyst says. “Mature organizations are heading toward that route of having multiple fabrics, which represents a data mesh.”
Dehghani expresses skepticism of the data fabric approach, in particular of the metadata layer, the component where the various constituent pieces of the data fabric are stitched together.
“You can’t just smear a metadata layer on top and say I’ve got good quality trustworthy data. I don’t think you have a data mesh by doing that,” she says. “Data mesh is the source, where the data gets generated. It’s providing reliable information, live, near real time, just from the source…Let’s fix from the source!”
It’s too early to tell whether one approach will win out over the other. Both the data fabric and the data mesh approaches are generating interest and garnering close looks by enterprises faced with thorny, ever-present data and application integration issues. Enterprises that have spent freely on new analytics and ML tooling are looking to get them integrated into their existing stacks. Something resembling the modern data stack has emerged over the past 10 years, but tweaks and changes are an ever-present reality.
One thing is for certain, however: As the data pours in and the analytics and ML tools keep evolving, it’s just a matter of time before the next great leap of innovation occurs, and that, too, must be integrated into the stack.