

(amiak/Shutterstock)
In addition to the explosion of data volumes, many organizations are struggling with an explosion in the number of data sources and data silos. Managing data in this fluid, ever-changing environment is a major challenge for would-be data-driven organizations, but one pattern that offers potential salvation for the stressed data architect is the data fabric.
Data fabrics aren’t new. We’ve been writing about them for several years here at Datanami. In the early days, the definition of a data fabric was a bit loose. But lately, it’s begun to harden and the core elements of a data fabric have coalesced into a configuration that’s finding traction in the real world.
Forrester analyst Noel Yuhanna was one of the early proponents of the data fabric. In the latest Forrester Wave: Enterprise Data Fabric, Q2 2022, Yuhanna dived into the benefits of the data fabric and dissected the offerings of 15 data fabric vendors.
“Today, delayed insights can have a devastating effect on a firm’s ability to win, serve, and retain customers,” Yuhanna wrote in the Wave report. “Organizations want real-time, consistent, connected, and trusted data to support their critical business operations and insights. However, new data sources, slow data movement between platforms, rigid data transformation workflows and governance rules, expanding data volume, and distributed data across clouds and on-premises, can cause organizations to fail when executing their data strategy.”
Centralizing all data in a data lake such as Hadoop or Amazon S3 was supposed to solve many of these problems, but it hasn’t worked out that way. Not every piece of data belongs in lakes, thanks to bandwidth and storage costs as well as sheer practicality. Technological progress also continues to churn out new digital innovations, and people are more than happy to try them out, which typically results in yet another data silo.
Data silos appear to be permanent houseguests. Just as Edwin Hubble’s raisin pudding analogy held that the expansion of the universe makes matter grow farther apart, the big data boom seems to be causing data repositories to drift further apart even as the overall volume of data continues expanding at a geometric rate. The data fabric is a way to layer some connective tissue among those sweet, sweet nuggets of data.
As Yuhanna wrote:
“Data fabric delivers a unified, integrated, and intelligent end-to-end data platform to support new and emerging use cases. It automates all data management functions–including ingestion, transformation, orchestration, governance, security, preparation, quality, and curation–enabling insights and analytics to accelerate use cases quickly.”
Data fabrics are essentially pre-integrated super-suites of data management tools. Instead of cobbling together separate products for handling the data functions that Yuhanna mentioned above (not to mention data catalogs), data fabrics deliver these functions through a single product, providing consistency and repeatability to big data management processes, which helps breed trust in data and the analytics that come from it.
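To make the "single product" idea concrete, here is a minimal sketch of what a unified fabric interface might look like: ingestion, cataloging, transformation, and querying exposed through one object instead of separate tools. All class and field names are invented for illustration; this is not any vendor's API.

```python
# Toy sketch of a unified data-fabric interface (hypothetical, not a real product).
class DataFabric:
    def __init__(self):
        self.sources = {}   # source name -> list of records
        self.catalog = {}   # source name -> metadata gathered at ingestion

    def ingest(self, name, records):
        """Ingestion and cataloging happen together, so every source is
        discoverable and governed from the moment it lands."""
        self.sources[name] = list(records)
        self.catalog[name] = {
            "rows": len(records),
            "fields": sorted({k for r in records for k in r}),
        }

    def transform(self, name, fn):
        """Apply a transformation in place, keeping the catalog in sync."""
        self.sources[name] = [fn(r) for r in self.sources[name]]
        self.catalog[name]["rows"] = len(self.sources[name])

    def query(self, *names):
        """A unified view across silos without copying data elsewhere."""
        for n in names:
            yield from self.sources[n]

fabric = DataFabric()
fabric.ingest("crm", [{"customer": "a", "spend": 10}])
fabric.ingest("web", [{"customer": "a", "clicks": 3}])
unified = list(fabric.query("crm", "web"))
```

The point of the sketch is the coupling: because cataloging is a side effect of ingestion rather than a separate product, the metadata can never drift out of step with the data, which is where the "consistency and repeatability" claim comes from.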
Yuhanna sees a lot of data fabrics being deployed in cloud and hybrid cloud environments at the moment, particularly in support of applications like customer 360, business 360, fraud detection, IoT analytics, and real-time insights. Data fabrics are being deployed across multiple industries, including financial services, retail, healthcare, manufacturing, oil and gas, and energy, he wrote.
Data fabrics are also being deployed in the life sciences industry, where they can help knit disparate data silos into a seamless whole. One life sciences company that’s betting big on data fabrics is eClinical Solutions, a Massachusetts-based provider of software for running clinical trials.
In the past, clinical trials might have involved three or four disparate data sources, according to Raj Indupuri, eClinical Solutions’ CEO.
“But now with research we end up for every trial, every trial might have 15+ different sources, which means different streams of data, different structures, different formats, and different systems,” Indupuri said. “So the problem in terms of data chaos–we refer to this as data chaos–has only exploded or increased.”
In Indupuri’s view, the data fabric is a natural evolution of the data lake, or the lakehouse. These flexible data repositories are able to ingest and store just about any type of data, giving customers or stakeholders the ability to transform, prepare, and analyze the data when they need to. But when data spans multiple data lakes (or warehouses or lakehouses), that is where data fabrics play an important role.
“One big difference would be, instead of having everything in one centralized location, with the data fabric, that is how do you actually combine different stores,” he told Datanami in a recent interview. “They could be distributed. But on top we have a fabric so that with governance and with other capabilities, we’re able to deliver analytics to end stakeholders efficiently, to deliver it to downstream to different stakeholders in different systems.”
eClinical Solutions has already built some components of a data fabric solution into its offering. It has built an end-to-end data pipeline in AWS that automatically extracts metadata and catalogs it when a new piece of data lands in the system, according to Indupuri. The company’s solution also includes a data management workbench where data managers can review and clean data.
“We evolved significantly over a decade or so,” he said. “When we first started, it was kind of a report. Then we evolved into a data lake type of architecture, where you can stage any data, regardless of the source. Then we have embedded capabilities where it’s metadata driven, and you can actually transform and publish data marts within our data cloud.”
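The event-driven, metadata-driven pattern Indupuri describes can be sketched in a few lines: when new data lands, a handler extracts basic metadata and registers it in a catalog. This is purely illustrative, not eClinical Solutions' actual implementation; the event shape, bucket, and field names are all invented, and a real deployment would wire this to S3 event notifications and a Lambda function.

```python
import json

# Hypothetical catalog populated as data lands (illustrative sketch only).
CATALOG = {}

def handle_new_object(event, body):
    """Simulates a handler that fires when a new file lands in the lake:
    parse the payload, extract metadata, and register it in the catalog."""
    key = event["key"]
    records = json.loads(body)
    CATALOG[key] = {
        "source_bucket": event["bucket"],
        "rows": len(records),
        "fields": sorted({f for rec in records for f in rec}),
    }
    return CATALOG[key]

meta = handle_new_object(
    {"bucket": "trial-data", "key": "site-42/labs.json"},
    json.dumps([{"subject": "001", "hgb": 13.2},
                {"subject": "002", "hgb": 12.8}]),
)
```

Because the catalog entry is created by the same event that delivers the data, no one has to remember to register a new source by hand, which is what makes the pipeline "metadata driven."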
Where it gets tricky is dealing with the data repositories of eClinical Solutions’ own customers, who are drug companies or companies doing drug exploration. These customers often have separate data lakes for clinical research, for operational data, for safety data, and for regulatory data, and are loath to move or copy data between them.
“You can actually enable them to access data across these data stores, or these distributed data clouds or data lakes or data warehouses,” Indupuri said. “So that’s where data fabric can help.”
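A hedged sketch of what that federated access might look like: each store stays where it is, the fabric layer queries them in place, and a governance rule (here, masking subject identifiers) is applied before results reach downstream consumers. The store contents, field names, and masking policy are all invented for illustration.

```python
# Two "lakes" that stay in place; the fabric queries them without copying.
clinical_lake = [{"subject": "001", "outcome": "complete"}]
safety_lake = [{"subject": "001", "event": "headache"}]

def federated_query(stores, predicate, masked_fields=("subject",)):
    """Scan each store in place, filter rows, and mask governed fields
    so downstream stakeholders never see raw identifiers."""
    for store in stores:
        for row in store:
            if predicate(row):
                yield {k: ("***" if k in masked_fields else v)
                       for k, v in row.items()}

results = list(federated_query([clinical_lake, safety_lake],
                               lambda r: r["subject"] == "001"))
```

The design choice worth noting is that governance lives in the fabric layer, not in each lake, so the same masking policy applies no matter which store a row came from.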
Related Items:
Data Mesh Vs. Data Fabric: Understanding the Differences
Data Fabrics Emerge to Soothe Cloud Data Management Nightmares
Big Data Fabrics Emerge to Ease Hadoop Pain