Six Reasons Why Enterprises Need a Modern Data Integration Architecture
Data is instigating change and giving rise to a new data-driven economy that is still in its infancy. Organizations across industries increasingly recognize that monetizing data is crucial to maintaining a competitive edge and are adapting their business models accordingly. There is tremendous opportunity, as applications, instrumented devices and web traffic are throwing off reams of 1s and 0s, rich in analytics potential.
Enterprises are using analytics to reshape sales, operations, and strategy on many fronts. Real-time processing of customer data is creating new revenue opportunities. Tracking devices with Internet of Things (IoT) sensors are enhancing operational efficiency, reducing risk and yielding new insights. New Artificial Intelligence (AI) approaches such as machine learning are accelerating and improving the accuracy of business predictions.
However, to overcome infrastructure challenges, win more customers, increase revenue, remain competitive, and streamline operations, today’s enterprises must be able to integrate and replicate their data. This means accelerating processes with modern, real-time data integration solutions that enable analysis of the right data, at the right time and in the right place. Traditional data integration tools, like ETL (extract, transform, load) or even newer tools like Sqoop, simply aren’t cutting it in the progressive enterprise. Their outdated approaches are brittle, require manual scripting and can’t withstand the immensity of big data velocities and volumes.
While data holds tremendous potential to help businesses, it is no longer acceptable to wait weeks or even days to analyze and respond to new opportunities and threats. Users need to access information in real time, regardless of whether it comes from a 30-year-old mainframe application or the latest cloud-based open source infrastructure. To meet this requirement, many enterprises are modernizing their businesses with innovative technology approaches that deliver data quickly and without impacting the existing infrastructure. One solution in particular, Change Data Capture (CDC), has reignited the imagination of teams struggling to define solutions for the emerging data lake, streaming and cloud platforms that will support this new set of analytics requirements.
Here are six steps that can help enterprise architects and data managers build a modern data architecture that successfully incorporates data integration, replication and migration into a comprehensive strategy:
1. Move from Batch to Real-Time
Efficiently replicate different data types across heterogeneous sources and targets, including databases, data warehouses, Hadoop or the cloud, by using a CDC approach that natively reads changes from the source transaction logs and is optimized for the specific target endpoints.
CDC has three primary advantages over batch replication:
- It enables faster and more accurate decisions based on the most current data;
- It minimizes disruptions to production workloads;
- It reduces the cost of transferring data over the Wide Area Network (WAN) by sending only incremental changes.
Together these advantages enable IT teams to meet the real-time, efficiency, scalability, and zero-production impact requirements of a modern data architecture.
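To make the log-based approach concrete, here is a minimal sketch of consuming changes from a PostgreSQL transaction log using psycopg2's logical replication support. It is illustrative only, not Attunity's implementation; the connection string, slot name and the wal2json output plugin are assumptions for the example.

```python
# Minimal illustration of log-based CDC: stream committed changes from
# PostgreSQL's write-ahead log instead of re-querying tables in batch.
# Assumptions: a reachable database, the wal2json output plugin installed,
# and a replication-enabled connection string.
import json
import psycopg2
from psycopg2.extras import LogicalReplicationConnection

DSN = "dbname=sales user=replicator"   # hypothetical connection string

conn = psycopg2.connect(DSN, connection_factory=LogicalReplicationConnection)
cur = conn.cursor()

# Create the replication slot once; on later runs, reuse it.
try:
    cur.create_replication_slot("cdc_demo", output_plugin="wal2json")
except psycopg2.errors.DuplicateObject:
    pass

cur.start_replication(slot_name="cdc_demo", decode=True)

def handle(msg):
    """Forward each change record downstream, then acknowledge it."""
    change = json.loads(msg.payload)            # committed inserts/updates/deletes
    print(change)                               # replace with a send to the target
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(handle)                      # blocks, streaming changes as they commit
```

Because only committed changes flow from the log, the source tables never need to be re-scanned, which is what keeps production impact and WAN traffic low.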
2. Take a Centralized Approach to Integration
Large organizations are complex, often with many different data sources feeding disparate warehouses or lakes, which can make data integration hard to manage. A centralized logical view allows users to configure, monitor and manage all replication tasks throughout the IT environment.
This is especially critical for implementations containing hundreds or thousands of concurrent tasks. One large credit services firm, for example, is able to identify and apply 14 million source changes to a new target data system in a mere 30 seconds.
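Product consoles differ, but the idea of a centralized logical view can be sketched as a small registry that aggregates the status of every replication task in one place. The task names, fields and thresholds below are hypothetical.

```python
# Hypothetical sketch of a centralized view over many replication tasks:
# one registry that aggregates per-task status, lag and throughput so
# operators monitor everything in one place instead of per pipeline.
from dataclasses import dataclass

@dataclass
class ReplicationTask:
    name: str
    source: str
    target: str
    changes_applied: int
    lag_seconds: float
    state: str            # e.g. "running", "stopped", "error"

class ReplicationRegistry:
    def __init__(self):
        self.tasks = {}

    def report(self, task: ReplicationTask):
        self.tasks[task.name] = task

    def unhealthy(self, max_lag=60.0):
        """One place to spot lagging or failed tasks across the whole estate."""
        return [t for t in self.tasks.values()
                if t.state == "error" or t.lag_seconds > max_lag]

registry = ReplicationRegistry()
registry.report(ReplicationTask("orders_to_lake", "Oracle", "S3",
                                changes_applied=14_000_000,
                                lag_seconds=30.0, state="running"))
print(registry.unhealthy())
```

In a real deployment this view would be fed by the replication tool itself, but the principle is the same: one pane of glass over hundreds or thousands of tasks.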
3. Remove the Latency from Hybrid Environments
Time is money. In a 2017 study entitled The Half-Life of Data, Nucleus Research found that operational data loses, on average, about half its value within eight hours. The decisions that depend on that data, which range from customer service improvements to inventory stocking and overall organizational efficiency, benefit from real-time access.
Most enterprises have a patchwork of data infrastructure supporting different workloads. A trial-and-error learning process, changing business requirements and the rapid evolution of emerging data lake platforms all increase latency as data is copied from one place to another.
This was the case for one global insurance enterprise in London. The firm faced severe data latency issues when replicating data across business systems. With a CDC-based approach, it can now replicate 100 gigabytes (GB) of data in near real time into existing and emerging business-critical systems.
4. Publish Data Streams from Core Transactional Systems
Data streaming technologies like Kafka or Azure Event Hubs were conceived to create real-time data pipelines. But to realize the value of these technologies, you need to feed them data from a wide variety of systems.
CDC is a perfect complement, as it creates real-time data streams from core transactional systems, including databases, mainframes and SAP applications. Used together, streaming and CDC deliver real-time analytics and thereby capitalize on data value that is perishable. In fact, this combination is a critical part of the architecture at one Fortune 250 healthcare solution provider as it strives to improve the quality of care.
The CDC technology allows the analytics team to quickly consolidate clinical data from a variety of sources with minimal IT assistance. As a result, the company can better assess the relationships between clinical drug treatments, drug usage and outcomes, and predict future outcomes. It’s not hard to envision ways in which real-time data updates, sometimes referred to as “fast data,” can impact all areas of operations, whether that is helping healthcare companies save lives or an insurance agency respond to customer claims.
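As a rough sketch of the publish step, the snippet below pushes CDC change records onto a Kafka topic using the kafka-python client. The broker address, topic name and record shape are assumptions; in practice a CDC tool or a Kafka Connect source would handle this for you.

```python
# Illustrative only: publish CDC change records as a real-time Kafka stream.
# Broker address, topic name and record layout are assumed for the example.
import json
from kafka import KafkaProducer   # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    key_serializer=lambda k: k.encode("utf-8"),
)

def publish_change(change: dict):
    """Key by primary key so updates to the same row stay ordered per partition."""
    producer.send("orders.changes",
                  key=str(change["order_id"]),
                  value=change)

publish_change({"op": "update", "order_id": 42, "status": "shipped"})
producer.flush()
```

Keying each record by the row’s primary key keeps changes to the same row in order within a partition, which matters when downstream consumers rebuild state from the stream.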
5. Create AI and Analytics-Ready Data in Your Lake
Most initial data lake initiatives have failed to deliver the analytic insights intended. While it’s relatively easy to land data in your lake, it’s a much more daunting challenge to handle continuous updates, merge and reconcile changes, and create analytics-ready structures.
Advanced CDC solutions not only land the data but also automate the creation and ongoing update of historic and operational data stores in technologies like Apache Hive. With this approach, your data scientists and business analysts can spend their time on data analytics rather than data preparation.
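As an illustration of what “analytics-ready” looks like, the sketch below reconciles a batch of landed change records into an operational store with a Hive MERGE statement, issued here through PyHive. The table names and endpoint are hypothetical and assume ACID-enabled (transactional) Hive tables; an advanced CDC solution would typically generate and schedule this kind of statement automatically.

```python
# Illustrative sketch: reconcile landed CDC changes into an analytics-ready
# Hive table. Table names are hypothetical; MERGE requires ACID-enabled
# (transactional) Hive tables.
from pyhive import hive   # pip install pyhive

MERGE_SQL = """
MERGE INTO curated.orders AS t
USING landing.orders_changes AS s
ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET status = s.status, updated_at = s.updated_at
WHEN NOT MATCHED AND s.op != 'D' THEN INSERT VALUES (s.order_id, s.status, s.updated_at)
"""

conn = hive.Connection(host="hive-server", port=10000)  # assumed endpoint
cur = conn.cursor()
cur.execute(MERGE_SQL)   # apply inserts, updates and deletes in one pass
```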
6. Automate Data Delivery and Creation of Data Warehouses and Marts
Once you’ve automated data ingestion and the creation of analytics-ready data in your lake, you’ll then want to automate the creation of function-specific data warehouses and marts. When you couple CDC with data warehouse automation, you can rapidly create and update data marts anywhere they are needed, whether in your lake, the cloud or a traditional RDBMS. The business benefits from improved agility, speed, cost savings and reduced project risk.
Modern tools allow businesses to drastically reduce ETL coding time and increase the speed of these processes. One university in the southwestern U.S. is using CDC technology for real-time data ingestion across its many different databases and analytic platforms to gain insight for improving enrollment, increasing retention and providing a high-quality student experience.
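Data warehouse automation products vary, but the underlying idea can be sketched as generating repetitive DDL and incremental-load SQL from a declarative specification rather than hand-coding ETL. Everything below, including the mart and source table names, is a hypothetical illustration.

```python
# Hypothetical sketch of data warehouse automation: generate mart DDL and an
# incremental-load statement from a declarative spec instead of hand-coded ETL.
MART_SPEC = {
    "name": "mart.daily_enrollment",                        # hypothetical mart
    "columns": {"student_id": "BIGINT", "term": "STRING",
                "status": "STRING", "updated_at": "TIMESTAMP"},
    "source": "curated.enrollment",                         # hypothetical source
}

def generate_ddl(spec: dict) -> str:
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in spec["columns"].items())
    return f"CREATE TABLE IF NOT EXISTS {spec['name']} (\n  {cols}\n)"

def generate_incremental_load(spec: dict, last_load_ts: str) -> str:
    cols = ", ".join(spec["columns"])
    return (f"INSERT INTO {spec['name']} ({cols})\n"
            f"SELECT {cols} FROM {spec['source']}\n"
            f"WHERE updated_at > '{last_load_ts}'")

print(generate_ddl(MART_SPEC))
print(generate_incremental_load(MART_SPEC, "2019-01-01 00:00:00"))
```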
The Hallmark of a Modern Enterprise
Traditional data integration tools, like ETL, are anything but magical. Their outdated architectures don’t address modern challenges, require manual scripting and can’t withstand the immensity of big data velocities and volumes.
Data mobility and on-demand integration will be critical to the success of modern enterprise IT departments for the foreseeable future. From an IT perspective, data flows might best be viewed as the circulatory system of the modern enterprise. The beating heart is CDC, which identifies and delivers changed data to its various consumers. To thrive, enterprises must expand beyond legacy data integration approaches and rethink the limits of what a modern data integration platform can do for them. It’s time to fully leverage this technology to optimize how companies use their most valuable asset: data.
About the author: Dan Potter is the vice president of product management and marketing at Attunity. In this role, he is responsible for product roadmap management, marketing and go-to-market strategies. Prior to Attunity, he held senior marketing roles at Datawatch, IBM, Oracle and Progress Software. Dan earned a B.S. in Business Administration from University of New Hampshire.