Tag: data pipeline

AI Has a Data Problem, Appen Report Says

AI may be a priority at American companies, but the difficulty of managing data and obtaining high-quality data to train AI models is becoming a bigger hurdle to achieving AI aspirations, according to Appen’s State of…

Astronomer’s High Hopes for New DataOps Platform

Astronomer last month rolled out a new observability product called Astro Observe that’s aimed at giving customers the full picture of how their data is flowing using Apache Airflow, the open source data orchestration…

Cutting-Edge Infrastructure Best Practices for Enterprise AI Data Pipelines

The ability to harness, process, and leverage vast amounts of data sets leading organizations apart in today's data-driven landscape. To stay ahead, enterprises must master the complexities of artificial intelligence (AI…

Data Is the Foundation for GenAI, MIT Tech Review Says

Pretrained large language models (LLMs) like GPT-4 and Gemini are great, but real competitive advantage comes from combining LLMs with private data. Unfortunately, there are questions as to how well companies have prepar…

Data Quality Is A Mess, But GenAI Can Help

A recurring theme in big data over the past two decades is the poor quality of data. No matter how much ink is spilled on the topic, organizations continually seem surprised that the data they want to use for analytics o…

Data Observability in 2024: A Guide

In today's data-driven world, data observability is a critical concept for organizations aiming to effectively manage their data. Simply put, it means having the ability to constantly monitor and understand the status of…

How Airflow 2.8 Makes Building and Running Data Pipelines Easier

Apache Airflow is one of the world’s most popular open source tools for building and managing data pipelines, with around 16 million downloads per month. Those users will see several compelling new features that help t…

AWS Plots Zero-ETL Connections to Azure and Google

At the recent re:Invent show, AWS unveiled new zero-ETL connections that will eliminate the need for customers to build and maintain data pipelines between various AWS data services, including Redshift, Aurora, DynamoDB,…

Pantomath on a Mission To Enhance End-To-End Data Pipeline Observability Across Complex Data Ecosystems

In our data-intensive business world, organizations are striving to use new and innovative methods to derive valuable and actionable insights from the data. Unfortunately, data quality issues are a major challenge and re…

Six Common Signs It’s Time to Invest in Data Reliability

Every enterprise relies heavily on data to make decisions. This makes data reliability crucial. Without it, you may not find the path to streamlined customer experience and revenue generation. However, data reliability…

Meet Maxime Beauchemin, a 2023 Person to Watch

When it comes to prolific contributors to open source projects in the big data space, Maxime Beauchemin is definitely somebody you should know. As a data engineer at Airbnb, Beauchemin created multiple tools that he subs…

Data Mesh Creator Takes Next Data Step

Zhamak Dehghani, who is credited with popularizing the data mesh concept, announced earlier this month the founding of Nextdata. The new outfit will develop software designed to help customers implement “data product c…

Data Integration and Observability Provider Crux Nabs $50m in Funding

Crux, a cloud-based provider of data integration and observability tools that claims to have more than 250 data connectors, today announced that it raised $50 million in a Series B round of venture capital. The San Franc…

AWS Seeks an End to ETL

Extract, transform, and load. It’s a simple and ubiquitous thing in IT. And yet everybody seems to hate it. The latest company to pile on to ETL is AWS, which declared an effort to end ETL yesterday at re:Invent. Ad…

How Snowplow Breaks Down Data Barriers

If you have suspicions about your data, you’re not alone. The AI and data analytics dreams of many a company have been broken by poor data management and defective ETL pipelines. But by enforcing data schema at the poi…

A Data Platform for Chatbot Development

One of the most compelling use cases for AI at the moment is developing chatbots and conversational agents. While the AI part of the equation works reasonably well, getting the training data organized to build and train…

CI/CD Pipeline: 7 Advantages To A Continuous Integration Approach to Data Pipelines

When it comes to modern software development, it’s not surprising that companies have a need for speed. But if you develop software too quickly, it can mean sacrificing quality, security and compliance. DevOps and con…

Exploring the Top Options for Real-Time ELT

Competitive advantage in today’s world rests on a company’s ability to innovate and adapt to a rapidly changing environment. To do that, organizations must adopt real-time thinking in the way they approach the design…

Airflow Available as a New Managed Service Called Astro

Companies can now get an Apache Airflow data orchestration environment up and running in less than an hour via Astro, a new managed service launched today by Astronomer, the commercial entity behind the popular open-sour…

Monte Carlo Raises $135 Million to Grow Data Observability Biz

Data observability has emerged as one of the hottest sectors in the big data market, thanks to its focus on fixing broken data pipelines. One of the hottest players in the field is Monte Carlo, which this week announced…

BigDATAwire