Tag: data pipeline
AI Has a Data Problem, Appen Report Says
AI may be a priority at American companies, but the difficulty in managing data and obtaining high-quality data to train AI models is becoming a bigger hurdle to achieving AI aspirations, according to Appen’s State of…
Astronomer’s High Hopes for New DataOps Platform
Astronomer last month rolled out a new observability product called Astro Observe that’s aimed at giving customers the full picture of how their data is flowing using Apache Airflow, the open source data orchestration…
Cutting-Edge Infrastructure Best Practices for Enterprise AI Data Pipelines
The ability to harness, process, and leverage vast amounts of data sets leading organizations apart in today's data-driven landscape. To stay ahead, enterprises must master the complexities of artificial intelligence (AI…
Data Is the Foundation for GenAI, MIT Tech Review Says
Pretrained large language models (LLMs) like GPT-4 and Gemini are great, but real competitive advantage comes from combining LLMs with private data. Unfortunately, there are questions as to how well companies have prepar…
Data Quality Is A Mess, But GenAI Can Help
A recurring theme in big data over the past two decades is the poor quality of data. No matter how much ink is spilled on the topic, organizations continually seem surprised that the data they want to use for analytics o…
Data Observability in 2024: A Guide
In today's data-driven world, data observability is a critical concept for organizations aiming to effectively manage their data. Simply put, it means having the ability to constantly monitor and understand the status of…
How Airflow 2.8 Makes Building and Running Data Pipelines Easier
Apache Airflow is one of the world’s most popular open source tools for building and managing data pipelines, with around 16 million downloads per month. Those users will see several compelling new features that help t…
AWS Plots Zero-ETL Connections to Azure and Google
At the recent re:Invent show, AWS unveiled new zero-ETL connections that will eliminate the need for customers to build and maintain data pipelines between various AWS data services, including Redshift, Aurora, DynamoDB, …
Pantomath on a Mission To Enhance End-To-End Data Pipeline Observability Across Complex Data Ecosystems
In our data-intensive business world, organizations are striving to use new and innovative methods to derive valuable and actionable insights from the data. Unfortunately, data quality issues are a major challenge and re…
Six Common Signs It’s Time to Invest in Data Reliability
Every enterprise relies heavily on data to make decisions. This makes data reliability crucial. Without it, you may not find the path to streamlined customer experience and revenue generation. However, data reliability…
Meet Maxime Beauchemin, a 2023 Person to Watch
When it comes to prolific contributors to open source projects in the big data space, Maxime Beauchemin is definitely somebody you should know. As a data engineer at Airbnb, Beauchemin created multiple tools that he subs…
Data Mesh Creator Takes Next Data Step
Zhamak Dehghani, who is credited with popularizing the data mesh concept, announced earlier this month the founding of Nextdata. The new outfit will develop software designed to help customers implement “data product c…
Data Integration and Observability Provider Crux Nabs $50m in Funding
Crux, a cloud-based provider of data integration and observability tools that claims to have more than 250 data connectors, today announced that it raised $50 million in a Series B round of venture capital. The San Franc…
AWS Seeks an End to ETL
Extract, transform, and load. It’s a simple and ubiquitous thing in IT. And yet everybody seems to hate it. The latest company to pile on to ETL is AWS, which declared an effort to end ETL yesterday at re:Invent. Ad…
How Snowplow Breaks Down Data Barriers
If you have suspicions about your data, you’re not alone. The AI and data analytics dreams of many a company have been broken by poor data management and defective ETL pipelines. But by enforcing data schema at the poi…
A Data Platform for Chatbot Development
One of the most compelling use cases for AI at the moment is developing chatbots and conversational agents. While the AI part of the equation works reasonably well, getting the training data organized to build and train…
CI/CD Pipeline: 7 Advantages To A Continuous Integration Approach to Data Pipelines
When it comes to modern software development, it’s not surprising that companies have a need for speed. But if you develop software too quickly, it can mean sacrificing quality, security and compliance. DevOps and con…
Exploring the Top Options for Real-Time ELT
Competitive advantage in today’s world rests on a company’s ability to innovate and adapt to a rapidly changing environment. To do that, organizations must adopt real-time thinking in the way they approach the design…
Airflow Available as a New Managed Service Called Astro
Companies can now get an Apache Airflow data orchestration environment up and running in less than an hour via Astro, a new managed service launched today by Astronomer, the commercial entity behind the popular open-sour…