
Data Quality Got You Down? Thank GenAI

Chief data officers have a lot of challenges on their plates these days: data integration, security, privacy, compliance, cloud migrations, and IT staff and resources, to name a few. But a whopping 68% of CDOs in a recent report identified data quality as their number one problem. The surge in awareness of data quality can be traced squarely to the emergence of generative AI, says Mike McKee, the CEO of Ataccama.
“AI has been that catalyst to look more closely at data quality,” McKee said. “The closer that people have looked, the more concerned they get, and as a result, it’s gone way up the priority list for people to address.”
Ataccama commissioned Hanover Research to survey about 300 senior data professionals in the US, UK, and Canada in late 2024. The results of that survey, which Ataccama published in its newly released Data Trust Report 2025, show that many enterprises face challenges in moving forward with GenAI as a direct result of data quality issues.
“AI models are only as effective as the data they rely on,” Ataccama says in its report. “And when that data is bad, the consequences are far-reaching.” The impacts of bad data, the company says, include inaccurate insights, operational slowdowns, wasted resources, jeopardized compliance initiatives, and lowered ROI.
The report also found that legacy systems share much of the blame. They’re ill-equipped to handle growing data volumes, and many were designed to provide periodic data updates, not the continuous, real-time streams that AI demands, Ataccama says. Maintaining data quality across the organization is a challenge for 41% of respondents. According to the report, these factors help explain why just 33% of organizations report meaningful progress in AI adoption.
Data quality has been a problem since the first byte was written. There are untold ways that data can go bad (with human error leading the way), and data professionals have spent countless hours trying to address them.
For instance, during the data warehousing era, companies embarked upon heavy-handed, top-down initiatives in an attempt to dictate data quality standards for the organization. When data volumes were lower, companies could get away with brute-force approaches, such as master data management (MDM) initiatives that defined a centralized “golden record” that could be relied upon for decision-making.
But those boil-the-ocean approaches aren’t effective in today’s big data environment. For a variety of reasons, including the proliferation of data silos (both on-prem and in the cloud), the rapid expansion of use cases, and the emergence of unstructured data as a valued resource, data quality has gotten worse as volumes have grown.
The best that companies can do is try to apply available resources to the most pressing data quality challenge at hand, McKee says.
“Trying to master all those different data sources can quickly become a fool’s errand,” he says. “Without a doubt, the piece of advice is to start with the business initiative. Don’t start with a theoretical data project. [Ask yourself] what area of the business are you trying to improve? What area of the business has the least trust in the data and needs to address data quality issues first?
“To run this marketing campaign, I can just use these five data sources,” he continues. “I know there’s 25 data sources, but if I actually take information from these five data sources, match and merge, bring that together, know what the correct information is or the master information from a subset of the resources, then I’m driving that business initiative better.”
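The match-and-merge step McKee describes can be sketched in a few lines of Python. This is a minimal illustration, not Ataccama’s method: the record fields, the source names, the rule that records sharing a normalized email address belong to the same customer, and the “newest non-blank value wins” survivorship rule are all hypothetical assumptions chosen for clarity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    source: str   # hypothetical source system name, e.g. "crm"
    email: str
    name: str
    address: str
    updated: date


def match_key(rec: Record) -> str:
    # Hypothetical match rule: same normalized email means same customer.
    return rec.email.strip().lower()


def merge(group: list[Record]) -> dict[str, str]:
    # Hypothetical survivorship rule: for each field, keep the most
    # recently updated value that is not blank.
    newest_first = sorted(group, key=lambda r: r.updated, reverse=True)
    golden = {"email": match_key(group[0])}
    for field in ("name", "address"):
        values = (getattr(r, field).strip() for r in newest_first)
        golden[field] = next((v for v in values if v), "")
    return golden


def match_and_merge(records: list[Record]) -> list[dict[str, str]]:
    # Group records by match key, then collapse each group into one
    # "golden record".
    groups: dict[str, list[Record]] = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return [merge(g) for g in groups.values()]


# A small subset of sources, as in McKee's example (data is made up).
records = [
    Record("crm", "Ann@Example.com ", "Ann Lee", "", date(2024, 3, 1)),
    Record("billing", "ann@example.com", "Ann Lee", "12 Elm St", date(2023, 6, 5)),
    Record("web", "bob@example.com", "Bob Roy", "9 Oak Ave", date(2024, 1, 9)),
]

for row in match_and_merge(records):
    print(row)
```

Here the CRM record supplies Ann’s name because it is newest, while the older billing record fills in the address the CRM left blank. Real products use fuzzier matching (for example, tolerating misspelled names) and configurable survivorship rules.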
While we’ll never have perfect data, there is vast room for improvement over what we have today. Many companies struggle to produce meaningful analytics because of misspelled names, erroneous addresses, blank fields, and intentionally false data entered into forms. Every company struggles to purge these errors from its databases and file systems, but the challenge has become even more pressing now that companies are trying to use this data for GenAI.
“I think a big turning point was ChatGPT a couple of years ago,” McKee tells BigDATAwire in an interview. “All of a sudden, people are talking about AI. All of a sudden, the boards and the business leaders are like, hey, can you start using AI? And all of a sudden, the CIOs of the world who have been working on these data projects for so long, they’re like, ‘Hey, we’ve been working these data projects. You didn’t care about them then. Now you care so much.’”
The continuing explosion of data is putting the onus on data stewards and data engineers to track down and fix data quality problems, McKee says.
“You have a fixed amount of data professionals trying to handle an exploding amount of data, which, once again, is going to have a negative impact on data quality and the need to have automated data quality tools to address that concern,” he says.
Ataccama is seeking to address the data quality problem through automation. It employs machine learning and AI to help with the matching and merging capabilities in its data quality product, and also to automate much of the rule creation and rule documentation work, McKee says. It also uses GenAI techniques to help improve data quality to bolster other downstream GenAI projects, a great example of the virtuous cycle of data and AI.
But better data quality tools can only get you so far. In their 2025 AI & Data Leadership Executive Benchmark Survey, Randy Bean and Tom Davenport found that 92% of respondents “believe that the primary barrier to establishing data- and AI-driven cultures is people and organization change-based, and only 8% thought technology was the culprit.”

Data investments are increasing (Source: “2025 AI & Data Leadership Executive Benchmark Survey” by Randy Bean and Tom Davenport)
When it comes to the importance of data quality, however, Davenport and Bean, the latter of whom advises Ataccama, are in full agreement with Ataccama and McKee: GenAI is exposing data quality as the massive problem that it is.
“…[A]s a consequence of the rapid increase in interest and commitment to AI investment, a growing percentage of organizations are now focusing on their data initiatives as well,” Bean and Davenport write. “It is increasingly understood that the quality of AI is largely dependent upon the quality of the data that is available.”
The good news is that recognition of the data quality problem is leading to more resources being devoted to it, both in expanding the human and organizational heft needed to attack it and in buying better tools. In their survey, Bean and Davenport note that investments in data and GenAI are increasing, as is the percentage of decision-makers who say that data and AI are a top priority.
Ataccama is also seeing this trend in its revenue. While the Boston-based company, which has its roots in the Czech Republic, doesn’t focus exclusively on data quality, that is its heritage. According to McKee, bookings increased 100% in the last year, while top-line revenue jumped 30%. That indicates companies are recognizing and responding to the problem, he says.
“I would say that more organizations are prioritizing it…and I think we’ll see major improvements in the next two to three years,” McKee predicts. “I think we’re sort of in the ‘admit you have a problem’ stage. Once you admit you have a problem, look for solutions. And then as you look for solutions, then you’ll start to see an improvement overall.”