Security Risks of Gen AI Raise Eyebrows
Unless you’ve been hiding under a rock for the past eight months, you’ve undoubtedly heard how large language models (LLMs) and generative AI will change everything. Businesses are eagerly adopting tools like ChatGPT to augment human employees or replace them outright. But beyond the impact of job losses and the ethical implications of biased models, these new forms of AI carry data security risks that corporate IT departments are just starting to understand.
“Every company on the planet is looking at their difficult technical problems and just slapping on an LLM,” Matei Zaharia, Databricks co-founder and CTO and the creator of Apache Spark, said during his keynote address at the Data + AI Summit on Tuesday. “How many of your bosses have asked you to do this? It seems like pretty much everyone here.”
Corporate boardrooms are certainly aware of the potential impact of generative AI. According to a survey conducted by Harris Poll on behalf of Insight Enterprises, 81% of large companies (1,000+ employees) have already established or implemented policies or strategies around generative AI, or are in the process of doing so.
“The pace of exploration and adoption of this technology is unprecedented,” Matt Jackson, Insight’s global chief technology officer, stated in a Tuesday press release. “People are sitting in meeting rooms or virtual rooms discussing how generative AI can help them achieve near-term business goals while trying to stave off being disrupted by somebody else who is a faster, more efficient adopter.”
Nobody wants to get displaced by a faster-moving company that figured out how to monetize generative AI first, and that seems like a distinct possibility at the moment. But there are other risks too, including losing control of your private data, having your Gen AI app hijacked, or seeing it poisoned by hackers or competitors.
Among the unique security risks that LLM users should be on the lookout for are things like prompt injections, data leakage, and unauthorized code execution. These are some of the top risks that the Open Worldwide Application Security Project (OWASP), an online community dedicated to furthering knowledge about security vulnerabilities, published in its Top 10 List for Large Language Models.
Data leakage, in which an LLM inadvertently shares potentially private information that was used to train it, has been documented as an LLM concern for years, but those concerns have taken a backseat to the hype around Gen AI since ChatGPT debuted in late 2022. Hackers also could craft specific prompts designed to extract information from Gen AI apps. To prevent data leakage, users need to implement safeguards such as output filtering.
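To make the idea of output filtering concrete, here is a minimal sketch of one common approach: a post-processing step that scans a model’s response for patterns that look like sensitive data before the text reaches a user or a log. The patterns and the redact_sensitive helper below are illustrative assumptions, not any vendor’s API; a production deployment would lean on a dedicated DLP or PII-detection service tuned to the organization’s own data.

```python
import re

# Illustrative patterns only; real deployments would use a proper
# DLP/PII-detection service rather than a handful of regexes.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

def redact_sensitive(llm_output: str) -> str:
    """Replace anything matching a sensitive pattern in the LLM's
    response before it is displayed, stored, or logged."""
    redacted = llm_output
    for label, pattern in SENSITIVE_PATTERNS.items():
        redacted = pattern.sub(f"[REDACTED {label.upper()}]", redacted)
    return redacted

if __name__ == "__main__":
    sample = "Reach Jane at jane.doe@example.com, SSN 123-45-6789."
    print(redact_sensitive(sample))
```

The same kind of filter can be applied symmetrically to prompts on the way in, so that employees don’t inadvertently ship customer records or credentials to a third-party API in the first place.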
While sharing your company’s raw sales data with an API from OpenAI, Google, or Microsoft may seem like a great way to get a halfway-decent, ready-made report, it also carries intellectual property (IP) disclosure risks that users should be aware of. In a Wednesday op-ed in the Wall Street Journal titled “Don’t Let AI Steal Your Data,” Matt Calkins, the CEO of Appian, encourages businesses to be wary of sending private data up into the cloud.
“A financial analyst I know recently asked ChatGPT to write a report,” Calkins writes. “Within seconds, the software generated a passable document, which the analyst thought would earn him plaudits. Instead, his boss was irate: ‘You told Microsoft everything you think?’”
While LLMs and Gen AI apps can string together marketing pitches or sales reports like an average copywriter or business analyst, they come with a big caveat: there is no guarantee that the data will be kept private.
“Businesses are learning that large language models are powerful but not private,” Calkins writes. “Before the technology can give you valuable feedback, you have to offer it valuable information.”
The folks at Databricks hear that concern from their customers too, which is one of the reasons the company snapped up MosaicML for a cool $1.3 billion on Monday and then launched Databricks AI yesterday. The company’s CEO, Ali Ghodsi, has been an avowed supporter of the democratization of AI, and today that appears to mean owning and running your own LLM.
“Every conversation I’m having, the customers are saying ‘I want to control the IP and I want to lock down my data,’” Ghodsi said during a press conference Tuesday. “The companies want to own that model. They don’t want to just use one model that somebody is providing, because it’s intellectual property and it’s competitiveness.”
While Ghodsi is fond of saying every company will be a data and AI company, they won’t all become data and AI companies in the same way. Larger companies will likely lead in developing high-quality, custom LLMs, which MosaicML co-founder and CEO Naveen Rao said Tuesday will cost individual companies hundreds of thousands of dollars to build, not the hundreds of millions that companies like Google and OpenAI spend to train their giant models.
But as easy and affordable as companies like MosaicML and Databricks can make creating custom LLMs, smaller companies without the money and tech resources will still be more likely to tap into pre-built LLMs running in public clouds, to which they will upload their prompts via an API and for which they will pay a subscription, just as they do for all their other SaaS applications. These companies will need to come to grips with the risk that this poses to their private data and IP.
There is evidence that companies are starting to recognize the security risks posed by new forms of AI. According to the Insight Enterprises survey, 49% of survey-takers said they’re concerned about the safety and security risks of generative AI, trailing only concerns about quality and control. That was ahead of concerns about limits on human innovation, cost, and legal and regulatory compliance.
The boom in Gen AI will likely be a boon to the security business. According to global telemetry data collected by Skyhigh Security (formerly McAfee Enterprise) from the first half of 2023, about 1 million of its users have accessed ChatGPT through corporate infrastructures. From January to June, the volume of users accessing ChatGPT through its security software increased by 1,500%, the company says.
“Securing corporate data in SaaS applications, like ChatGPT and other generative AI applications, is what Skyhigh Security was built to do,” Anand Ramanathan, chief product officer for Skyhigh Security, stated in a press release.
Related Items:
Databricks’ $1.3B MosaicML Buyout: A Strategic Bet on Generative AI
Feds Boost Cyber Spending as Security Threats to Data Proliferate
Databricks Unleashes New Tools for Gen AI in the Lakehouse