Why A Bad LLM Is Worse Than No LLM At All
Companies are sprinting to add large language models (LLMs) to their technology stacks thanks to the popularity of generative AI tools like ChatGPT and Bard. The hours of work saved by generative AI apps have many eager to unleash LLMs on their data to see what treasures they can uncover.
While the recent enthusiasm for AI is a welcome change from the Skynet-tinged narrative of years past, business leaders need to take a cautious yet optimistic approach. In the rush to buy and deploy LLM services and tools, companies may not be thinking through the enterprise value of this technology or its potential risks, especially when it comes to data analytics.
LLMs Aren’t Magic
LLMs are a type of generative AI that uses deep learning techniques and huge datasets to understand, summarize, and generate text-based content. While this tech can seem magical (we are constantly surprised by the things it can help with), the algorithm has simply learned to predict the text response that makes the most sense given the massive amount of content it was trained on. That trained response can be helpful, but it can also introduce a lot of risk.
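To make "predicting the text response that makes the most sense" concrete, here is a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model. Both are illustrative choices, not any particular vendor's stack: the point is that the model scores every candidate next token and surfaces the most plausible continuations, plausible being very different from verified.

```python
# Minimal sketch of next-token prediction, assuming the Hugging Face
# transformers library and the small open GPT-2 model (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Our quarterly revenue grew because"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token, at every position

# Probabilities for the very next token: statistically plausible
# continuations of the prompt, not verified facts about your business.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}  p={p.item():.3f}")
```

Every word the model emits is chosen this way, one high-probability token at a time, which is exactly why fluent output can still be wrong.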
Generative AI has been lauded for instantly answering queries, retrieving information, and building creative narratives…and sometimes it delivers! But when it comes to anything AI-generated, every result should be thoroughly fact-checked before it is put to use in any business strategy or operation.
Furthermore, LLMs are usually trained on datasets scraped from the Internet and other open sources. The sheer volume of content from so many contributors makes it challenging to filter out inaccurate, biased, or outdated information. As a result, some generative AI creates more fiction than fact (and for some use cases, that is okay). With companies strapped for resources and under pressure to stay productive, LLMs can and should be used to accelerate appropriate tasks.
But they shouldn’t be used to automate tasks entirely, because doing so raises four significant concerns:
1. Query and Prompt Design
For an LLM to return a useful output, it needs to interpret the user’s query or prompt the way it was intended. Language carries a lot of nuance that can lead to misunderstandings, and no solution yet offers guardrails that guarantee consistent and accurate results that meet expectations. One common partial mitigation, constraining the prompt to a fixed output format and validating the response, is sketched below.
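Here is a hedged sketch of that guardrail pattern. The `call_llm` parameter is a hypothetical stand-in for whatever model API a team actually uses; the pattern itself, pin the output schema in the prompt and reject anything that does not conform, is what matters.

```python
import json

# One common guardrail pattern, sketched minimally: pin the prompt to an
# explicit output schema and reject any response that does not conform.
# `call_llm` is a hypothetical stand-in for whatever model API you use.
PROMPT_TEMPLATE = (
    "You are assisting with data analytics questions.\n"
    'Respond ONLY with JSON of the form '
    '{{"answer": <string>, "confidence": "low"|"medium"|"high"}}.\n\n'
    "Question: {question}"
)

def ask_with_guardrails(question: str, call_llm) -> dict:
    raw = call_llm(PROMPT_TEMPLATE.format(question=question))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        # A misinterpreted or free-form answer fails fast here instead of
        # silently flowing into downstream business logic.
        raise ValueError(f"Unparseable model output: {raw!r}") from err
    if not {"answer", "confidence"} <= parsed.keys():
        raise ValueError(f"Model output missing required fields: {parsed!r}")
    return parsed
```

Schema validation like this catches malformed responses, but it cannot catch a well-formed answer to the wrong question, which is why a human still needs to review the output.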
2. Hallucinations
LLMs, including ChatGPT, have been known to simply make up data to fill in the gaps in their knowledge just so that they can answer the prompt. They are designed to produce answers that feel right, even if they aren’t. If you work with vendors supplying LLMs within their products or as standalone tools, it’s critical to ask them how their LLM is trained and what they’re doing to mitigate inaccurate results.
3. Security and Privacy
The majority of LLMs on the market are available publicly online, which makes it incredibly challenging to safeguard any sensitive information or queries you input. It’s very likely that this data is visible to the vendor, who will almost certainly be storing and using it to train future versions of their product. And if that vendor is hacked or there’s a data leak, expect even bigger headaches for your organization. In the end, using LLMs is a risk because there are no universal standards for their safe and ethical use yet.
4. Confidence and Trust
Often when AI is used to automate a task, such as creating an agenda or writing content, it is obvious to the end user that an LLM was used instead of a human. In some cases, that’s an acceptable trade-off given the time saved. But sometimes LLM-generated content acts as a red flag to users and negatively impacts their experience.
Though LLMs are still an emerging technology, many AI-driven products have enormous potential to expand and deepen data exploration when they are guided by data scientists.
Exploring Data More Intelligently
We’re already seeing how AI is well-suited to combing through huge amounts of data, extracting meaning, and presenting that meaning in new ways. Intelligent exploration couples AI with multidimensional visualizations to enable rich exploration of vast, complex datasets.
Companies use AI to drive intelligent exploration so that users can explore and understand their data. These AI technologies use natural language and visuals to tell the full story hiding in the data, surfacing meaningful insights. This accelerates analytics work, freeing analysts to focus on the parts of the story that may not live in the data and to provide even more value to their organizations.
Leveraging AI for data analytics gives businesses the ability to look at their data more objectively and more creatively. While generative AI still has a long way to go before it’s considered mature, that doesn’t mean that we can’t start using it to explore our data with the right guidance.
The Future is Bright—But So is the Present
Despite the current limitations of LLMs, there is huge potential for this technology to benefit the data analytics space sooner than you might think.
So many organizations sit on a wealth of data they can’t make sense of, for a multitude of reasons. AI-guided intelligent exploration helps companies derive value from their data and take strategic action. By leveraging explainable AI (XAI), generative AI, and rich visualizations together, users can understand complex datasets and gain insights that change their business for the better.
The future of AI is bright, but there is much to be gained by using AI to elevate your data analytics efforts today. As companies continue to evaluate and develop generative AI for data analytics, there is already so much AI can do to help teams get more from their data, provided they harness the opportunity with the right tools.
About the authors: Aakash Indurkhya graduated from Caltech with a focus on machine learning and systems engineering. During his time at Caltech, he founded and taught a course on big data frameworks and contributed to ongoing research in computational theory at Caltech and computational science at Duke University. At Virtualitics, Aakash manages the development of AI tools and solutions for clients and Virtualitics products and holds several patents for the innovative capabilities of the Virtualitics AI Platform.
Sarthak Sahu graduated from Caltech and leads a team of data scientists, machine learning engineers, and AI platform developers that works on creating enterprise AI products and solving challenging machine learning and data analytics problems for clients. As the first ML hire at a fast-growth AI startup, he has years of cross-functional experience as both an individual contributor and an engineering and technical product manager. His research interests include generative AI, explainable AI (XAI), network graph analytics, natural language processing (NLP), and computer vision (CV).