Patronus AI Releases Lynx for Real-Time Hallucination Detection in LLMs
SAN FRANCISCO, July 12, 2024 — Patronus AI announced the release of Lynx, a state-of-the-art hallucination detection model designed to address the challenge of hallucinations in large language models (LLMs).
Hallucinations occur when LLMs generate responses that are coherent but do not align with factual reality or the input context, undermining their practical utility across applications. While proprietary LLMs such as GPT-4 have increasingly been used to detect these inconsistencies (the "LLM-as-a-judge" approach), concerns remain about their reliability, scalability, and cost.
Lynx represents a breakthrough in the field by enabling real-time hallucination detection without the need for manual annotation. Patronus AI also open-sourced HaluBench, a new benchmark sourced from real-world domains, for comprehensively assessing the faithfulness of LLM responses.
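For illustration, a HaluBench-style evaluation record pairs a question and a source passage with a candidate answer and a binary faithfulness label; the field names and contents below are a hedged sketch, not the published schema.

```python
# Illustrative HaluBench-style record (field names and values are assumptions,
# not the published schema).
example = {
    "question": "Does low-dose aspirin reduce the risk of heart attack?",
    "passage": (
        "Low-dose aspirin has been shown in several trials to reduce the "
        "incidence of myocardial infarction in high-risk adults."
    ),
    "answer": "Aspirin has no effect on heart attack risk.",
    "label": "FAIL",  # the answer contradicts the passage, i.e. a hallucination
}

# A judge model such as Lynx receives the question, passage, and answer,
# and is asked to output PASS (faithful) or FAIL (hallucinated) with reasoning.
```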
“Since the release of ChatGPT in November 2022, the proliferation of large language models has revolutionized text generation and knowledge-intensive tasks like question answering. However, hallucinations, where models produce coherent but inaccurate responses, remain a critical challenge and pose significant risks for enterprises,” said Anand Kannappan, CEO of Patronus AI. “We address this challenge head-on with Lynx, a groundbreaking open source model capable of real-time hallucination detection. Today, we not only introduce the most powerful LLM-as-a-judge with Lynx, we also introduce HaluBench, a novel 15k-sample benchmark that LLM developers can use to measure the hallucination rate of their fine-tuned LLMs in domain-specific scenarios.”
Lynx is the first model to beat GPT-4 on hallucination detection tasks. Lynx (70B) achieved the highest accuracy at detecting hallucinations of all LLMs evaluated as judges, making it the largest and most capable open source hallucination detection model to date. It outperformed OpenAI’s GPT models and Anthropic’s Claude 3 models at a fraction of their size.
Lynx and HaluBench also cover real-world domains such as finance and medicine, which previous datasets and models did not, making them more applicable to real-world problems.
Results:
- In medical answers (PubMedQA), Lynx (70B) was 8.3% more accurate than GPT-4o at detecting medical inaccuracies.
- Lynx (8B) outperformed GPT-3.5 by 24.5% on HaluBench, and beat Claude-3-Sonnet and Claude-3-Haiku by 8.6% and 18.4% respectively, showing strong capabilities in a smaller model.
- Both Lynx (8B) and Lynx (70B) achieve significantly higher accuracy than open source baselines; supervised fine-tuning gives Lynx (8B) a 13.3% gain over Llama-3-8B-Instruct.
- Lynx (70B) outperformed GPT-3.5 by an average of 29.0% across all tasks.
Lynx and HaluBench are now publicly available on Hugging Face, the open source AI platform.
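As a rough sketch of how a developer might load the released artifacts with the Hugging Face libraries: the repository IDs below are assumptions based on this announcement, and the prompt is illustrative rather than the official Lynx template, so check the Patronus AI organization page for the exact names and format.

```python
# Minimal sketch: load a Lynx checkpoint and HaluBench from Hugging Face.
# Repository IDs are assumptions; verify them at https://huggingface.co/PatronusAI.
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

model_id = "PatronusAI/Llama-3-Patronus-Lynx-8B-Instruct"  # assumed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

halubench = load_dataset("PatronusAI/HaluBench")  # assumed dataset repo name

# Illustrative judge prompt: give the model a question, a source document,
# and a candidate answer, then generate a faithfulness verdict.
prompt = (
    "Given the question, document, and answer, decide whether the answer is "
    "faithful to the document. Reply PASS or FAIL with a short explanation.\n\n"
    "QUESTION: Does low-dose aspirin reduce the risk of heart attack?\n"
    "DOCUMENT: Low-dose aspirin reduced myocardial infarction in several trials.\n"
    "ANSWER: Aspirin has no effect on heart attack risk.\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```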
About Patronus AI
Patronus AI is the first automated evaluation and security platform that helps companies use large language models (LLMs) safely. For more information, visit https://www.patronus.ai or reach out to [email protected].
Source: Patronus AI