Key Takeaways
- The Vectara Hallucination Leaderboard reports that GPT-4o-mini has a 1.7% hallucination rate on summarization tasks
- According to Vectara, Claude 3 Haiku exhibits a 2.2% hallucination rate in factual retrieval
- GPT-4 Turbo shows a 0.9% hallucination rate in Vectara's evaluation on RAG tasks
- GPT-4 has a 3% hallucination rate on an MMLU benchmark subset
- Claude 3 Opus shows a 1.8% hallucination rate in proprietary evals
- Gemini 1.5 Flash records a 2.4% factual error rate on internal tests
- In legal RAG, GPT-4 hallucinates 17% of citations
- Medical QA with Med-PaLM shows a 9% hallucination rate
- Code generation with GPT-4 shows a 12% factual error rate in docs
- HaluEval benchmark: GPT-4 reaches 84.5% accuracy at detecting hallucinations, a measure that runs inverse to hallucination rate
- TruthfulQA: GPT-3.5 has a 45% truthfulness score, implying up to 55% potential hallucination
- HHEM benchmark: Claude 2 shows a 12.5% hallucination rate
- MMLU subset for hallucinations: GPT-4 at 2.1%
- 69% of users report that hallucinations affect their trust, per a Stanford study
- $100M+ in potential enterprise losses from hallucinations, per Gartner
AI hallucination rates vary widely across models, benchmarks, and tasks, and the errors carry real-world costs.
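The figures above mix two kinds of numbers: direct hallucination rates (a share of flagged outputs) and accuracy-style scores such as the TruthfulQA truthfulness score, where the list reads the complement as potential hallucination (45% implies 55%). A minimal Python sketch of that arithmetic follows. The function names are hypothetical, and treating the complement of a score as a hallucination rate is the list's own simplification, not a rule defined by any benchmark.

```python
def implied_error_rate(score_percent: float) -> float:
    """Return 100 minus an accuracy-style score, both in percent.

    Assumption: every non-truthful answer counts as a potential
    hallucination, as the TruthfulQA bullet does.
    """
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    return 100.0 - score_percent


def hallucination_rate(hallucinated: int, total: int) -> float:
    """Return the share of outputs flagged as hallucinated, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= hallucinated <= total:
        raise ValueError("hallucinated must be between 0 and total")
    return 100.0 * hallucinated / total


if __name__ == "__main__":
    # TruthfulQA bullet: a 45% truthfulness score
    print(implied_error_rate(45.0))     # 55.0
    # Illustrative counts only: 17 flagged citations out of 100
    print(hallucination_rate(17, 100))  # 17.0
```

Rates computed this way are only comparable when the benchmark, task, and flagging criteria match, which is why the list reports each figure alongside its source.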
Benchmark Evaluations
Source: https://arxiv.org/abs/2303.18221
Impact Assessments
Model-Specific
Overall Frequency
Task-Specific
Sources & References
- Reference 1: Vectara, vectara.com
- Reference 2: arXiv, arxiv.org
- Reference 3: Anthropic, anthropic.com
- Reference 4: DeepMind, deepmind.google
- Reference 5: Hugging Face, huggingface.co
- Reference 6: LMSYS, lmsys.org
- Reference 7: Koala-LM, koala-lm.stanford.edu
- Reference 8: GitHub, github.com
- Reference 9: HAI, hai.stanford.edu
- Reference 10: Gartner, gartner.com
- Reference 11: McKinsey, mckinsey.com
- Reference 12: Reuters, reuters.com
- Reference 13: Forbes, forbes.com
- Reference 14: IBM, ibm.com
- Reference 15: Bain, bain.com






