A Comprehensive Survey of Hallucination in Large Language, Image, Video and Audio Foundation Models

Sahoo, Pranab; Meharia, Prabhash; Ghosh, Akash; Saha, Sriparna; Jain, Vinija; Chadha, Aman

arXiv:2405.09589 (cs)

[Submitted on 15 May 2024 (v1), last revised 3 Oct 2024 (this version, v4)]

Abstract: The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research in this pivotal area.

Comments:	EMNLP 2024 Findings
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2405.09589 [cs.LG]
	(or arXiv:2405.09589v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.09589

Submission history

From: Pranab Sahoo
[v1] Wed, 15 May 2024 10:16:25 UTC (19,478 KB)
[v2] Mon, 20 May 2024 06:30:06 UTC (19,478 KB)
[v3] Sat, 21 Sep 2024 03:28:57 UTC (19,478 KB)
[v4] Thu, 3 Oct 2024 09:00:35 UTC (4,700 KB)
