IC3: Image Captioning by Committee Consensus

Chan, David M.; Myers, Austin; Vijayanarasimhan, Sudheendra; Ross, David A.; Canny, John

Computer Science > Computer Vision and Pattern Recognition

arXiv:2302.01328 (cs)

[Submitted on 2 Feb 2023 (v1), last revised 19 Oct 2023 (this version, v3)]

Abstract: If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" caption, i.e., the one most similar to a reference. Unfortunately, doing so encourages captions that are "informationally impoverished": they focus on only a subset of the possible details while ignoring other potentially useful information in the scene. In this work, we introduce a simple yet novel method, "Image Captioning by Committee Consensus" (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 as at least as helpful as those from baseline SOTA models more than two-thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions and indicating significant improvements over SOTA approaches for visual description. Code is available at this https URL

Comments:	To Appear at EMNLP 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2302.01328 [cs.CV]
	(or arXiv:2302.01328v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2302.01328

Submission history

From: David Chan
[v1] Thu, 2 Feb 2023 18:58:05 UTC (29,216 KB)
[v2] Thu, 16 Feb 2023 23:38:25 UTC (29,216 KB)
[v3] Thu, 19 Oct 2023 17:58:05 UTC (2,284 KB)