VISREAS: Complex Visual Reasoning with Unanswerable Questions

Akter, Syeda Nahida; Lee, Sangwu; Chang, Yingshan; Bisk, Yonatan; Nyberg, Eric

arXiv:2403.10534 (cs)

[Submitted on 23 Feb 2024]

Abstract: Verifying a question's validity before answering is crucial in real-world applications, where users may provide imperfect instructions. In this scenario, an ideal model should address the discrepancies in the query and convey them to the users rather than generating the best possible answer. Addressing this requirement, we introduce a new compositional visual question-answering dataset, VISREAS, that consists of answerable and unanswerable visual queries formulated by traversing and perturbing commonalities and differences among objects, attributes, and relations. VISREAS contains 2.07M semantically diverse queries generated automatically using Visual Genome scene graphs. The unique feature of this task, validating question answerability with respect to an image before answering, and the poor performance of state-of-the-art models inspired the design of a new modular baseline, LOGIC2VISION, which reasons by producing and executing pseudocode, without any external modules, to generate the answer. LOGIC2VISION outperforms generative models on VISREAS (+4.82% over LLaVA-1.5; +12.23% over InstructBLIP) and achieves a significant gain in performance over the classification models.
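To make the task in the abstract concrete, the following is a minimal illustrative sketch of validating a question against a scene graph before answering it. It is not the authors' LOGIC2VISION implementation or the VISREAS generation pipeline; the `SceneObject`, `SceneGraph`, `find`, and `answer_relation_query` names, the toy graph, and the single yes/no question template are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One object node of a (hypothetical, simplified) scene graph."""
    name: str
    attributes: set[str] = field(default_factory=set)


@dataclass
class SceneGraph:
    """Objects keyed by id, plus (subject_id, predicate, object_id) relations."""
    objects: dict[str, SceneObject]
    relations: set[tuple[str, str, str]]


def find(graph: SceneGraph, name: str, attribute: str | None = None) -> list[str]:
    """Return ids of objects matching a name and, optionally, an attribute."""
    return [
        oid
        for oid, obj in graph.objects.items()
        if obj.name == name and (attribute is None or attribute in obj.attributes)
    ]


def answer_relation_query(
    graph: SceneGraph, subj_name: str, subj_attr: str, predicate: str, obj_name: str
) -> str:
    """Answer 'Is the <attr> <subj> <predicate> the <obj>?' after validating it.

    The question is reported as unanswerable when it refers to something
    that the scene graph does not contain.
    """
    subjects = find(graph, subj_name, subj_attr)
    if not subjects:
        return f"unanswerable: no {subj_attr} {subj_name} in the image"
    targets = find(graph, obj_name)
    if not targets:
        return f"unanswerable: no {obj_name} in the image"
    holds = any(
        (s, predicate, t) in graph.relations for s in subjects for t in targets
    )
    return "yes" if holds else "no"


# Toy scene graph (made up for this example): a white cup on a wooden table.
GRAPH = SceneGraph(
    objects={
        "o1": SceneObject("cup", {"white"}),
        "o2": SceneObject("table", {"wooden"}),
    },
    relations={("o1", "on", "o2")},
)

print(answer_relation_query(GRAPH, "cup", "white", "on", "table"))
print(answer_relation_query(GRAPH, "cup", "red", "on", "table"))
```

In this sketch the second query is a perturbed version of the first: changing the attribute from "white" to "red" makes the question refer to an object that is not in the graph, so the function reports the discrepancy instead of guessing an answer.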

Comments:	18 pages, 14 figures, 5 tables
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.10534 [cs.CV]
	(or arXiv:2403.10534v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.10534

Submission history

From: Syeda Nahida Akter
[v1] Fri, 23 Feb 2024 00:12:10 UTC (21,584 KB)
