CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion

He, Xingwei; Gong, Yeyun; Jin, A-Long; Zhang, Hang; Dong, Anlei; Jiao, Jian; Yiu, Siu Ming; Duan, Nan

arXiv:2212.09114 (cs)

[Submitted on 18 Dec 2022 (v1), last revised 29 Oct 2023 (this version, v2)]

Abstract: The dual-encoder has become the de facto architecture for dense retrieval. Typically, it computes the latent representations of the query and document independently, thus failing to fully capture the interactions between the query and document. To alleviate this, recent research has focused on obtaining query-informed document representations. During training, it expands the document with a real query, but during inference, it replaces the real query with a generated one. This inconsistency between training and inference causes the dense retrieval model to prioritize query information while disregarding the document when computing the document representation. Consequently, it performs even worse than the vanilla dense retrieval model because its performance heavily relies on the relevance between the generated queries and the real query. In this paper, we propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query. By doing so, the retrieval model learns to extend its attention from the document alone to both the document and query, resulting in high-quality query-informed document representations. Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
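
The abstract does not spell out the sampling schedule, so the following is only a minimal sketch of what a curriculum over pseudo queries could look like. It assumes that the generated queries for a document have already been sorted from least to most relevant to the real query, and that training is split into equal stages. The function name curriculum_sample, the num_stages parameter, and the bucketed schedule are illustrative assumptions, not the authors' implementation.

```python
import random


def curriculum_sample(ranked_queries, step, total_steps, num_stages=4, rng=random):
    """Pick one generated query to expand a document with at a given training step.

    ranked_queries: generated queries for one document, ASSUMED to be sorted
        from least to most relevant to the real query (hypothetical ordering).
    step, total_steps: current and final training step.
    num_stages: hypothetical number of curriculum stages.
    """
    n = len(ranked_queries)
    if n == 0:
        raise ValueError("need at least one generated query")

    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)

    # Map progress to a stage index in [0, num_stages - 1].
    stage = min(int(progress * num_stages), num_stages - 1)

    # Each stage samples from its own relevance bucket; later stages use
    # buckets that sit closer to the real query.
    lo = stage * n // num_stages
    hi = max((stage + 1) * n // num_stages, lo + 1)
    return rng.choice(ranked_queries[lo:hi])
```

Under these assumptions, early steps expand the document with weakly related queries, which pushes the encoder to rely on the document text, while late steps use queries close to the real one. A training loop would call curriculum_sample(queries_for_doc, step, total_steps) once per document per step.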

Comments:	Accepted to EMNLP 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.09114 [cs.CL]
	(or arXiv:2212.09114v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.09114

Submission history

From: Xingwei He
[v1] Sun, 18 Dec 2022 15:57:46 UTC (7,333 KB)
[v2] Sun, 29 Oct 2023 09:32:07 UTC (485 KB)
