LLM4Eval@SIGIR 2024: Washington DC, USA
- Clemencia Siro, Mohammad Aliannejadi, Hossein A. Rahmani, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz:
Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024) co-located with the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), Washington D.C., USA, July 18, 2024. CEUR Workshop Proceedings 3752, CEUR-WS.org 2024
LLMJudge Challenge Overview
- Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, Guglielmo Faggioli:
LLMJudge: LLMs for Relevance Judgments. 1-3
Research Papers
- Bhashithe Abeysinghe, Ruhan Circi:
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches. 4-18
- Gabriel de Jesus, Sérgio Sobral Nunes:
Exploring Large Language Models for Relevance Judgments in Tetun. 19-30
- Naghmeh Farzi, Laura Dietz:
EXAM++: LLM-based Answerability Metrics for IR Evaluation. 31-50
- Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas:
A Novel Evaluation Framework for Image2Text Generation. 51-65
- Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim:
Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction. 66-91
- Zackary Rackauckas, Arthur Câmara, Jakub Zavrel:
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework. 92-112
- Jheng-Hong Yang, Jimmy Lin:
Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation. 113-123