Abstract
An important open problem in video retrieval and exploration concerns the generation and refinement of queries for complex tasks that standard methods are unable to solve, especially when the systems are used by novices. In conversational search, the proposed approach is to ask users to interactively refine the information provided to the search process, until results are satisfactory. In user relevance feedback (URF), the proposed approach is to ask users to interactively judge query results, which in turn refines the query used to retrieve the next set of suggestions. The question that we seek to answer in the long term is how to integrate these two approaches: can the query refinements of conversational search directly impact the URF model, and can URF judgments directly impact the conversational search? We extend the existing Exquisitor URF system with conversational search. In this version of Exquisitor, the two modes of interaction are separate, but the user interface has been completely redesigned to (a) prepare for future integration with conversational search and (b) better support novice users.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Amato, G., et al.: VISIONE at video browser showdown 2023. In: Dang-Nguyen, D.T., et al. (eds.) MMM 2023. LNCS, vol. 13833, pp. 615–621. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27077-2_48
Arnold, R., Sauter, L., Schuldt, H.: Free-form multi-modal multimedia retrieval (4MR). In: Dang-Nguyen, D.T., et al. (eds.) MMM 2023. LNCS, vol. 13833, pp. 678–683. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27077-2_58
Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems 33, pp. 1877–1901 (2020)
Carreira, J., Noland, E., Hillier, C., Zisserman, A.: A short note on the kinetics-700 human action dataset. arXiv preprint arXiv:1907.06987 (2019)
Dalton, J., Xiong, C., Callan, J.: CAsT 2020: the conversational assistance track overview. In: Proceedings of TREC (2021)
Guðmundsson, G.Þ., Jónsson, B.Þ., Amsaleg, L.: A large-scale performance study of cluster-based high-dimensional indexing. In: Proceedings of the International Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval (VLS-MCM), Firenze, Italy (2010)
Jagerman, R., Zhuang, H., Qin, Z., Wang, X., Bendersky, M.: Query expansion by prompting large language models. arXiv preprint arXiv:2305.03653 (2023)
Jaided AI: EasyOCR. https://github.com/JaidedAI/EasyOCR
Jónsson, B.Þ, Khan, O.S., Koelma, D.C., Rudinac, S., Worring, M., Zahálka, J.: Exquisitor at the video browser showdown 2020. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 796–802. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_72
Khan, O.S., et al.: Exquisitor at the video browser showdown 2021: relationships between semantic classifiers. In: Lokoč, J., et al. (eds.) MMM 2021. LNCS, vol. 12573, pp. 410–416. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-67835-7_37
Khan, O.S., Jónsson, B.Þ.: User relevance feedback and novices: anecdotes from Exquisitor’s participation in interactive retrieval competitions. In: Proceedings of the Content-Based Multimedia Indexing, CBMI 2023, Orléans, France (2023)
Khan, O.S., et al.: Interactive learning for multimedia at large. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12035, pp. 495–510. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45439-5_33
Khan, O.S., et al.: Exquisitor at the video browser showdown 2022. In: Þór Jónsson, B., et al. (eds.) MMM 2022. LNCS, vol. 13142, pp. 511–517. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-98355-0_47
Kratochvíl, M., Veselý, P., Mejzlík, F., Lokoč, J.: SOM-hunter: video browsing with relevance-to-SOM feedback loop. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 790–795. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_71
Li, J., Li, D., Savarese, S., Hoi, S.: BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597 (2023)
Lokoč, J., et al.: Interactive video retrieval in the age of effective joint embedding deep models: lessons from the 11th VBS. Multimedia Syst. 29, 3481–3504 (2023)
Lokoč, J., et al.: Interactive search or sequential browsing? A detailed analysis of the video browser showdown 2018. ACM TOMM 15(1), 1–18 (2019)
Lokoč, J., Kovalčík, G., Souček, T.: VIRET at video browser showdown 2020. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 784–789. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_70
Lokoč, J., et al.: Is the reign of interactive search eternal? Findings from the video browser showdown 2020. ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 17(3), 1–26 (2021)
Mao, K., Dou, Z., Qian, H.: Curriculum contrastive context denoising for few-shot conversational dense retrieval. In: Proceedings of the 45th International ACM SIGIR Conference, pp. 176–186 (2022)
Mettes, P., Koelma, D.C., Snoek, C.G.: The ImageNet shuffle: reorganized pre-training for video event detection. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, ICMR 2016, New York, NY, USA, pp. 175–182. Association for Computing Machinery (2016)
Nguyen, T.N., et al.: VideoCLIP: an interactive CLIP-based video retrieval system at VBS2023. In: Dang-Nguyen, D.T., et al. (eds.) MMM 2023. LNCS, vol. 13833, pp. 671–677. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27077-2_57
Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)
Ragnarsdóttir, H., et al.: Exquisitor: breaking the interaction barrier for exploration of 100 million images. In: Proceedings of the ACM Multimedia, Nice, France (2019)
Sauter, L., Amiri Parian, M., Gasser, R., Heller, S., Rossetto, L., Schuldt, H.: Combining boolean and multimedia retrieval in vitrivr for large-scale video search. In: Ro, Y.M., et al. (eds.) MMM 2020. LNCS, vol. 11962, pp. 760–765. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37734-2_66
Schoeffmann, K., Stefanics, D., Leibetseder, A.: diveXplore at the video browser showdown 2023. In: Dang-Nguyen, D.T., et al. (eds.) MMM 2023. LNCS, vol. 13833, pp. 684–689. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27077-2_59
Song, W., He, J., Li, X., Feng, S., Liang, C.: QIVISE: a quantum-inspired interactive video search engine in VBS2023. In: Dang-Nguyen, D.T., et al. (eds.) MMM 2023. LNCS, vol. 13833, pp. 640–645. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-27077-2_52
Wang, L., Yang, N., Wei, F.: Query2doc: query expansion with large language models. arXiv preprint arXiv:2303.07678 (2023)
Wei, J., et al.: Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652 (2021)
Wei, J., et al.: Chain-of-thought prompting elicits reasoning in large language models. In: Advances in Neural Information Processing Systems 35, pp. 24824–24837 (2022)
Yu, W., et al.: Generate rather than retrieve: large language models are strong context generators. arXiv preprint arXiv:2209.10063 (2022)
Zahálka, J., Rudinac, S., Worring, M.: Analytic quality: evaluation of performance and insight in multimedia collection analysis. In: Proceedings of the 23rd ACM International Conference on Multimedia, MM 2015, pp. 231–240, New York, NY, USA. Association for Computing Machinery (2015)
Zahálka, J., Rudinac, S., Jónsson, B.Þ, Koelma, D.C., Worring, M.: Blackthorn: large-scale interactive multimodal learning. IEEE TMM 20(3), 687–698 (2018)
Acknowledgments
This work was supported by Icelandic Research Fund grant 239772-051.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khan, O.S., Zhu, H., Sharma, U., Kanoulas, E., Rudinac, S., Jónsson, B.Þ. (2024). Exquisitor at the Video Browser Showdown 2024: Relevance Feedback Meets Conversational Search. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14557. Springer, Cham. https://doi.org/10.1007/978-3-031-53302-0_31
Download citation
DOI: https://doi.org/10.1007/978-3-031-53302-0_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53301-3
Online ISBN: 978-3-031-53302-0
eBook Packages: Computer ScienceComputer Science (R0)