Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Xing, Linzi; Tran, Quan; Caba, Fabian; Dernoncourt, Franck; Yoon, Seunghyun; Wang, Zhaowen; Bui, Trung; Carenini, Giuseppe

arXiv:2312.00220 (cs)

[Submitted on 30 Nov 2023]

Authors:Linzi Xing, Quan Tran, Fabian Caba, Franck Dernoncourt, Seunghyun Yoon, Zhaowen Wang, Trung Bui, Giuseppe Carenini

Abstract: Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks. Given the recent surge in multi-modal content, relying solely on a single modality is arguably insufficient. Moreover, prior solutions for similar tasks, such as video scene/shot segmentation, cater to short videos with clear visual shifts but falter on long videos with subtle changes, such as livestreams. In this paper, we introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames, bolstered by a cross-modal attention mechanism. Furthermore, we propose a dual-contrastive learning framework adhering to the unsupervised domain adaptation paradigm, enhancing our model's adaptability to longer, more semantically complex videos. Experiments on short and long video corpora demonstrate that our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability, in both intra- and cross-domain settings.
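As a rough illustration of what a cross-modal attention step between transcript and frame features can look like, the sketch below uses plain scaled dot-product attention in pure Python. It is not the architecture from the paper: the function names (`softmax`, `cross_modal_attention`), the absence of learned projection matrices, and the choice to concatenate each text vector with its attended visual context are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text_feats, frame_feats):
    """Let each text vector (query) attend over all frame vectors.

    Hypothetical simplification: frame vectors serve as both keys and
    values, and no learned projections are applied. Both inputs are
    lists of equal-length float vectors.
    """
    dim = len(frame_feats[0])
    fused = []
    for q in text_feats:
        # Scaled dot-product score between this text vector and each frame.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in frame_feats
        ]
        weights = softmax(scores)
        # Attention-weighted average of the frame vectors.
        context = [
            sum(w * v[j] for w, v in zip(weights, frame_feats))
            for j in range(dim)
        ]
        # Concatenate the text vector with its visual context (length 2 * dim).
        fused.append(list(q) + context)
    return fused


text_feats = [[1.0, 0.0], [0.0, 1.0]]
frame_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_modal_attention(text_feats, frame_feats)
```

Each fused vector could then be passed to a boundary classifier that decides whether a topic shift occurs at that position; that downstream step is likewise an assumption and is not specified in the abstract.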

Comments:	Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024)
Subjects:	Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2312.00220 [cs.MM]
	(or arXiv:2312.00220v1 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2312.00220

Submission history

From: Linzi Xing
[v1] Thu, 30 Nov 2023 21:59:05 UTC (2,787 KB)
