Decomposed Mutual Information Estimation for Contrastive Representation Learning

Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Philip Bachman, Remi Tachet Des Combes
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9859-9869, 2021.

Abstract

Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
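The underestimation bias the abstract refers to comes from the fact that a contrastive (InfoNCE-style) lower bound on MI computed from a batch of N sample pairs can never exceed log N; DEMI's chain-rule decomposition, I(x; y¹, y²) = I(x; y¹) + I(x; y²|y¹), lets each term stay below that cap while their sum does not. As a minimal sketch of the saturation effect only (not code from the paper; the function name and the toy Gaussian critic are illustrative assumptions), one can check that even with an optimal critic the single-bound estimate is capped at log N:

```python
import numpy as np

def infonce_lower_bound(scores):
    # scores[i, j] = critic value f(x_i, y_j); diagonal entries are positive pairs.
    # Bound = log N + mean_i [ f(x_i, y_i) - logsumexp_j f(x_i, y_j) ].
    # Since the positive pair appears in the logsumexp, the bound never
    # exceeds log N -- the underestimation bias the paper targets.
    n = scores.shape[0]
    row_max = scores.max(axis=1, keepdims=True)
    lse = np.log(np.exp(scores - row_max).sum(axis=1)) + row_max[:, 0]
    return np.log(n) + np.mean(np.diag(scores) - lse)

# Toy check (illustrative, not from the paper): jointly Gaussian (x, y) with
# correlation rho, scored with the optimal critic f(x, y) = log p(y|x) - log p(y),
# which is available in closed form for Gaussians.
rng = np.random.default_rng(0)
n, rho = 128, 0.99999            # true MI = -0.5 * log(1 - rho^2) ~ 5.4 nats > log(128) ~ 4.85
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
cond_var = 1 - rho**2
scores = (-0.5 * (y[None, :] - rho * x[:, None])**2 / cond_var
          - 0.5 * np.log(cond_var) + 0.5 * y[None, :]**2)
bound = infonce_lower_bound(scores)   # saturates near log(n), below the true MI
```

Estimating the two decomposed terms instead, each with its own contrastive bound (the conditional term using negatives drawn given y¹, which is the part DEMI makes efficient), allows the summed estimate to pass the single-batch log N ceiling.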

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-sordoni21a,
  title     = {Decomposed Mutual Information Estimation for Contrastive Representation Learning},
  author    = {Sordoni, Alessandro and Dziri, Nouha and Schulz, Hannes and Gordon, Geoff and Bachman, Philip and Combes, Remi Tachet Des},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {9859--9869},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/sordoni21a/sordoni21a.pdf},
  url       = {https://proceedings.mlr.press/v139/sordoni21a.html},
  abstract  = {Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.}
}
Endnote
%0 Conference Paper
%T Decomposed Mutual Information Estimation for Contrastive Representation Learning
%A Alessandro Sordoni
%A Nouha Dziri
%A Hannes Schulz
%A Geoff Gordon
%A Philip Bachman
%A Remi Tachet Des Combes
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-sordoni21a
%I PMLR
%P 9859--9869
%U https://proceedings.mlr.press/v139/sordoni21a.html
%V 139
%X Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views. This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI, which facilitates approximation via contrastive bounds. To maximize the sum, we formulate a contrastive lower bound on the conditional MI which can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and learns better representations in a vision domain and for dialogue generation.
APA
Sordoni, A., Dziri, N., Schulz, H., Gordon, G., Bachman, P. &amp; Combes, R. T. D. (2021). Decomposed Mutual Information Estimation for Contrastive Representation Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:9859-9869. Available from https://proceedings.mlr.press/v139/sordoni21a.html.