Learning Invariant Molecular Representation in Latent Discrete Space

Zhuang, Xiang; Zhang, Qiang; Ding, Keyan; Bian, Yatao; Wang, Xiao; Lv, Jingsong; Chen, Hongyang; Chen, Huajun

Abstract: Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when training and test data originate from different environments. To address this issue, we propose a new framework for learning molecular representations that are invariant and robust to distribution shifts. Specifically, we propose a strategy called "first-encoding-then-separation" to identify invariant molecular features in the latent space, which deviates from conventional practice. Prior to the separation step, we introduce a residual vector quantization module that mitigates over-fitting to the training data distribution while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which makes our method applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model generalizes better than state-of-the-art baselines under various distribution shifts. Our code is available at this https URL.
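
As a rough illustration of the quantization step mentioned in the abstract, the following is a minimal NumPy sketch of one common form of residual vector quantization, in which each stage snaps the remaining residual to its nearest codebook entry. The abstract does not specify the module's design, so this should not be read as the authors' implementation: the function name `residual_vector_quantize`, the multi-stage layout, the codebook sizes, and the random codebooks are all assumptions made for illustration.

```python
import numpy as np

def residual_vector_quantize(z, codebooks):
    """Generic multi-stage residual vector quantization (illustrative only).

    z: (n, d) array of continuous latent vectors.
    codebooks: list of (k, d) arrays, one per quantization stage.
    Returns the quantized vectors (n, d) and the chosen indices (n, stages).
    """
    residual = z.astype(float)
    quantized = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Squared Euclidean distance from every residual to every code vector.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        quantized += chosen            # accumulate the discrete approximation
        residual = residual - chosen   # pass what is left to the next stage
        indices.append(idx)
    return quantized, np.stack(indices, axis=1)

# Hypothetical usage with random latents and random codebooks.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
codebooks = [rng.normal(size=(16, 8)) * s for s in (1.0, 0.5, 0.25)]
z_q, codes = residual_vector_quantize(z, codebooks)
print(codes.shape)  # (4, 3): one code index per latent vector per stage
```

In a trained model the codebooks would be learned jointly with the encoder rather than sampled at random; that training procedure is outside what the abstract describes and is not sketched here.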

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.14170 [cs.LG]
	(or arXiv:2310.14170v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.14170
