Self-Supervised Graph Information Bottleneck for Multiview Molecular Embedding Learning

Changsheng Li*, Kaihang Mao, Shiye Wang, Ye Yuan, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In computer-aided drug discovery, identifying promising drug candidates in small-molecule libraries requires meaningful molecular embeddings for downstream tasks such as property prediction. However, experimentally measuring molecular properties is often expensive and time-consuming, which makes it difficult to train molecular encoders when supervision is limited. In addition, a molecule can be represented in two complementary ways: as a 2-D chemical-bond structure and as a 3-D geometric structure. Learning embeddings from only one of these views can lose information, and how to fuse the two views effectively has not been fully explored. To address these challenges, we propose the self-supervised multiview graph neural network (SMV-GNN) for molecular embedding learning. SMV-GNN uses self-supervised tasks to strengthen the representation ability of the molecular encoder without requiring additional human-annotated data. Specifically, from the 2-D view, it takes chemical-bond-based graph structures as input and predicts interatomic distances; from the 3-D view, it randomly shuffles a fraction of the atoms in the 3-D coordinate-based graphs and predicts atom rationality. We further improve the molecular embedding by applying the information bottleneck (IB) principle, which learns the essential feature representations shared by the 2-D and 3-D views while discarding superfluous information that is irrelevant to downstream tasks. We evaluate SMV-GNN on seven benchmark datasets for molecular property prediction and show that it outperforms current state-of-the-art methods.
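The two self-supervised targets described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pairwise_distances` and `shuffle_atoms` are hypothetical; the distance targets are assumed to be computed from 3-D coordinates while the 2-D encoder sees only the bond graph; and "shuffling" is assumed to mean reassigning the coordinates of a randomly chosen fraction of atoms among themselves, with each atom labelled 1 (rational, untouched) or 0 (shuffled).

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pairwise_distances(coords: Sequence[Coord]) -> List[List[float]]:
    """Euclidean distance between every pair of atoms.

    Assumption: these serve as regression targets for the 2-D view, whose
    encoder receives only the chemical-bond graph, not the coordinates.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def shuffle_atoms(
    coords: Sequence[Coord], ratio: float, rng: random.Random
) -> Tuple[List[Coord], List[int]]:
    """Corrupt a 3-D conformer and return per-atom rationality labels.

    Hypothetical scheme: pick round(ratio * n) atoms at random and move each
    one to the position of another picked atom. Label 1 = untouched
    (rational), 0 = shuffled.
    """
    n = len(coords)
    k = int(round(ratio * n))
    new_coords = list(coords)
    labels = [1] * n
    if k < 2:
        # Fewer than two selected atoms cannot be permuted among themselves.
        return new_coords, labels
    picked = rng.sample(range(n), k)  # random subset in random order
    # Cyclic shift over the picked atoms: each one takes the coordinates of
    # its predecessor, so every picked atom is guaranteed to move.
    for src, dst in zip(picked, picked[1:] + picked[:1]):
        new_coords[dst] = coords[src]
        labels[dst] = 0
    return new_coords, labels
```

For example, `shuffle_atoms(coords, 0.15, random.Random(0))` would corrupt roughly 15% of a molecule's atoms (the ratio here is an arbitrary illustration, not a value from the paper). The cyclic shift is a design choice that keeps the labels unambiguous: a plain random permutation could leave some selected atoms in their original positions while labelling them as shuffled. The encoders and prediction heads that would consume these targets are omitted.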
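For reference, the classical information bottleneck objective and one common multiview adaptation from the broader IB literature are written below. The adaptation is shown only as an illustration and may differ from the objective SMV-GNN actually optimizes. The symbols are introduced here, not taken from the source: $V_{2D}$ and $V_{3D}$ denote the two views of a molecule, $Z_{2D}$ is the representation learned from the 2-D view, and $\beta > 0$ trades compression against preserved information.

```latex
% Classical IB: compress input X into Z while keeping information about target Y
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)

% Illustrative multiview adaptation: the other view plays the role of the target
\min_{p(z_{2D} \mid v_{2D})} \; I(Z_{2D}; V_{2D} \mid V_{3D}) - \beta \, I(Z_{2D}; V_{3D})
```

In this form, the first term penalizes information in $Z_{2D}$ that is specific to the 2-D view (the superfluous part), while the second term rewards information shared with the 3-D view. A 3-D encoder would use the same objective with the roles of the two views swapped.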

Original language: English
Pages (from-to): 1554-1562
Number of pages: 9
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 4
Publication status: Published - 1 Apr 2024

Keywords

  • Information bottleneck (IB)
  • molecular embedding learning
  • molecular property prediction
  • multiview learning
  • self-supervised learning
