TY - JOUR
T1 - Self-Supervised Graph Information Bottleneck for Multiview Molecular Embedding Learning
AU - Li, Changsheng
AU - Mao, Kaihang
AU - Wang, Shiye
AU - Yuan, Ye
AU - Wang, Guoren
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2024/4/1
Y1 - 2024/4/1
AB - In the field of computer-aided drug discovery, identifying promising drug candidates from small molecule libraries requires meaningful molecular embeddings for downstream tasks, such as property prediction. However, obtaining experimentally determined molecular property measurements is often expensive and time-consuming, making it challenging to train molecular encoders with limited supervision. In addition, molecules can be represented in two ways: as 2-D chemical-bond structures and 3-D geometry structures. Molecular embedding learning using only one of these representations can result in information loss, and effective fusion of the two views has not been fully explored. To address these challenges, we propose a new approach called the self-supervised multiview graph neural network (SMV-GNN) for molecular embedding learning. Our approach involves a self-supervised task that promotes the representation ability of the molecular encoder without requiring extra human-annotation data. Specifically, we use chemical-bond-based graph structures as inputs to predict interatom distances from the 2-D view and randomly shuffle a ratio of atoms in the 3-D coordinate-based graphs to predict atom rationality from the 3-D view. We further improve the representation ability of the molecular embedding by using information bottleneck to learn essential shared feature representations by discarding superfluous information from the 2-D/3-D views for downstream tasks. We evaluate our proposed SMV-GNN approach on seven benchmark datasets for molecule property-prediction tasks, and demonstrate that it outperforms the current state-of-the-art methods.
KW - Information bottleneck (IB)
KW - molecular embedding learning
KW - molecular property prediction
KW - multiview learning
KW - self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85165410582&partnerID=8YFLogxK
U2 - 10.1109/TAI.2023.3297576
DO - 10.1109/TAI.2023.3297576
M3 - Article
AN - SCOPUS:85165410582
SN - 2691-4581
VL - 5
SP - 1554
EP - 1562
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 4
ER -