FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography

Shan Gao, Xiangrui Zeng, Min Xu*, Fa Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Cryo-electron tomography (Cryo-ET) is an emerging technology for three-dimensional (3D) visualization of macromolecular structures in the near-native state. To recover structures of macromolecules, millions of diverse macromolecules captured in tomograms should be accurately classified into structurally homogeneous subsets. Although existing supervised deep learning–based methods have improved classification accuracy, such trained models have limited ability to classify novel macromolecules that are unseen in the training stage. Adapting a trained model to a novel class requires a large number of labeled macromolecules of that class, but data labeling is very time-consuming and labor-intensive. In this work, we propose a few-shot learning method for the classification of novel macromolecules, named FSCC. A two-stage training strategy is designed in FSCC to enhance the generalization ability of the model to novel macromolecules. First, FSCC uses contrastive learning to pre-train the model on a sufficient number of labeled macromolecules. Second, FSCC uses distribution calibration to re-train the classifier, enabling the model to classify macromolecules of novel classes (classes unseen during pre-training). Distribution calibration transfers the knowledge learned in the pre-training stage to novel classes for which only a few labeled macromolecules are available. Experiments were performed on both synthetic and real datasets. On the synthetic datasets, FSCC achieves performance competitive with the state-of-the-art (SOTA) method based on supervised deep learning. To achieve such performance, FSCC needs only five labeled macromolecules per novel class, whereas the SOTA method needs 1,100 to 1,500 labeled macromolecules per novel class. On the real datasets, FSCC improves the accuracy by 5% to 16% compared with the baseline model. These results demonstrate the good generalization ability of contrastive learning and distribution calibration for classifying novel macromolecules with very few labeled macromolecules.
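
The abstract does not spell out how the calibration step works. The following is a minimal sketch of one common form of distribution calibration for few-shot learning, not the authors' implementation. It assumes that features come from a pre-trained encoder, that a per-class mean and covariance have been stored for the base (pre-training) classes, and that a novel class is modeled as a Gaussian whose statistics are borrowed from its nearest base classes. The function names, the parameters `k` and `alpha`, and the choice to add `alpha` only on the diagonal are all illustrative assumptions.

```python
import numpy as np


def calibrate(support_feat, base_means, base_covs, k=2, alpha=0.2):
    """Estimate a Gaussian for a novel class from one support feature.

    Hypothetical helper: borrows statistics from the k base classes whose
    means are closest to the support feature.
    """
    # Euclidean distance from the support feature to every base-class mean.
    dists = np.linalg.norm(base_means - support_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    # Calibrated mean: average of the k nearest base means and the support feature.
    mean = (base_means[nearest].sum(axis=0) + support_feat) / (k + 1)
    # Calibrated covariance: average of the k nearest base covariances,
    # widened by alpha on the diagonal (an assumption of this sketch).
    cov = base_covs[nearest].mean(axis=0) + alpha * np.eye(support_feat.shape[0])
    return mean, cov


def sample_features(mean, cov, n_samples=100, rng=None):
    """Draw synthetic features for the novel class from the calibrated Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.multivariate_normal(mean, cov, size=n_samples)
```

In a setup like this, the sampled features would be pooled with the few real support features and used to re-train a simple classifier on top of the frozen encoder.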

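Original language: English
Article number: 931949
Journal: Frontiers in Molecular Biosciences
Volume: 9
DOI: https://doi.org/10.3389/fmolb.2022.931949
Publication status: Published - 5 Jul 2022
Externally published: Yes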

Cite this

Gao, S., Zeng, X., Xu, M., & Zhang, F. (2022). FSCC: Few-Shot Learning for Macromolecule Classification Based on Contrastive Learning and Distribution Calibration in Cryo-Electron Tomography. Frontiers in Molecular Biosciences, 9, Article 931949. https://doi.org/10.3389/fmolb.2022.931949