Multi-labelled proteins recognition for high-throughput microscopy images using deep convolutional neural networks

Enze Zhang; Boheng Zhang; Shaohan Hu; Fa Zhang; Zhiyong Liu; Xiaohua Wan

doi:10.1186/s12859-021-04196-3

Enze Zhang, Boheng Zhang, Shaohan Hu, Fa Zhang, Zhiyong Liu, Xiaohua Wan^*

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Proteins are essential to the human body; no movement or activity can be performed without them. Rapidly developing microscopy imaging technologies are now used to observe proteins in various cells and tissues. Because cellular environments are complex and crowded and proteins vary widely in type and size, a considerable number of protein images are generated every day, too many to classify manually. An automatic and accurate method is therefore needed to analyse protein images with mixed patterns.

Results: In this paper, we first propose a novel customized architecture with adaptive concatenate pooling and “buffering” layers in the classifier part, which could make the networks more adaptive to training and testing datasets. We also develop a novel hard sampler at the end of our network to effectively mine samples from small classes. Furthermore, a new loss is presented to handle label imbalance based on the effectiveness of samples. In addition, several novel and effective optimization strategies are adopted to solve the difficult training-time optimization problem and to further increase accuracy through post-processing.

Conclusion: Our methods outperformed GapNet-PL, the state-of-the-art (SOTA) method for multi-labelled protein classification on the Human Protein Atlas (HPA) dataset, by more than 2% in F1 score. Experimental results on a test set split from the HPA dataset show that our methods perform well in automatically classifying multi-class, multi-labelled high-throughput microscopy protein images.
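The abstract reports the gain over GapNet-PL as an F1 score but does not say how the score is averaged across labels. The following is a minimal sketch, assuming macro averaging over binary label indicators (an assumption, not something the abstract states). The function name `macro_f1` and the toy label matrices are hypothetical and serve only to illustrate how a multi-label F1 score can be computed.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Macro-averaged F1 for multi-label data given as 0/1 indicator rows.

    Assumption: each row is one image, each column is one protein pattern.
    """
    n_labels = len(y_true[0])
    per_label: List[float] = []
    for k in range(n_labels):
        # Count true positives, false positives and false negatives for label k.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 0 and p[k] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[k] == 1 and p[k] == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        per_label.append(2 * tp / denom if denom else 0.0)
    # Macro averaging weights every label equally, including rare ones.
    return sum(per_label) / n_labels


# Toy data (hypothetical): 3 images, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.5556
```

Macro averaging is shown here because it gives small classes the same weight as large ones, which matters when labels are imbalanced; a micro-averaged score would pool the counts across labels instead.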

Original language	English
Article number	327
Journal	BMC Bioinformatics
Volume	22
DOIs	https://doi.org/10.1186/s12859-021-04196-3
Publication status	Published - May 2021
Externally published	Yes

Keywords

DNNs
High-throughput microscopy images
Label imbalance
Multi-class and multi-label
Protein pattern recognition

Cite this

@article{f647566321914450b12e11b3653b289f,

title = "Multi-labelled proteins recognition for high-throughput microscopy images using deep convolutional neural networks",

abstract = "Background: Proteins are of extremely vital importance in the human body, and no movement or activity can be performed without proteins. Currently, microscopy imaging technologies developed rapidly are employed to observe proteins in various cells and tissues. In addition, due to the complex and crowded cellular environments as well as various types and sizes of proteins, a considerable number of protein images are generated every day and cannot be classified manually. Therefore, an automatic and accurate method should be designed to properly solve and analyse protein images with mixed patterns. Results: In this paper, we first propose a novel customized architecture with adaptive concatenate pooling and “buffering” layers in the classifier part, which could make the networks more adaptive to training and testing datasets, and develop a novel hard sampler at the end of our network to effectively mine the samples from small classes. Furthermore, a new loss is presented to handle the label imbalance based on the effectiveness of samples. In addition, in our method, several novel and effective optimization strategies are adopted to solve the difficult training-time optimization problem and further increase the accuracy by post-processing. Conclusion: Our methods outperformed the SOTA method of multi-labelled protein classification on the HPA dataset, GapNet-PL, by above 2% in the F1 score. Therefore, experimental results based on the test set split from the Human Protein Atlas dataset show that our methods have good performance in automatically classifying multi-class and multi-labelled high-throughput microscopy protein images.",

keywords = "DNNs, High-throughput microscopy images, Label imbalance, Multi-class and multi-label, Protein pattern recognition",

author = "Enze Zhang and Boheng Zhang and Shaohan Hu and Fa Zhang and Zhiyong Liu and Xiaohua Wan",

note = "Publisher Copyright: {\textcopyright} 2021, The Author(s).",

year = "2021",

month = may,

doi = "10.1186/s12859-021-04196-3",

language = "English",

volume = "22",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - Multi-labelled proteins recognition for high-throughput microscopy images using deep convolutional neural networks

AU - Zhang, Enze

AU - Zhang, Boheng

AU - Hu, Shaohan

AU - Zhang, Fa

AU - Liu, Zhiyong

AU - Wan, Xiaohua

PY - 2021/5

Y1 - 2021/5

N2 - Background: Proteins are of extremely vital importance in the human body, and no movement or activity can be performed without proteins. Currently, microscopy imaging technologies developed rapidly are employed to observe proteins in various cells and tissues. In addition, due to the complex and crowded cellular environments as well as various types and sizes of proteins, a considerable number of protein images are generated every day and cannot be classified manually. Therefore, an automatic and accurate method should be designed to properly solve and analyse protein images with mixed patterns. Results: In this paper, we first propose a novel customized architecture with adaptive concatenate pooling and “buffering” layers in the classifier part, which could make the networks more adaptive to training and testing datasets, and develop a novel hard sampler at the end of our network to effectively mine the samples from small classes. Furthermore, a new loss is presented to handle the label imbalance based on the effectiveness of samples. In addition, in our method, several novel and effective optimization strategies are adopted to solve the difficult training-time optimization problem and further increase the accuracy by post-processing. Conclusion: Our methods outperformed the SOTA method of multi-labelled protein classification on the HPA dataset, GapNet-PL, by above 2% in the F1 score. Therefore, experimental results based on the test set split from the Human Protein Atlas dataset show that our methods have good performance in automatically classifying multi-class and multi-labelled high-throughput microscopy protein images.

AB - Background: Proteins are of extremely vital importance in the human body, and no movement or activity can be performed without proteins. Currently, microscopy imaging technologies developed rapidly are employed to observe proteins in various cells and tissues. In addition, due to the complex and crowded cellular environments as well as various types and sizes of proteins, a considerable number of protein images are generated every day and cannot be classified manually. Therefore, an automatic and accurate method should be designed to properly solve and analyse protein images with mixed patterns. Results: In this paper, we first propose a novel customized architecture with adaptive concatenate pooling and “buffering” layers in the classifier part, which could make the networks more adaptive to training and testing datasets, and develop a novel hard sampler at the end of our network to effectively mine the samples from small classes. Furthermore, a new loss is presented to handle the label imbalance based on the effectiveness of samples. In addition, in our method, several novel and effective optimization strategies are adopted to solve the difficult training-time optimization problem and further increase the accuracy by post-processing. Conclusion: Our methods outperformed the SOTA method of multi-labelled protein classification on the HPA dataset, GapNet-PL, by above 2% in the F1 score. Therefore, experimental results based on the test set split from the Human Protein Atlas dataset show that our methods have good performance in automatically classifying multi-class and multi-labelled high-throughput microscopy protein images.

KW - DNNs

KW - High-throughput microscopy images

KW - Label imbalance

KW - Multi-class and multi-label

KW - Protein pattern recognition

UR - http://www.scopus.com/inward/record.url?scp=85108092599&partnerID=8YFLogxK

U2 - 10.1186/s12859-021-04196-3

DO - 10.1186/s12859-021-04196-3

M3 - Article

C2 - 34130623

AN - SCOPUS:85108092599

SN - 1471-2105

VL - 22

JO - BMC Bioinformatics

JF - BMC Bioinformatics

M1 - 327

ER -
