TY - JOUR
T1 - Unsupervised convolutional variational autoencoder deep embedding clustering for Raman spectra
AU - Guo, Yixin
AU - Jin, Weiqi
AU - Wang, Weilin
AU - Guo, Zongyu
AU - He, Yuqing
N1 - Publisher Copyright: © 2022 The Royal Society of Chemistry.
PY - 2022/9/20
Y1 - 2022/9/20
AB - Unsupervised deep learning methods place increased emphasis on cluster analysis of unknown samples without requiring sample labels. Clustering algorithms based on deep embedding networks have recently been developed and are widely used in data mining, speech processing and image recognition, but hardly any have been applied to spectral data. This study presents an unsupervised clustering algorithm for Raman spectra, called the convolutional variational autoencoder deep embedding clustering method (CVDE). It improves on the multi-layer perceptron (MLP) network structure commonly used in other methods based on the VAE-GMM model, such as VaDE, by replacing the hidden fully connected layer in the MLP with three convolution layers and two pooling layers for better clustering of Raman spectra. The three convolution layers extend vertical channels to learn features, while the pooling layers directly reduce the horizontal coding dimensions to prevent gradient explosion and overfitting. Furthermore, such network structures can easily incorporate the gradient-weighted class activation mapping (Grad-CAM) method to visualise the importance of spectral features for clustering, facilitating network tuning and spectral difference analysis. In comparative experiments, CVDE achieves better clustering performance than current advanced clustering methods not only on the MNIST dataset but also on two sets of Raman spectra: soybean oil Raman spectra with very small Raman feature differences and drug Raman spectra with a small data size. The clustering accuracies on these three datasets reach 94.48%, 90.43% and 98.70%, respectively. Thus, CVDE is suitable for applications involving static spectra, such as Raman spectra and LIBS spectra, and is more versatile than supervised methods in the spectral and chemical analysis fields.
UR - http://www.scopus.com/inward/record.url?scp=85139880547&partnerID=8YFLogxK
U2 - 10.1039/d2ay01184k
DO - 10.1039/d2ay01184k
M3 - Article
C2 - 36169059
AN - SCOPUS:85139880547
SN - 1759-9660
VL - 14
SP - 3898
EP - 3910
JO - Analytical Methods
JF - Analytical Methods
IS - 39
ER -