TY - JOUR
T1 - Importance evaluation of spectral lines in laser-induced breakdown spectroscopy for classification of pathogenic bacteria
AU - Wang, Qianqian
AU - Teng, Geer
AU - Qiao, Xiaolei
AU - Zhao, Yu
AU - Kong, Jinglin
AU - Dong, Liqiang
AU - Cui, Xutai
N1 - Publisher Copyright: © 2018 Optical Society of America.
PY - 2018/11/1
Y1 - 2018/11/1
N2 - The correct classification of pathogenic bacteria is important for clinical diagnosis and treatment. Compared with using whole spectral data, using feature lines as the inputs of the classification model can improve the correct classification rate (CCR) and reduce the analysis time. To select feature lines, the contribution of each spectral line to the CCR must be investigated. In this paper, two algorithms, importance weights based on principal component analysis (IW-PCA) and random forests (RF), were proposed to evaluate the importance of spectral lines. The laser-induced breakdown spectroscopy (LIBS) spectra of six common clinical pathogenic bacteria species were measured, and a support vector machine (SVM) classifier was used to classify the LIBS spectra of the bacteria species. In the proposed IW-PCA algorithm, the product of the loading of each line and the variance of the corresponding principal component (PC) was calculated. The maximum product of each line calculated from the first three PCs was used to represent the line’s importance weight. In the RF algorithm, the Gini index reduction value of each line was taken as the line’s importance weight. The experimental results demonstrated that lines with high importance were more suitable for classification and can be chosen as feature lines. The optimal number of feature lines used in the SVM classifier can be determined by comparing the CCRs obtained with different numbers of feature lines. Importance weights evaluated by RF are more suitable than those evaluated by IW-PCA for extracting feature lines when LIBS is combined with an SVM classifier. Furthermore, the two methods mutually verified the importance of the selected lines, and the lines rated important by both IW-PCA and RF contributed more to the CCR.
UR - http://www.scopus.com/inward/record.url?scp=85056603083&partnerID=8YFLogxK
U2 - 10.1364/BOE.9.005837
DO - 10.1364/BOE.9.005837
M3 - Article
AN - SCOPUS:85056603083
SN - 2156-7085
VL - 9
SP - 5837
EP - 5850
JO - Biomedical Optics Express
JF - Biomedical Optics Express
IS - 11
M1 - #341441
ER -