TY - JOUR
T1 - Multi-criteria active learning method based on adaptive density clustering
AU - He, Zhonghai
AU - Zhu, Wenhan
AU - Chen, Xuwang
AU - Zhang, Xiaofang
N1 - Publisher Copyright: © 2024 Science Press. All rights reserved.
PY - 2024/3
Y1 - 2024/3
N2 - Active learning proves instrumental in training superior machine learning models while minimizing labeling costs. The combination of RD and QBC algorithms effectively addresses issues associated with considering only a single criterion. However, the K-means clustering upon which RD is based may include outliers, leading to a decrease in model performance, and QBC requires maintaining multiple models and provides sample information only indirectly. To address these issues, we propose an adaptive density clustering-based Gaussian process regression (ADC-GPR) algorithm, which efficiently selects samples by first clustering and then utilizing uncertainty directly. The ADC clustering in this algorithm is not only robust against outliers but also adapts to the distribution characteristics of the dataset, providing representative sample points and their corresponding clusters for subsequent active learning. This method ensures both representativeness and diversity in unsupervised selection and considers informativeness, representativeness, and diversity in supervised selection. The experimental results demonstrate that, compared to the RS, KS, and RD-GPR algorithms, the ADC-GPR algorithm exhibits an average performance improvement of 37.3%, 8%, and 2.8%, respectively, with the same number of sampling iterations. Furthermore, the ADC-GPR algorithm demonstrates higher selection efficiency.
AB - Active learning proves instrumental in training superior machine learning models while minimizing labeling costs. The combination of RD and QBC algorithms effectively addresses issues associated with considering only a single criterion. However, the K-means clustering upon which RD is based may include outliers, leading to a decrease in model performance, and QBC requires maintaining multiple models and provides sample information only indirectly. To address these issues, we propose an adaptive density clustering-based Gaussian process regression (ADC-GPR) algorithm, which efficiently selects samples by first clustering and then utilizing uncertainty directly. The ADC clustering in this algorithm is not only robust against outliers but also adapts to the distribution characteristics of the dataset, providing representative sample points and their corresponding clusters for subsequent active learning. This method ensures both representativeness and diversity in unsupervised selection and considers informativeness, representativeness, and diversity in supervised selection. The experimental results demonstrate that, compared to the RS, KS, and RD-GPR algorithms, the ADC-GPR algorithm exhibits an average performance improvement of 37.3%, 8%, and 2.8%, respectively, with the same number of sampling iterations. Furthermore, the ADC-GPR algorithm demonstrates higher selection efficiency.
KW - active learning
KW - adaptive density clustering
KW - Gaussian process regression
KW - multi-criteria fusion
KW - outlier robustness
UR - http://www.scopus.com/inward/record.url?scp=85198257137&partnerID=8YFLogxK
U2 - 10.19650/j.cnki.cjsi.J2312180
DO - 10.19650/j.cnki.cjsi.J2312180
M3 - Article
AN - SCOPUS:85198257137
SN - 0254-3087
VL - 45
SP - 179
EP - 187
JO - Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument
JF - Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument
IS - 3
ER -