Abstract
Representative samples are important for multivariate calibration, and efficiently selecting the samples to be labelled saves money and time. Existing methods, such as Kennard-Stone and net analyte signal selection, are usually based on the distance between candidate samples and the labelled calibration set in feature space. However, these distances depend on the feature space, which is spanned by an information vector extracted from the labelled samples. To overcome this drawback of distance-based selection, a model performance enhancement-based sample selection method is proposed to select calibration samples efficiently. Based on loss function optimization, the samples that improve model performance the most, as estimated by bootstrap, are sequentially selected and added to the calibration set. Because each selected sample is highly representative, a few samples can build a model with no significant loss of prediction ability compared with a model built from a large set of calibration samples. The performance enhancement-based active learning (PEAL) sample selection method is both effective and efficient.
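For readers unfamiliar with the distance-based baseline that the abstract contrasts against, the following is a minimal sketch of the classic Kennard-Stone selection rule using Euclidean distance in the raw feature space. It is not the PEAL method, whose details the abstract does not give. The function name `kennard_stone`, the parameter `n_select`, and the toy data are illustrative assumptions.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule.

    Illustrative sketch: Euclidean distance on the raw features is assumed.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Full pairwise Euclidean distance matrix (fine for small candidate pools).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]

    # For every candidate, track the distance to its nearest selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    min_dist[selected] = -np.inf  # exclude already selected samples

    while len(selected) < n_select:
        # Pick the candidate whose nearest selected sample is farthest away.
        k = int(np.argmax(min_dist))
        selected.append(k)
        min_dist = np.minimum(min_dist, dist[k])
        min_dist[k] = -np.inf

    return selected


# Toy one-dimensional example (hypothetical data).
X_toy = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [5.0, 0.0]]
print(kennard_stone(X_toy, 3))  # -> [0, 2, 3]
```

The selection here depends only on the geometry of the feature space and never consults the response values or the fitted model, which is the limitation the abstract attributes to distance-based methods.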
Original language | English |
---|---|
Article number | e3386 |
Journal | Journal of Chemometrics |
Volume | 36 |
Issue | 3 |
DOI | |
Publication status | Published - Mar 2022 |