Performance enhancement-based active learning sample selection method

Zhonghai He; Shijie Song; Kun Shen; Xiaofang Zhang

doi:10.1002/cem.3386

Zhonghai He^*, Shijie Song, Kun Shen, Xiaofang Zhang

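^*Corresponding author for this work

School of Optoelectronics

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract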

Representative samples are important for multivariate calibration. The highly efficient selection of representative samples to be labelled can save money and time. Existing methods, such as Kennard-Stone and net analyte signal selection, are usually based on the distance between candidate samples and labelled calibration sets in feature space. However, these distances are influenced by the feature space, which is spanned by an information vector extracted from labelled samples. To overcome the negative effects of the distance-based selection method, a model performance enhancement-based sample selection method is proposed to select calibration samples efficiently. Based on loss function optimization, the samples that can improve model performance the most, as estimated by bootstrap, are sequentially selected and added to the calibration set. Due to the high representation of each sample, a few samples can build a model that has no significant loss of prediction ability when compared with a model built with the large number set of calibration samples. The performance enhancement-based active learning (PEAL) sample selection method is both effective and efficient.
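The abstract does not spell out the selection criterion, so the following is only a rough sketch of the general idea of bootstrap-guided sequential selection, not the authors' PEAL algorithm. It makes several assumptions: ordinary least squares stands in for the calibration model, and bootstrap prediction variance is swapped in as a proxy for the estimated performance gain of labelling a candidate. All names (`fit_ols`, `bootstrap_disagreement`, `select_sequentially`, `label_fn`) are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Predict with coefficients produced by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def bootstrap_disagreement(X_lab, y_lab, X_cand, n_boot=50, rng=None):
    """Variance of bootstrap-model predictions for each candidate.

    ASSUMPTION: used here as a stand-in score for how much labelling a
    candidate might improve the model; the paper's criterion may differ.
    """
    rng = np.random.default_rng(rng)
    n = len(X_lab)
    preds = np.empty((n_boot, len(X_cand)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample labelled set with replacement
        preds[b] = predict(fit_ols(X_lab[idx], y_lab[idx]), X_cand)
    return preds.var(axis=0)


def select_sequentially(X_pool, label_fn, init_idx, n_select, n_boot=50, seed=0):
    """Greedily add the highest-scoring candidate, one sample at a time.

    label_fn(i) is a hypothetical callback that returns the reference
    value for pool sample i (i.e. performs the costly labelling step).
    """
    rng = np.random.default_rng(seed)
    labelled = list(init_idx)
    y = {i: label_fn(i) for i in labelled}
    for _ in range(n_select):
        cand = [i for i in range(len(X_pool)) if i not in y]
        y_lab = np.array([y[i] for i in labelled])
        scores = bootstrap_disagreement(
            X_pool[labelled], y_lab, X_pool[cand], n_boot, rng
        )
        best = cand[int(np.argmax(scores))]
        y[best] = label_fn(best)  # label only the selected sample
        labelled.append(best)
    return labelled
```

Calling `select_sequentially(X_pool, label_fn, init_idx=[0, 1, 2, 3], n_select=5)` would return the four starting indices followed by five newly selected ones. Only the selected samples are ever passed to `label_fn`, which mirrors the cost-saving motivation stated in the abstract.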

Original language	English
Article number	e3386
Journal	Journal of Chemometrics
Volume	36
Issue number	3
DOI	https://doi.org/10.1002/cem.3386
Publication status	Published - Mar 2022

Cite this

@article{c7e51717bb004a03a350ad59fd114c5e,

title = "Performance enhancement-based active learning sample selection method",

abstract = "Representative samples are important for multivariate calibration. The highly efficient selection of representative samples to be labelled can save money and time. Existing methods, such as Kennard-Stone and net analyte signal selection, are usually based on the distance between candidate samples and labelled calibration sets in feature space. However, these distances are influenced by the feature space, which is spanned by an information vector extracted from labelled samples. To overcome the negative effects of the distance-based selection method, a model performance enhancement-based sample selection method is proposed to select calibration samples efficiently. Based on loss function optimization, the samples that can improve model performance the most, as estimated by bootstrap, are sequentially selected and added to the calibration set. Due to the high representation of each sample, a few samples can build a model that has no significant loss of prediction ability when compared with a model built with the large number set of calibration samples. The performance enhancement-based active learning (PEAL) sample selection method is both effective and efficient.",

keywords = "bootstrap modelling, feature space, parsimonious sample selection, performance enhancement, set representation",

author = "Zhonghai He and Shijie Song and Kun Shen and Xiaofang Zhang",

note = "Publisher Copyright: {\textcopyright} 2022 John Wiley \& Sons, Ltd.",

year = "2022",

month = mar,

doi = "10.1002/cem.3386",

language = "English",

volume = "36",

journal = "Journal of Chemometrics",

issn = "0886-9383",

publisher = "John Wiley and Sons Ltd",

number = "3",

}

TY - JOUR

T1 - Performance enhancement-based active learning sample selection method

AU - He, Zhonghai

AU - Song, Shijie

AU - Shen, Kun

AU - Zhang, Xiaofang

PY - 2022/3

Y1 - 2022/3

N2 - Representative samples are important for multivariate calibration. The highly efficient selection of representative samples to be labelled can save money and time. Existing methods, such as Kennard-Stone and net analyte signal selection, are usually based on the distance between candidate samples and labelled calibration sets in feature space. However, these distances are influenced by the feature space, which is spanned by an information vector extracted from labelled samples. To overcome the negative effects of the distance-based selection method, a model performance enhancement-based sample selection method is proposed to select calibration samples efficiently. Based on loss function optimization, the samples that can improve model performance the most, as estimated by bootstrap, are sequentially selected and added to the calibration set. Due to the high representation of each sample, a few samples can build a model that has no significant loss of prediction ability when compared with a model built with the large number set of calibration samples. The performance enhancement-based active learning (PEAL) sample selection method is both effective and efficient.

AB - Representative samples are important for multivariate calibration. The highly efficient selection of representative samples to be labelled can save money and time. Existing methods, such as Kennard-Stone and net analyte signal selection, are usually based on the distance between candidate samples and labelled calibration sets in feature space. However, these distances are influenced by the feature space, which is spanned by an information vector extracted from labelled samples. To overcome the negative effects of the distance-based selection method, a model performance enhancement-based sample selection method is proposed to select calibration samples efficiently. Based on loss function optimization, the samples that can improve model performance the most, as estimated by bootstrap, are sequentially selected and added to the calibration set. Due to the high representation of each sample, a few samples can build a model that has no significant loss of prediction ability when compared with a model built with the large number set of calibration samples. The performance enhancement-based active learning (PEAL) sample selection method is both effective and efficient.

KW - bootstrap modelling

KW - feature space

KW - parsimonious sample selection

KW - performance enhancement

KW - set representation

UR - http://www.scopus.com/inward/record.url?scp=85123751033&partnerID=8YFLogxK

U2 - 10.1002/cem.3386

DO - 10.1002/cem.3386

M3 - Article

AN - SCOPUS:85123751033

SN - 0886-9383

VL - 36

JO - Journal of Chemometrics

JF - Journal of Chemometrics

IS - 3

M1 - e3386

ER -
