A distance metric-based space-filling subsampling method for nonparametric models

Huaimin Diao, Dianpeng Wang, Xu He*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Subsampling the original data set is an efficient and popular strategy for handling massive data that are too large to be modeled directly. To optimize inference and prediction accuracy, it is crucial to employ a subsampling scheme that collects observations intelligently. In this paper, we propose a space-filling subsampling method that uses distance metric-based strata to select subsamples from high-volume data sets. To minimize the maximal distance between pairs of samples that lie in the same stratum, Voronoi cells of thinnest covering lattices are used to partition the input space. In addition, subsamples that are space-filling with respect to the response are collected from each stratum. Thanks to an algorithm that quickly identifies the cell in which an observation lies, the computational cost of our subsampling method is proportional to the number of observations and independent of the number of cells, which makes the method applicable to extremely large data sets. Results from simulation studies and real data analysis show that the new method performs remarkably better than existing approaches when used in conjunction with Gaussian process models.
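The stratification idea in the abstract can be illustrated with a minimal two-dimensional sketch. It is not the authors' implementation. It assumes the hexagonal lattice (the thinnest covering lattice in two dimensions) with a hypothetical `scale` parameter for the cell size, and it uses a simple stand-in rule for the response step: within each cell, observations are sorted by response and picked at evenly spaced ranks. The function names `hex_cells` and `space_filling_subsample` are illustrative only. The cell lookup needs only coordinate-wise rounding, so its cost per observation does not depend on the number of cells.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)

def hex_cells(x, scale=1.0):
    """Assign each 2-D point to its nearest hexagonal-lattice point.

    The hexagonal lattice is the union of a rectangular lattice with
    steps (1, sqrt(3)) and a copy shifted by half a step, so the nearest
    lattice point is the closer of the two rounded candidates.
    """
    z = np.asarray(x, dtype=float) / scale
    step = np.array([1.0, SQRT3])
    shift = step / 2.0
    ia = np.round(z / step)              # candidate in the unshifted coset
    ib = np.round((z - shift) / step)    # candidate in the shifted coset
    ca = ia * step
    cb = ib * step + shift
    use_b = np.sum((z - cb) ** 2, axis=1) < np.sum((z - ca) ** 2, axis=1)
    idx = np.where(use_b[:, None], ib, ia).astype(int)
    keys = np.column_stack([idx, use_b.astype(int)])   # integer cell labels
    centers = np.where(use_b[:, None], cb, ca) * scale
    return keys, centers

def space_filling_subsample(x, y, scale=1.0, per_cell=2):
    """Pick up to `per_cell` points per cell, spread over the response range.

    Assumption: evenly spaced ranks of y stand in for the paper's
    response-based selection rule, which the abstract does not specify.
    """
    y = np.asarray(y, dtype=float)
    keys, _ = hex_cells(x, scale)
    cells = {}
    for i, key in enumerate(map(tuple, keys)):   # single pass over the data
        cells.setdefault(key, []).append(i)
    chosen = []
    for members in cells.values():
        members = np.asarray(members)
        order = members[np.argsort(y[members])]  # members sorted by response
        k = min(per_cell, len(order))
        ranks = np.unique(np.round(np.linspace(0, len(order) - 1, k)).astype(int))
        chosen.extend(order[ranks].tolist())
    return np.array(sorted(chosen))
```

Calling `space_filling_subsample(x, y, scale=0.5, per_cell=2)` on an `(n, 2)` input array and a length-`n` response returns the row indices of the retained observations. A smaller `scale` gives more cells and therefore a larger subsample.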

Original language: English
Pages (from-to): 3247-3273
Number of pages: 27
Journal: Electronic Journal of Statistics
Volume: 18
Issue number: 2
Publication status: Published - 2024
