A review on design inspired subsampling for big data

  • Jun Yu
  • Mingyao Ai*
  • Zhiqiang Ye
  • *Corresponding author for this work
  • Peking University

Research output: Contribution to journal › Article › peer-review

Abstract

Subsampling focuses on selecting a subsample that can efficiently sketch the information of the original data in terms of statistical inference. It provides a powerful tool in big data analysis and has gained the attention of data scientists in recent years. In this review, some state-of-the-art subsampling methods inspired by statistical design are summarized. Three types of designs, namely optimal design, orthogonal design, and space-filling design, have shown great potential in subsampling for different objectives. The relationships between experimental designs and the related subsampling approaches are discussed. Specifically, two major families of design inspired subsampling techniques are presented. The first aims to select a subsample in accordance with some optimal design criteria. The second tries to find a subsample that meets some design requirements, including balancing, orthogonality, and uniformity. Simulated and real data examples are provided to compare these methods empirically.

Original language: English
Pages (from-to): 467-510
Number of pages: 44
Journal: Statistical Papers
Volume: 65
Issue number: 2
DOI
Publication status: Published - Apr 2024
