A review on design inspired subsampling for big data

Jun Yu, Mingyao Ai*, Zhiqiang Ye

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract

Subsampling focuses on selecting a subsample that efficiently sketches the information in the original data for the purpose of statistical inference. It provides a powerful tool for big data analysis and has attracted the attention of data scientists in recent years. In this review, several state-of-the-art subsampling methods inspired by statistical design are summarized. Three types of designs, namely optimal designs, orthogonal designs, and space-filling designs, have shown great potential in subsampling for different objectives. The relationships between experimental designs and the related subsampling approaches are discussed. Specifically, two major families of design-inspired subsampling techniques are presented. The first selects a subsample according to an optimal design criterion. The second seeks a subsample that meets certain design requirements, such as balance, orthogonality, and uniformity. Simulated and real data examples are provided to compare these methods empirically.
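The two families described above can be illustrated with a minimal Python/NumPy sketch containing one selector from each. `iboss_select` follows the idea behind the information-based optimal subdata selection (IBOSS) approach of Wang, Yang and Stufken (2019) for linear regression. It keeps extreme covariate values so that the subsample's information matrix tends to have a large determinant, which is the D-optimality criterion. `greedy_maximin_select` is a simple farthest-point heuristic. It is used here only as a stand-in for space-filling (uniformity-type) requirements and is not one of the specific methods reviewed. Both function names, their parameters, and the simulated data are hypothetical and chosen for illustration. This is not the code from the review.

```python
import numpy as np


def iboss_select(X: np.ndarray, k: int) -> np.ndarray:
    """IBOSS-style selection (optimal-design family), a sketch.

    For each covariate j in turn, keep the r smallest and r largest values
    of X[:, j] among rows not yet chosen, with r = k // (2p). The returned
    size is 2 * r * p, which equals k when 2p divides k. Extreme values
    spread out each column, which tends to increase det(Z^T Z) for
    Z = [1, X], i.e. the D-optimality criterion.
    """
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or 2 * r * p > n:
        raise ValueError("need k >= 2p and n >= 2p * (k // (2p))")
    available = np.ones(n, dtype=bool)
    picked = []
    for j in range(p):
        idx = np.flatnonzero(available)            # rows still eligible
        order = np.argsort(X[idx, j], kind="stable")
        sel = np.concatenate([idx[order[:r]], idx[order[-r:]]])
        available[sel] = False                     # never pick a row twice
        picked.append(sel)
    return np.concatenate(picked)


def greedy_maximin_select(X: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Farthest-point (greedy maximin) selection, a hypothetical stand-in
    for the space-filling family: each new row is the one farthest from
    its nearest already-chosen row."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(X.shape[0]))
    chosen = [first]
    # Distance from every row to its nearest chosen row so far.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 100_000, 3
    X = rng.standard_normal((n, p))
    beta = np.array([1.0, 2.0, -1.0, 0.5])         # simulated: intercept + 3 slopes
    y = beta[0] + X @ beta[1:] + rng.standard_normal(n)

    # Fit ordinary least squares on the IBOSS subsample only.
    idx = iboss_select(X, k=600)
    Z = np.column_stack([np.ones(idx.size), X[idx]])
    beta_hat, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
    print("IBOSS subsample estimate:", np.round(beta_hat, 2))

    sf = greedy_maximin_select(rng.uniform(size=(5_000, 2)), k=50)
    print("space-filling subsample size:", sf.size)
```

Both selectors are deliberately simple. `iboss_select` uses a full sort for each covariate, whereas the original IBOSS proposal relies on partial sorting to reach O(np) time. The greedy maximin pass costs O(nk) distance evaluations.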

Original language: English
Pages (from-to): 467-510
Number of pages: 44
Journal: Statistical Papers
Volume: 65
Issue number: 2
Publication status: Published, Apr 2024

Keywords

  • 62D05
  • 62K05
  • 62K86
  • Massive data
  • Optimal design
  • Orthogonal array
  • Space filling design
