Multi-resolution subsampling for linear classification with massive data

Haolin Chen, Holger Dette, Jun Yu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim to select informative or representative sample points to achieve good overall information of the full data. The present work takes the view that sampling techniques are recommended for the region we focus on and summary measures are enough to collect the information for the rest according to a well-designed data partitioning. We propose a subsampling strategy that collects global information described by summary measures and local information obtained from selected subsample points. Thus, we call it multi-resolution subsampling. We show that the proposed method leads to a more efficient subsample-based estimator for general linear classification problems. Some asymptotic properties of the proposed method are established and connections to existing subsampling procedures are explored. Finally, we illustrate the proposed subsampling strategy via simulated and real-world examples.

Original language: English
Pages (from-to): 1260-1280
Number of pages: 21
Journal: Journal of the Royal Statistical Society. Series B: Statistical Methodology
Volume: 87
Issue number: 4
Publication status: Published - 1 Sept 2025
Externally published: Yes

Keywords

  • classification
  • linear projection
  • M-estimator
  • optimal design
  • Rao-Blackwellization
