Optimal subsampling for large-scale quantile regression

Mingyao Ai; Fei Wang; Jun Yu; Huiming Zhang

doi:10.1016/j.jco.2020.101512

Mingyao Ai, Fei Wang, Jun Yu*, Huiming Zhang

*Corresponding author for this work

School of Mathematics and Statistics

Peking University

Research output: Contribution to journal › Article › peer-review

40 Citations (Scopus)

Abstract

To deal with massive data sets, subsampling is known as an effective method which can significantly reduce computational costs in estimating model parameters. In this article, an efficient subsampling method is developed for large-scale quantile regression via Poisson sampling framework, which can solve the memory constraint problem imposed by big data. Under some mild conditions, large sample properties for the estimator involving the weak and strong consistencies, and asymptotic normality are established. Furthermore, the optimal subsampling probabilities are derived according to the A-optimality criterion. It is shown that the estimator based on the optimal subsampling asymptotically achieves a smaller variance than that by the uniform random subsampling. The proposed method is illustrated and evaluated through numerical analyses on both simulated and real data sets.
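As a rough illustration of the Poisson sampling idea mentioned in the abstract, the sketch below draws a subsample by including each observation independently with its own probability, then estimates a quantile from the subsample using inverse-probability weights. It is a simplified, intercept-only stand-in for quantile regression, not the authors' estimator: the function names, the simulated data, and the score `|y_i| + 1` used to build the sampling probabilities are all assumptions made for illustration, and they are not the A-optimal probabilities derived in the article.

```python
import random


def poisson_subsample(n, probs, rng):
    """Poisson sampling: keep index i independently with probability probs[i]."""
    return [i for i in range(n) if rng.random() < probs[i]]


def weighted_quantile(values, weights, tau):
    """Return a minimiser of sum_i w_i * rho_tau(y_i - q) over q.

    This is quantile regression with an intercept only; the minimiser is
    the smallest value whose cumulative weight reaches tau * total weight.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= tau * total:
            return v
    return pairs[-1][0]


rng = random.Random(0)
n, r, tau = 100_000, 2_000, 0.5  # full size, expected subsample size, quantile level
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical non-uniform probabilities, proportional to |y_i| + 1.
# Illustration only: NOT the A-optimal probabilities from the article.
scores = [abs(v) + 1.0 for v in y]
score_sum = sum(scores)
probs = [min(1.0, r * s / score_sum) for s in scores]

idx = poisson_subsample(n, probs, rng)

# Weight each sampled point by 1 / (inclusion probability) to offset
# the unequal chances of being selected.
est = weighted_quantile([y[i] for i in idx], [1.0 / probs[i] for i in idx], tau)
print(len(idx), est)
```

Because each inclusion decision is made independently, the data can be scanned once without holding the full set in memory; the price is that the realised subsample size is random, with expectation `r` here.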

Original language	English
Article number	101512
Journal	Journal of Complexity
Volume	62
DOIs	https://doi.org/10.1016/j.jco.2020.101512
Publication status	Published - Feb 2021

Keywords

A-optimality
Law of the iterated logarithm
Massive data
Non-informative sampling
Poisson sampling

Access to Document

10.1016/j.jco.2020.101512

Cite this

@article{4f08f66ddc0a49d0afd5cda6cf5ad5b8,
  title = "Optimal subsampling for large-scale quantile regression",
  abstract = "To deal with massive data sets, subsampling is known as an effective method which can significantly reduce computational costs in estimating model parameters. In this article, an efficient subsampling method is developed for large-scale quantile regression via Poisson sampling framework, which can solve the memory constraint problem imposed by big data. Under some mild conditions, large sample properties for the estimator involving the weak and strong consistencies, and asymptotic normality are established. Furthermore, the optimal subsampling probabilities are derived according to the A-optimality criterion. It is shown that the estimator based on the optimal subsampling asymptotically achieves a smaller variance than that by the uniform random subsampling. The proposed method is illustrated and evaluated through numerical analyses on both simulated and real data sets.",
  keywords = "A-optimality, Law of the iterated logarithm, Massive data, Non-informative sampling, Poisson sampling",
  author = "Mingyao Ai and Fei Wang and Jun Yu and Huiming Zhang",
  note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Inc.",
  year = "2021",
  month = feb,
  doi = "10.1016/j.jco.2020.101512",
  language = "English",
  volume = "62",
  journal = "Journal of Complexity",
  issn = "0885-064X",
  publisher = "Academic Press Inc.",
}

TY  - JOUR
T1  - Optimal subsampling for large-scale quantile regression
AU  - Ai, Mingyao
AU  - Wang, Fei
AU  - Yu, Jun
AU  - Zhang, Huiming
PY  - 2021/2
Y1  - 2021/2
N2  - To deal with massive data sets, subsampling is known as an effective method which can significantly reduce computational costs in estimating model parameters. In this article, an efficient subsampling method is developed for large-scale quantile regression via Poisson sampling framework, which can solve the memory constraint problem imposed by big data. Under some mild conditions, large sample properties for the estimator involving the weak and strong consistencies, and asymptotic normality are established. Furthermore, the optimal subsampling probabilities are derived according to the A-optimality criterion. It is shown that the estimator based on the optimal subsampling asymptotically achieves a smaller variance than that by the uniform random subsampling. The proposed method is illustrated and evaluated through numerical analyses on both simulated and real data sets.
AB  - To deal with massive data sets, subsampling is known as an effective method which can significantly reduce computational costs in estimating model parameters. In this article, an efficient subsampling method is developed for large-scale quantile regression via Poisson sampling framework, which can solve the memory constraint problem imposed by big data. Under some mild conditions, large sample properties for the estimator involving the weak and strong consistencies, and asymptotic normality are established. Furthermore, the optimal subsampling probabilities are derived according to the A-optimality criterion. It is shown that the estimator based on the optimal subsampling asymptotically achieves a smaller variance than that by the uniform random subsampling. The proposed method is illustrated and evaluated through numerical analyses on both simulated and real data sets.
KW  - A-optimality
KW  - Law of the iterated logarithm
KW  - Massive data
KW  - Non-informative sampling
KW  - Poisson sampling
UR  - http://www.scopus.com/inward/record.url?scp=85089003024&partnerID=8YFLogxK
U2  - 10.1016/j.jco.2020.101512
DO  - 10.1016/j.jco.2020.101512
M3  - Article
AN  - SCOPUS:85089003024
SN  - 0885-064X
VL  - 62
JO  - Journal of Complexity
JF  - Journal of Complexity
M1  - 101512
ER  -
