Abstract
Subsampling is an effective approach to address computational challenges associated with massive datasets. However, existing subsampling methods do not consider model uncertainty. In this article, we investigate the subsampling technique for the Akaike information criterion (AIC) and extend the subsampling method to the smoothed AIC model-averaging framework in the context of generalized linear models. By correcting the asymptotic bias of the maximized subsample objective function used to approximate the Kullback–Leibler divergence, we derive the form of the AIC based on the subsample. We then provide a subsampling strategy for the smoothed AIC model-averaging estimator and study the corresponding asymptotic properties of the loss and the resulting estimator. A practically implementable algorithm is developed, and its performance is evaluated through numerical experiments on both real and simulated datasets.
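As a rough illustration of the smoothed AIC model-averaging idea the article builds on, the sketch below fits several candidate logistic regression models on a uniform random subsample of a large simulated dataset, computes each model's AIC on the subsample, and combines predictions with exp(-AIC/2) weights. This is a simplified assumption-laden sketch, not the paper's method: it uses uniform subsampling and the ordinary AIC rather than the nonuniform subsampling probabilities and bias-corrected subsample criterion derived in the article, and names such as `candidate_columns` and `n_sub` are illustrative choices.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated "massive" logistic-regression dataset (illustrative only).
N, p = 100_000, 4
X_full = rng.normal(size=(N, p))
beta_true = np.array([1.0, -0.5, 0.0, 0.0])
y_full = rng.binomial(1, 1 / (1 + np.exp(-X_full @ beta_true)))

# Uniform subsample; the article studies nonuniform subsampling
# probabilities, which are omitted here for simplicity.
n_sub = 2_000
idx = rng.choice(N, size=n_sub, replace=False)
X_sub, y_sub = X_full[idx], y_full[idx]

# Candidate models: nested subsets of covariates (a hypothetical choice).
candidate_columns = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]

aics, fits = [], []
for cols in candidate_columns:
    Xm = sm.add_constant(X_sub[:, cols])
    res = sm.Logit(y_sub, Xm).fit(disp=0)
    # Plain AIC on the subsample; the article instead derives a
    # bias-corrected subsample criterion, not reproduced here.
    aics.append(res.aic)
    fits.append((cols, res))

# Smoothed AIC weights: w_m proportional to exp(-AIC_m / 2).
aics = np.array(aics)
w = np.exp(-(aics - aics.min()) / 2)
w /= w.sum()

# Model-averaged predicted probabilities on the subsample.
p_avg = sum(
    wi * res.predict(sm.add_constant(X_sub[:, cols]))
    for wi, (cols, res) in zip(w, fits)
)
print("smoothed AIC weights:", np.round(w, 3))
```

In this toy setting the weights typically concentrate on the candidate models containing the truly active covariates, which is the behavior the smoothed AIC averaging is meant to capture while the subsample keeps the per-model fitting cost low.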
Original language | English |
---|---|
Journal | Technometrics |
DOIs | |
Publication status | Accepted/In press - 2024 |
Keywords
- Big data
- Information criterion
- Nonuniform
- Smoothed AIC
- Subsampling