TY - JOUR
T1 - Design based incomplete U-statistics
AU - Kong, Xiangshun
AU - Zheng, Wei
N1 - Publisher Copyright:
© 2021 Institute of Statistical Science. All rights reserved.
PY - 2021/7
Y1 - 2021/7
N2 - U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(nd). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than nd, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
AB - U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(nd). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than nd, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
KW - Asymptotically efficient
KW - BIBD
KW - Big data
KW - Design of experiment
KW - Subsampling
UR - http://www.scopus.com/inward/record.url?scp=85114141679&partnerID=8YFLogxK
U2 - 10.5705/ss.202019.0098
DO - 10.5705/ss.202019.0098
M3 - Article
AN - SCOPUS:85114141679
SN - 1017-0405
VL - 31
SP - 1593
EP - 1618
JO - Statistica Sinica
JF - Statistica Sinica
IS - 3
ER -