Design based incomplete U-statistics

Xiangshun Kong; Wei Zheng

doi:10.5705/ss.202019.0098

Xiangshun Kong, Wei Zheng^*

^*Corresponding author for this work

School of Mathematics

University of Tennessee, Knoxville

Research output: Contribution to journal › Article › Peer-reviewed

4 Citations (Scopus)

Abstract

U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(n^d). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than n^d, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
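To make the computational trade-off concrete, the following is a minimal sketch, not the design-based construction introduced in the paper. It compares a complete U-statistic of order d = 2 with an incomplete one that averages the kernel over m subsets drawn uniformly at random, which is only one simple way to choose the subset of combinations. The function names (`variance_kernel`, `complete_u`, `incomplete_u`), the choice of kernel, and the values n = 200 and m = 4n are all illustrative assumptions.

```python
import itertools
import math
import random


def variance_kernel(x, y):
    """Order-2 kernel h(x, y) = (x - y)^2 / 2; its U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def complete_u(data, h, d):
    """Average h over all C(n, d) size-d subsets of data."""
    values = [h(*subset) for subset in itertools.combinations(data, d)]
    return sum(values) / len(values)


def incomplete_u(data, h, d, m, rng):
    """Average h over m size-d subsets, each drawn uniformly at random.

    Assumption: plain random subsampling, NOT the design-based selection
    studied in the paper.
    """
    return sum(h(*rng.sample(data, d)) for _ in range(m)) / m


rng = random.Random(0)  # fixed seed so the run is reproducible
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
n, d = len(data), 2
m = 4 * n  # illustrative budget, far below C(n, d)

print("kernel evaluations, complete:  ", math.comb(n, d))
print("kernel evaluations, incomplete:", m)
print("complete U-statistic:  ", complete_u(data, variance_kernel, d))
print("incomplete U-statistic:", incomplete_u(data, variance_kernel, d, m, rng))
```

The complete version evaluates the kernel on every pair, so its cost grows like n^d, while the incomplete version costs m kernel evaluations regardless of n. How the m subsets are chosen governs how much statistical efficiency is lost, which is the question the paper addresses.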

Original language	English
Pages (from-to)	1593-1618
Number of pages	26
Journal	Statistica Sinica
Volume	31
Issue number	3
DOI	https://doi.org/10.5705/ss.202019.0098
Publication status	Published - Jul 2021


Cite this

@article{113b6fc934f842c686ab5e82e7254242,
  title = "Design based incomplete U-statistics",
  abstract = "U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(n^d). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than n^d, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.",
  keywords = "Asymptotically efficient, BIBD, Big data, Design of experiment, Subsampling",
  author = "Xiangshun Kong and Wei Zheng",
  year = "2021",
  month = jul,
  doi = "10.5705/ss.202019.0098",
  language = "English",
  volume = "31",
  pages = "1593--1618",
  journal = "Statistica Sinica",
  issn = "1017-0405",
  publisher = "Institute of Statistical Science",
  number = "3",
}

TY  - JOUR
T1  - Design based incomplete U-statistics
AU  - Kong, Xiangshun
AU  - Zheng, Wei
PY  - 2021/7
Y1  - 2021/7
N2  - U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(n^d). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than n^d, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
AB  - U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size n increases. Specifically, the number of combinations, say m, that a U-statistic of order d has to evaluate is O(n^d). Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require m to grow at least faster than n, albeit more slowly than n^d, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when m grows more slowly than n. In some cases, m is only required to grow faster than √n. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
KW  - Asymptotically efficient
KW  - BIBD
KW  - Big data
KW  - Design of experiment
KW  - Subsampling
UR  - http://www.scopus.com/inward/record.url?scp=85114141679&partnerID=8YFLogxK
U2  - 10.5705/ss.202019.0098
DO  - 10.5705/ss.202019.0098
M3  - Article
AN  - SCOPUS:85114141679
SN  - 1017-0405
VL  - 31
SP  - 1593
EP  - 1618
JO  - Statistica Sinica
JF  - Statistica Sinica
IS  - 3
ER  -
