Abstract
Identifying the existence and locations of change points is a task encountered across many areas of statistical application. Existing change point detection methods may produce unsatisfactory results for high-dimensional data because they rely on distributional assumptions that are hard to verify in practice. Moreover, some methods require certain parameters (such as the number of change points) to be estimated beforehand, which makes their power sensitive to these values. Here, we propose a kernel-based U-statistic to identify change points (KUCP) for high-dimensional data, which is free of distributional assumptions and sup-parameter estimation. Specifically, we employ a kernel function to describe similarities among the subjects and construct a U-statistic to test for the existence of a change point at a given location. The asymptotic properties of the U-statistic are derived. We also develop a procedure to locate the change points sequentially via a dichotomy algorithm. Extensive simulations demonstrate that KUCP has higher sensitivity in identifying the existence of change points and higher accuracy in locating them than its counterparts. We further illustrate its practical utility by analyzing gene expression data from the human brain to detect the time point at which gene expression profiles begin to change, which has been reported to be closely related to the aging brain.
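The general idea of testing a candidate location with a kernel similarity and then splitting recursively can be sketched as follows. This is only an illustration under stated assumptions, not the KUCP statistic from the paper: the Gaussian kernel, the MMD-type two-sample U-statistic, the fixed `threshold` (standing in for a test based on the asymptotic distribution), and all function names are hypothetical choices made for this example.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity between two observations (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_matrix(data, bandwidth=1.0):
    """Pairwise similarity matrix for a time-ordered list of vectors."""
    n = len(data)
    return [[gaussian_kernel(data[i], data[j], bandwidth) for j in range(n)]
            for i in range(n)]


def mean_within(K, idx):
    """Average similarity over distinct pairs i != j inside one segment."""
    m = len(idx)
    total = sum(K[i][j] for i in idx for j in idx if i != j)
    return total / (m * (m - 1))


def mean_between(K, left, right):
    """Average similarity over pairs taken across the two segments."""
    total = sum(K[i][j] for i in left for j in right)
    return total / (len(left) * len(right))


def split_statistic(K, lo, t, hi):
    """MMD-type U-statistic comparing segments [lo, t) and [t, hi).

    Assumed stand-in for the paper's statistic: large values suggest the
    two segments follow different distributions.
    """
    left, right = range(lo, t), range(t, hi)
    return (mean_within(K, left) + mean_within(K, right)
            - 2.0 * mean_between(K, left, right))


def locate_change_points(K, lo, hi, threshold, min_size=3):
    """Dichotomy (binary segmentation): find the best split, then recurse.

    `threshold` is a hypothetical fixed cutoff; `min_size` must be >= 2 so
    that every segment contains at least one distinct pair.
    """
    best_t, best_stat = None, threshold
    for t in range(lo + min_size, hi - min_size + 1):
        stat = split_statistic(K, lo, t, hi)
        if stat > best_stat:
            best_t, best_stat = t, stat
    if best_t is None:
        return []
    return (locate_change_points(K, lo, best_t, threshold, min_size)
            + [best_t]
            + locate_change_points(K, best_t, hi, threshold, min_size))
```

Because the kernel matrix is computed once and reused at every level of the recursion, each candidate split only requires averaging entries that already exist. In practice the cutoff would come from a significance test rather than a hand-picked constant.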
Original language | English |
---|---|
Pages (from-to) | 4644-4663 |
Number of pages | 20 |
Journal | Statistics in Medicine |
Volume | 42 |
Issue number | 25 |
DOIs | |
Publication status | Published - 10 Nov 2023 |
Keywords
- U-statistic
- change point detection
- gene expression profile
- high dimensional data
- kernel-based method