TY - GEN
T1 - Model selection with the covering number of the ball of RKHS
AU - Ding, Lizhong
AU - Liao, Shizhong
N1 - Publisher Copyright:
Copyright 2014 ACM.
PY - 2014/11/3
Y1 - 2014/11/3
N2 - Model selection in kernel methods is the problem of choosing an appropriate hypothesis space for kernel-based learning algorithms so that the resulting hypothesis neither underfits nor overfits. One of the main problems faced by model selection is how to control the sample complexity when designing the model selection criterion. In this paper, we take balls of reproducing kernel Hilbert spaces (RKHSs) as candidate hypothesis spaces and propose a novel model selection criterion that minimizes the empirical optimal error in the ball of an RKHS together with the covering number of the ball. By introducing the covering number to measure the capacity of the ball of an RKHS, our criterion can directly control the sample complexity. Specifically, we first prove the relation between the expected optimal error and the empirical optimal error in the ball of an RKHS. Using this relation as the theoretical foundation, we define our criterion. Then, by estimating the expectation of the optimal empirical error and proving an upper bound on the covering number, we represent the criterion as a functional of the kernel matrix. We further develop an efficient algorithm for approximately calculating this functional so that the fast Fourier transform (FFT) can be applied to achieve quasi-linear computational complexity. We also prove the consistency between the approximate criterion and the accurate one for sufficiently large samples. Finally, we empirically evaluate the performance of our criterion and verify the consistency between the approximate and accurate criteria.
AB - Model selection in kernel methods is the problem of choosing an appropriate hypothesis space for kernel-based learning algorithms so that the resulting hypothesis neither underfits nor overfits. One of the main problems faced by model selection is how to control the sample complexity when designing the model selection criterion. In this paper, we take balls of reproducing kernel Hilbert spaces (RKHSs) as candidate hypothesis spaces and propose a novel model selection criterion that minimizes the empirical optimal error in the ball of an RKHS together with the covering number of the ball. By introducing the covering number to measure the capacity of the ball of an RKHS, our criterion can directly control the sample complexity. Specifically, we first prove the relation between the expected optimal error and the empirical optimal error in the ball of an RKHS. Using this relation as the theoretical foundation, we define our criterion. Then, by estimating the expectation of the optimal empirical error and proving an upper bound on the covering number, we represent the criterion as a functional of the kernel matrix. We further develop an efficient algorithm for approximately calculating this functional so that the fast Fourier transform (FFT) can be applied to achieve quasi-linear computational complexity. We also prove the consistency between the approximate criterion and the accurate one for sufficiently large samples. Finally, we empirically evaluate the performance of our criterion and verify the consistency between the approximate and accurate criteria.
KW - Covering number
KW - Kernel methods
KW - Model selection
KW - Multilevel circulant matrix
UR - http://www.scopus.com/inward/record.url?scp=84937538757&partnerID=8YFLogxK
U2 - 10.1145/2661829.2662034
DO - 10.1145/2661829.2662034
M3 - Conference contribution
AN - SCOPUS:84937538757
T3 - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
SP - 1159
EP - 1168
BT - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
Y2 - 3 November 2014 through 7 November 2014
ER -