Abstract
A Dirichlet compound multinomial manifold (DCM manifold) is proposed. The DCM manifold is homeomorphic and isometric to the positive sphere manifold, so the geodesic distance on the positive sphere manifold can be pulled back to give the geodesic distance on the DCM manifold, which defines a distance metric on it. A DCM diffusion kernel and a DCMIDF diffusion kernel are then constructed on the DCM manifold. The proposed algorithms are evaluated for text classification on the WebKB Top 4 and 20 Newsgroups corpora, and the experimental results show that the DCM manifold is better suited than Euclidean space for modeling the texts in these corpora. Compared with support vector machines based on the polynomial kernel and the NGD kernel, support vector machines based on the proposed DCM and DCMIDF diffusion kernels achieve higher text classification accuracy.
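
The general idea of pulling back a sphere geodesic and using it in a diffusion kernel can be illustrated with a minimal sketch. The abstract does not give the DCM-specific mapping, so the sketch below substitutes the well-known square-root map from the multinomial simplex onto the positive sphere of radius 2, and it omits the kernel's normalization constant. The function names and the diffusion-time parameter `t` are hypothetical choices for illustration, not the authors' definitions.

```python
import math


def sphere_geodesic_distance(p, q):
    """Geodesic distance between two term-count vectors after the
    square-root map onto the positive sphere of radius 2.

    Assumption: this is the plain multinomial map, used here as a
    stand-in for the DCM mapping, which the abstract does not specify.
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("vectors must have a positive total count")
    # Normalize counts to the simplex, then take the inner product of
    # the square-rooted coordinates (the cosine of the angle on the sphere).
    cos_angle = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    cos_angle = min(1.0, max(0.0, cos_angle))  # guard against rounding
    return 2.0 * math.acos(cos_angle)


def diffusion_kernel(p, q, t=0.5):
    """Unnormalized heat-kernel approximation exp(-d^2 / (4t)).

    `t` is a hypothetical diffusion-time parameter to be tuned.
    """
    d = sphere_geodesic_distance(p, q)
    return math.exp(-d * d / (4.0 * t))


if __name__ == "__main__":
    doc_a = [3, 1, 0]   # toy term counts
    doc_b = [2, 2, 0]
    doc_c = [0, 0, 4]
    print(diffusion_kernel(doc_a, doc_b))
    print(diffusion_kernel(doc_a, doc_c))
```

A Gram matrix built from such a kernel could be passed to a support vector machine that accepts precomputed kernels; documents with disjoint vocabularies sit at the maximum distance and therefore receive the smallest kernel value.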
Original language | English |
---|---|
Pages (from-to) | 339-345 |
Number of pages | 7 |
Journal | Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence |
Volume | 25 |
Issue number | 2 |
Publication status | Published - Apr 2012 |
Keywords
- Diffusion kernel
- Dirichlet distribution
- Statistical manifold
- Text classification