Text classification using diffusion kernel on statistical manifold

Kan Li*, Shi Bin Zhou, Yu Shu Liu

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

2 Citations (Scopus)

Abstract

A Dirichlet compound multinomial manifold (DCM manifold) is proposed. The DCM manifold is homeomorphic and isometric to the positive sphere manifold, so the geodesic distance on the positive sphere manifold can be carried over to the DCM manifold through the pullback mapping, which yields a distance metric on the DCM manifold. Two kernel functions, the DCM diffusion kernel and the DCMIDF diffusion kernel, are then built on the DCM manifold. The proposed algorithms are tested for text classification on the WebKB Top 4 and 20 Newsgroups corpora. The experimental results show that the DCM manifold is better suited than Euclidean space for modeling the texts in these corpora. Compared with support vector machines based on the polynomial kernel and on the NGD kernel, support vector machines based on the proposed DCM and DCMIDF diffusion kernels achieve higher accuracy for text classification.
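
The abstract does not give the DCM kernel formulas, so the sketch below illustrates only the general idea it describes, using the simpler and well-known multinomial case as a stand-in: probability vectors are mapped onto the positive part of a sphere, the sphere's geodesic distance is pulled back, and a diffusion-style kernel is built from that distance. The function names, the parameter t, and the omission of the kernel's normalizing constant are assumptions made for illustration; this is not the authors' DCM or DCMIDF kernel.

```python
import numpy as np

def sphere_geodesic(p, q):
    """Geodesic distance between two probability vectors (multinomial case).

    The map p -> 2*sqrt(p) sends the simplex onto the positive part of a
    sphere of radius 2, so the pulled-back distance is
    2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    inner = np.sum(np.sqrt(p * q))
    # Clip to guard against rounding slightly outside [0, 1].
    return 2.0 * np.arccos(np.clip(inner, 0.0, 1.0))

def diffusion_kernel(p, q, t=0.5):
    """Leading-order diffusion kernel exp(-d^2 / (4t)), constant factor dropped.

    t is a hypothetical diffusion-time parameter chosen for illustration.
    """
    d = sphere_geodesic(p, q)
    return np.exp(-d ** 2 / (4.0 * t))

# Two toy term-frequency vectors, normalized to sum to 1.
doc_a = np.array([3.0, 1.0, 0.0, 2.0])
doc_b = np.array([1.0, 1.0, 4.0, 0.0])
doc_a /= doc_a.sum()
doc_b /= doc_b.sum()

print(sphere_geodesic(doc_a, doc_b))
print(diffusion_kernel(doc_a, doc_b))
```

A kernel of this form can be evaluated for every pair of documents to fill a Gram matrix, which a support vector machine that accepts precomputed kernels can then use in place of a polynomial kernel.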

Original language: English
Pages (from-to): 339-345
Number of pages: 7
Journal: Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence
Volume: 25
Issue number: 2
Publication status: Published - Apr 2012

Keywords

  • Diffusion kernel
  • Dirichlet distribution
  • Statistical manifold
  • Text classification
