Abstract
Streaming data presents new challenges to data mining algorithms. To cluster streaming data, this paper proposes a novel incremental clustering approach based on the Gaussian mixture model (GMM), termed ICGT (Incremental Construction of GMM Tree). ICGT creates and dynamically adjusts a GMM tree that is consistent with the sequentially presented data. Each leaf node in the tree corresponds to a dense Gaussian distribution, and each non-leaf node corresponds to a GMM. To update the GMM tree when newly arrived data points are inserted, we introduce the definitions of node connectivity and connected subsets, and present the tree update algorithm. We further develop a clustering evaluation criterion and a search strategy to determine the final partition of the data set based on the constructed GMM tree. We evaluated the proposed approach on synthetic and real-world data sets and compared ICGT with other incremental and static clustering methods. The experimental results confirm that our approach is effective and promising.
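
To make the tree structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes univariate data for brevity, and the class name `GMMTreeNode`, its fields, and the `absorb` method are hypothetical. A leaf holds one Gaussian, and a non-leaf node represents the GMM formed by the leaves beneath it, with weights assumed proportional to leaf counts. The `absorb` method uses Welford's standard running-moment update; it stands in for, and does not reproduce, the paper's tree update algorithm based on node connectivity and connected subsets.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GMMTreeNode:
    """Hypothetical node: a leaf stores one 1-D Gaussian, a non-leaf stores children."""
    count: int = 0      # number of points summarized by this leaf
    mean: float = 0.0   # leaf mean
    m2: float = 0.0     # leaf sum of squared deviations from the mean
    children: List["GMMTreeNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def variance(self, floor: float = 1e-6) -> float:
        # Population variance with a small floor so the density stays finite.
        return max(self.m2 / self.count, floor) if self.count else floor

    def absorb(self, x: float) -> None:
        # Welford's running update (an assumption, not the paper's update rule).
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def leaves(self) -> List["GMMTreeNode"]:
        if self.is_leaf():
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def components(self) -> List[Tuple[float, float, float]]:
        # (weight, mean, variance) per non-empty leaf under this node;
        # weights are proportional to leaf counts and sum to 1.
        leaves = [leaf for leaf in self.leaves() if leaf.count > 0]
        total = sum(leaf.count for leaf in leaves)
        return [(leaf.count / total, leaf.mean, leaf.variance()) for leaf in leaves]

    def density(self, x: float) -> float:
        # Mixture density: weighted sum of the leaf Gaussian densities.
        return sum(
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in self.components()
        )


# Two leaves fed point by point, then grouped under one non-leaf node.
left, right = GMMTreeNode(), GMMTreeNode()
for x in (0.0, 0.2, -0.2):
    left.absorb(x)
for x in (5.0, 5.5, 4.5):
    right.absorb(x)
root = GMMTreeNode(children=[left, right])
```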
Original language | English |
---|---|
Pages (from-to) | 71-86 |
Number of pages | 16 |
Journal | Data and Knowledge Engineering |
Volume | 117 |
Publication status | Published - Sept 2018 |
Keywords
- Gaussian mixture model (GMM)
- Incremental data clustering
- Streaming data
- Tree structure