Abstract
Sequence mining has attracted growing attention. In this paper, we propose a new clustering algorithm for categorical sequences. Because sequences can differ in length, we introduce a similarity measure for clustering data with categorical, sequential attributes. The measure is derived from standard sequence alignment and is based on dynamic programming. The relative distance between element pairs is used to compute the similarity value of two sequences. The similarity measure is then applied within the traditional hierarchical clustering algorithm to cluster sequences. Using a splice dataset and synthetic datasets, we demonstrate the quality of the clusters generated by the proposed approach and the scalability of the algorithm.
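The general pipeline described above (an alignment-style similarity computed by dynamic programming, fed into agglomerative hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not specify the paper's own similarity measure or linkage rule, so this sketch swaps in two stand-ins that are assumptions, not the authors' method: Levenshtein edit distance normalized by the longer sequence length, and average linkage. The function names `edit_distance`, `similarity`, and `cluster` are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via dynamic programming (stand-in measure)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; works for sequences of unequal length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cluster(seqs, k):
    """Naive average-linkage agglomerative clustering down to k clusters.

    Returns a list of clusters, each a list of indices into seqs.
    """
    clusters = [[i] for i in range(len(seqs))]
    # Pairwise distances, keyed by (smaller index, larger index).
    dist = {(i, j): 1.0 - similarity(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters; p < q by construction.
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] += clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    sequences = ["ACGT", "ACGA", "TTTTTT", "TTTTT"]
    print(cluster(sequences, k=2))
```

This naive version recomputes linkages on every merge, so it is only suitable for small inputs; it says nothing about the scalability results reported in the paper.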
Original language | English |
---|---|
Pages (from-to) | 1575-1581 |
Number of pages | 7 |
Journal | Journal of Computational Information Systems |
Volume | 7 |
Issue | 5 |
Publication status | Published - May 2011 |