TY - GEN
T1 - Clustering algorithm based on optimal intervals division for high-dimension data streams
AU - Li, Yinzhao
AU - Ren, Jiadong
AU - Hu, Changzheng
AU - Xu, Lina
PY - 2009
Y1 - 2009
N2 - Clustering of high-dimension data streams is a main focus in the field of clustering research. To optimize the clustering process, especially with respect to the large number of candidate subspaces it generates, optimal segmentation section technology and an FP-tree structure are introduced, and on this basis the DOIC (Dynamic Optimal Intervals-based Cluster) algorithm is proposed. In this paper, memory-based data partition and optimal intervals division are defined to generate high-density grids for each dimension, which are stored in a High-Density Unit tree (HDU-tree). The HDU-tree is built according to the principle that high-density grids for the same interval in every dimension are stored in the same branch. The process of clustering high-dimension data streams is thus transformed into that of searching for dense grids in the HDU-tree. By merging HDU-trees, new data streams are inserted and historical data streams are decayed, and in this way the data streams are updated. The clustering result is returned on request, in a timely manner, in the form of DNF expressions. The experimental results demonstrate that DOIC has better space scalability and higher clustering quality than traditional clustering algorithms.
AB - Clustering of high-dimension data streams is a main focus in the field of clustering research. To optimize the clustering process, especially with respect to the large number of candidate subspaces it generates, optimal segmentation section technology and an FP-tree structure are introduced, and on this basis the DOIC (Dynamic Optimal Intervals-based Cluster) algorithm is proposed. In this paper, memory-based data partition and optimal intervals division are defined to generate high-density grids for each dimension, which are stored in a High-Density Unit tree (HDU-tree). The HDU-tree is built according to the principle that high-density grids for the same interval in every dimension are stored in the same branch. The process of clustering high-dimension data streams is thus transformed into that of searching for dense grids in the HDU-tree. By merging HDU-trees, new data streams are inserted and historical data streams are decayed, and in this way the data streams are updated. The clustering result is returned on request, in a timely manner, in the form of DNF expressions. The experimental results demonstrate that DOIC has better space scalability and higher clustering quality than traditional clustering algorithms.
KW - Clustering
KW - Data stream
KW - High-dimension
KW - Intervals division
UR - http://www.scopus.com/inward/record.url?scp=70350441318&partnerID=8YFLogxK
U2 - 10.1109/ICCSE.2009.5228155
DO - 10.1109/ICCSE.2009.5228155
M3 - Conference contribution
AN - SCOPUS:70350441318
SN - 9781424435210
T3 - Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009
SP - 783
EP - 787
BT - Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009
T2 - 2009 4th International Conference on Computer Science and Education, ICCSE 2009
Y2 - 25 July 2009 through 28 July 2009
ER -