Clustering algorithm based on optimal intervals division for high-dimension data streams

Yinzhao Li; Jiadong Ren; Changzheng Hu; Lina Xu

doi:10.1109/ICCSE.2009.5228155

Yinzhao Li^*, Jiadong Ren, Changzheng Hu, Lina Xu

^*Corresponding author for this work

School of Cyberspace Science and Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Clustering high-dimensional data streams is a major focus of clustering research. To optimize the clustering process, in particular to cope with the large number of candidate subspaces it generates, optimal segmentation section technology and the FP-tree structure are introduced, and on this basis the DOIC (Dynamic Optimal Intervals-based Cluster) algorithm is proposed. In this paper, memory-based data partition and optimal intervals division are defined to generate high-density grids for each dimension, which are stored in a High-Density Unit tree (HDU-tree). The HDU-tree is built according to the principle that high-density grids for the same interval in every dimension are stored in the same branch. The process of clustering high-dimensional data streams is thus transformed into that of searching for dense grids in the HDU-tree. By merging HDU-trees, new stream data are inserted and historical stream data are decayed, which keeps the clustering up to date as the stream evolves. The clustering result is returned promptly on request in the form of disjunctive normal form (DNF) expressions. The experimental results demonstrate that DOIC has better space scalability and higher clustering quality than traditional clustering algorithms.
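The abstract does not give the details of the optimal intervals division or of the HDU-tree, so the following is only a rough illustration of the general idea of grid-based stream clustering with decay and DNF output. It is not the DOIC algorithm: it assumes equal-width intervals instead of optimal ones, a flat dictionary of cell weights instead of an HDU-tree, and a simple multiplicative decay. All function names, parameters, and thresholds are hypothetical.

```python
from collections import Counter

def cell_of(point, lo, width, n_bins):
    # Map a point to its grid cell: one equal-width interval index per dimension.
    # Assumes every coordinate lies in [lo, hi]; the top edge falls in the last bin.
    return tuple(min(int((x - lo) / width), n_bins - 1) for x in point)

def count_cells(points, lo=0.0, hi=1.0, n_bins=4):
    # Count how many points of one batch fall into each grid cell.
    width = (hi - lo) / n_bins
    return Counter(cell_of(p, lo, width, n_bins) for p in points)

def merge_with_decay(old, new, decay=0.5):
    # Down-weight historical cell weights, then add the counts of the new batch.
    merged = {cell: w * decay for cell, w in old.items()}
    for cell, w in new.items():
        merged[cell] = merged.get(cell, 0.0) + w
    return merged

def to_dnf(weights, threshold, lo=0.0, hi=1.0, n_bins=4):
    # Each dense cell becomes one conjunction of interval constraints;
    # the result is the disjunction of all dense cells.
    width = (hi - lo) / n_bins
    terms = []
    for cell in sorted(c for c, w in weights.items() if w >= threshold):
        lits = [f"{lo + b * width:g} <= x{d} < {lo + (b + 1) * width:g}"
                for d, b in enumerate(cell)]
        terms.append("(" + " AND ".join(lits) + ")")
    return " OR ".join(terms)

# Two batches of 2-dimensional points arriving one after the other.
old_batch = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]
new_batch = [(0.1, 0.2), (0.15, 0.05)]
merged = merge_with_decay(count_cells(old_batch), count_cells(new_batch))
print(to_dnf(merged, threshold=2.0))
```

In this toy run the cell near the origin stays dense after the merge, while the single old point near (0.9, 0.9) decays below the threshold and drops out of the result.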

Original language	English
Title of host publication	Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009
Pages	783-787
Number of pages	5
DOIs	https://doi.org/10.1109/ICCSE.2009.5228155
Publication status	Published - 2009
Event	2009 4th International Conference on Computer Science and Education, ICCSE 2009 - Nanning, China Duration: 25 Jul 2009 → 28 Jul 2009

Publication series

Name	Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009

Conference

Conference	2009 4th International Conference on Computer Science and Education, ICCSE 2009
Country/Territory	China
City	Nanning
Period	25/07/09 → 28/07/09

Keywords

Clustering
Data stream
High-dimension
Intervals division

Access to Document

10.1109/ICCSE.2009.5228155

Cite this

Li, Y., Ren, J., Hu, C., & Xu, L. (2009). Clustering algorithm based on optimal intervals division for high-dimension data streams. In Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009 (pp. 783-787). Article 5228155 (Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009). https://doi.org/10.1109/ICCSE.2009.5228155

@inproceedings{059f80a602d54a9e85e6bec5d93ded7d,

title = "Clustering algorithm based on optimal intervals division for high-dimension data streams",

keywords = "Clustering, Data stream, High-dimension, Intervals division",

author = "Yinzhao Li and Jiadong Ren and Changzheng Hu and Lina Xu",

year = "2009",

doi = "10.1109/ICCSE.2009.5228155",

language = "English",

isbn = "9781424435210",

series = "Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009",

pages = "783--787",

booktitle = "Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009",

note = "2009 4th International Conference on Computer Science and Education, ICCSE 2009 ; Conference date: 25-07-2009 Through 28-07-2009",

}

Li, Y, Ren, J, Hu, C & Xu, L 2009, Clustering algorithm based on optimal intervals division for high-dimension data streams. in Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009., 5228155, Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009, pp. 783-787, 2009 4th International Conference on Computer Science and Education, ICCSE 2009, Nanning, China, 25/07/09. https://doi.org/10.1109/ICCSE.2009.5228155

Clustering algorithm based on optimal intervals division for high-dimension data streams. / Li, Yinzhao; Ren, Jiadong; Hu, Changzheng et al.
Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009. 2009. p. 783-787 5228155 (Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009).


TY - GEN

T1 - Clustering algorithm based on optimal intervals division for high-dimension data streams

AU - Li, Yinzhao

AU - Ren, Jiadong

AU - Hu, Changzheng

AU - Xu, Lina

PY - 2009

Y1 - 2009

KW - Clustering

KW - Data stream

KW - High-dimension

KW - Intervals division

UR - http://www.scopus.com/inward/record.url?scp=70350441318&partnerID=8YFLogxK

U2 - 10.1109/ICCSE.2009.5228155

DO - 10.1109/ICCSE.2009.5228155

M3 - Conference contribution

AN - SCOPUS:70350441318

SN - 9781424435210

T3 - Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009

SP - 783

EP - 787

BT - Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009

T2 - 2009 4th International Conference on Computer Science and Education, ICCSE 2009

Y2 - 25 July 2009 through 28 July 2009

ER -

Li Y, Ren J, Hu C, Xu L. Clustering algorithm based on optimal intervals division for high-dimension data streams. In Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009. 2009. p. 783-787. 5228155. (Proceedings of 2009 4th International Conference on Computer Science and Education, ICCSE 2009). doi: 10.1109/ICCSE.2009.5228155
