TY - JOUR
T1 - HCluWin
T2 - An algorithm for clustering heterogeneous data streams over sliding windows
AU - Ren, Jiadong
AU - Hu, Changzhen
AU - Ma, Ruiqing
PY - 2010/5
Y1 - 2010/5
N2 - Many applications in web usage mining, such as business intelligence and usage characterization, require effective and efficient techniques to discover users with similar usage patterns and web pages with correlated content in the physical world. Clustering click streams can help to achieve this goal. Despite their high processing rate, existing methods for clustering click streams over sliding windows suffer from the omission of categorical attributes in click stream data. In this paper, we present HCluWin, an approach for clustering heterogeneous data streams, which contain both continuous and categorical attributes, over sliding windows. A Heterogeneous Temporal Cluster Feature (HTCF) is introduced to monitor the distribution statistics of heterogeneous data points. Based on this structure, the Exponential Histogram of Heterogeneous Cluster Feature (EHHCF) is presented. Simultaneously, a new similarity measure between two heterogeneous objects is proposed. Experimental results show that the clustering quality of HCluWin is higher than that of CluWin and that the stream processing rate of HCluWin is higher than that of HCluStream.
AB - Many applications in web usage mining, such as business intelligence and usage characterization, require effective and efficient techniques to discover users with similar usage patterns and web pages with correlated content in the physical world. Clustering click streams can help to achieve this goal. Despite their high processing rate, existing methods for clustering click streams over sliding windows suffer from the omission of categorical attributes in click stream data. In this paper, we present HCluWin, an approach for clustering heterogeneous data streams, which contain both continuous and categorical attributes, over sliding windows. A Heterogeneous Temporal Cluster Feature (HTCF) is introduced to monitor the distribution statistics of heterogeneous data points. Based on this structure, the Exponential Histogram of Heterogeneous Cluster Feature (EHHCF) is presented. Simultaneously, a new similarity measure between two heterogeneous objects is proposed. Experimental results show that the clustering quality of HCluWin is higher than that of CluWin and that the stream processing rate of HCluWin is higher than that of HCluStream.
KW - Clustering
KW - Data stream
KW - Heterogeneous attribute
KW - Sliding windows
UR - http://www.scopus.com/inward/record.url?scp=77953016224&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:77953016224
SN - 1349-4198
VL - 6
SP - 2171
EP - 2179
JO - International Journal of Innovative Computing, Information and Control
JF - International Journal of Innovative Computing, Information and Control
IS - 5
ER -