Abstract
Many applications in web usage mining, such as business intelligence and usage characterization, require effective and efficient techniques for discovering users with similar usage patterns and web pages with correlated contents in the physical world. Clustering click streams can help achieve this goal. Although existing methods for clustering click streams over sliding windows achieve high processing rates, they do not account for the categorical attributes in click stream data. In this paper, we present HCluWin, an approach for clustering heterogeneous data streams, which contain both continuous and categorical attributes, over sliding windows. A Heterogeneous Temporal Cluster Feature (HTCF) is introduced to monitor the distribution statistics of heterogeneous data points. Based on this structure, an Exponential Histogram of Heterogeneous Cluster Feature (EHHCF) is presented. In addition, a new similarity measure between two heterogeneous objects is proposed. Experimental results show that the clustering quality of HCluWin is higher than that of CluWin, and that the stream processing rate of HCluWin is higher than that of HCluStream.
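The abstract does not define the HTCF or the similarity measure. The following is only a rough illustration of the general idea, and it is not the paper's HTCF, EHHCF, or similarity measure. It assumes a CluStream-style cluster feature (point count, per-attribute linear and squared sums, latest timestamp) extended with a frequency counter for each categorical attribute, together with a simple point-to-cluster dissimilarity. The names `HeterogeneousClusterFeature`, `mixed_distance`, and `cat_weight` are hypothetical.

```python
from collections import Counter
from math import sqrt


class HeterogeneousClusterFeature:
    """Hypothetical additive summary of mixed-attribute points (NOT the paper's HTCF).

    Continuous attributes keep CluStream-style linear and squared sums;
    each categorical attribute keeps a value-frequency Counter.
    """

    def __init__(self, n_cont, n_cat):
        self.count = 0
        self.linear_sum = [0.0] * n_cont
        self.squared_sum = [0.0] * n_cont
        self.cat_counts = [Counter() for _ in range(n_cat)]
        self.last_timestamp = float("-inf")

    def insert(self, cont, cat, timestamp):
        """Absorb one point with continuous values `cont` and categorical values `cat`."""
        self.count += 1
        for i, x in enumerate(cont):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x
        for j, v in enumerate(cat):
            self.cat_counts[j][v] += 1
        self.last_timestamp = max(self.last_timestamp, timestamp)

    def merge(self, other):
        """Add another summary into this one; every field is a sum, count, or max."""
        self.count += other.count
        for i in range(len(self.linear_sum)):
            self.linear_sum[i] += other.linear_sum[i]
            self.squared_sum[i] += other.squared_sum[i]
        for j in range(len(self.cat_counts)):
            self.cat_counts[j].update(other.cat_counts[j])  # Counter.update adds counts
        self.last_timestamp = max(self.last_timestamp, other.last_timestamp)

    def centroid(self):
        """Mean of each continuous attribute."""
        return [s / self.count for s in self.linear_sum]

    def modes(self):
        """Most frequent value of each categorical attribute."""
        return [c.most_common(1)[0][0] for c in self.cat_counts]


def mixed_distance(cont, cat, cf, cat_weight=1.0):
    """Hypothetical dissimilarity between a point and a cluster summary:
    Euclidean distance to the centroid plus the weighted average rate at which
    the point's categorical values disagree with the cluster's members."""
    euclid = sqrt(sum((x - c) ** 2 for x, c in zip(cont, cf.centroid())))
    if not cat:
        return euclid
    mismatch = sum(1 - cf.cat_counts[j][v] / cf.count for j, v in enumerate(cat)) / len(cat)
    return euclid + cat_weight * mismatch


# Usage: two click records, each with two numeric and two categorical fields.
cf = HeterogeneousClusterFeature(n_cont=2, n_cat=2)
cf.insert([1.0, 2.0], ["home", "GET"], timestamp=1)
cf.insert([3.0, 4.0], ["home", "POST"], timestamp=2)
print(cf.centroid())                                     # [2.0, 3.0]
print(mixed_distance([2.0, 3.0], ["home", "GET"], cf))   # 0.25
```

Every field in the summary is a sum, a count, or a maximum. As a result, two summaries can be combined simply by adding them. Bucket-based window structures, such as an exponential histogram, rely on this property when they merge adjacent summaries.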
Original language | English |
---|---|
Pages (from-to) | 2171-2179 |
Number of pages | 9 |
Journal | International Journal of Innovative Computing, Information and Control |
Volume | 6 |
Issue number | 5 |
Publication status | Published - May 2010 |
Keywords
- Clustering
- Data stream
- Heterogeneous attribute
- Sliding windows