TY - GEN
T1 - An improved outlier detection method in high-dimension based on weighted hypergraph
AU - Li, Yin Zhao
AU - Wu, Di
AU - Ren, Jia Dong
AU - Hu, Chang Zhen
PY - 2009
Y1 - 2009
N2 - Outlier detection in high-dimensional space is an active topic in data mining; its main goal is to find the small number of data objects in a data set that behave abnormally. In this paper, the concepts of the feature vector and the attribute similarity are defined, and an improved algorithm, SWHOT, based on a weighted hypergraph model, is presented for outlier detection in high-dimensional space. The objects in high-dimensional space are converted into binary data, and the hypergraph model of the data set is built by finding the hyperedges of the binary set; the weight of each hyperedge equals the value of the attribute similarity. The objects of the hypergraph are then clustered with the CURE algorithm, so that arbitrarily shaped clusters can be identified. Finally, outliers are found according to the point-to-window weighted support, the point-to-class belongingness, and the point-to-window weighted deviation of size; meaningful outliers in high-dimensional space can be mined by choosing an appropriate user-defined threshold. Experimental results show that SWHOT can improve scalability and precision.
AB - Outlier detection in high-dimensional space is an active topic in data mining; its main goal is to find the small number of data objects in a data set that behave abnormally. In this paper, the concepts of the feature vector and the attribute similarity are defined, and an improved algorithm, SWHOT, based on a weighted hypergraph model, is presented for outlier detection in high-dimensional space. The objects in high-dimensional space are converted into binary data, and the hypergraph model of the data set is built by finding the hyperedges of the binary set; the weight of each hyperedge equals the value of the attribute similarity. The objects of the hypergraph are then clustered with the CURE algorithm, so that arbitrarily shaped clusters can be identified. Finally, outliers are found according to the point-to-window weighted support, the point-to-class belongingness, and the point-to-window weighted deviation of size; meaningful outliers in high-dimensional space can be mined by choosing an appropriate user-defined threshold. Experimental results show that SWHOT can improve scalability and precision.
KW - Clustering
KW - Hypergraph
KW - Outlier detection
KW - Similarity
KW - Weight
UR - http://www.scopus.com/inward/record.url?scp=74049092295&partnerID=8YFLogxK
U2 - 10.1109/ISECS.2009.54
DO - 10.1109/ISECS.2009.54
M3 - Conference contribution
AN - SCOPUS:74049092295
SN - 9780769536439
T3 - 2nd International Symposium on Electronic Commerce and Security, ISECS 2009
SP - 159
EP - 163
BT - 2nd International Symposium on Electronic Commerce and Security, ISECS 2009
T2 - 2nd International Symposium on Electronic Commerce and Security, ISECS 2009
Y2 - 22 May 2009 through 24 May 2009
ER -