TY - JOUR
T1 - Computing structural similarity of source XML schemas against domain XML schema
AU - Li, Jianxin
AU - Liu, Chengfei
AU - Yu, Jeffrey Xu
AU - Liu, Jixue
AU - Wang, Guoren
AU - Yang, Chi
PY - 2008
Y1 - 2008
AB - In this paper, we study the problem of measuring the structural similarity of a large number of source schemas against a single domain schema, which is useful for improving the quality of searching and ranking large volumes of source documents on the Web with the help of structural information. After analyzing why existing edit-distance-based methods are unsuitable, we propose a new similarity measure model that caters for the requirements of the problem. Given the asymmetric nature of comparing source schemas with a domain schema, similarity-preserving rules and an algorithm are designed to filter out uninteresting elements in source schemas in order to optimize the similarity computation. Based on the model, a basic algorithm and an improved algorithm are developed for structural similarity computation. The improved algorithm makes full use of a new coding scheme devised to reduce the number of comparisons. The complexities of both algorithms are analyzed, and extensive experiments show the significant performance gain achieved by the improved algorithm.
KW - Structural similarity
KW - XML schema
UR - http://www.scopus.com/inward/record.url?scp=84873287266&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84873287266
SN - 1445-1336
VL - 75
SP - 155
EP - 164
JO - Conferences in Research and Practice in Information Technology Series
JF - Conferences in Research and Practice in Information Technology Series
T2 - 19th Australasian Database Conference, ADC 2008
Y2 - 1 January 2008 through 1 January 2008
ER -