Computing structural similarity of source XML schemas against domain XML schema

Jianxin Li^*, Chengfei Liu, Jeffrey Xu Yu, Jixue Liu, Guoren Wang, Chi Yang

^*Corresponding author for this work

Research output: Contribution to journal › Conference article › Peer-reviewed

2 Citations (Scopus)

Abstract

In this paper, we study the problem of measuring the structural similarity of a large number of source schemas against a single domain schema, which is useful for improving the quality of searching and ranking large volumes of source documents on the Web with the help of structural information. After analyzing why existing edit-distance-based methods are unsuitable for this problem, we propose a new similarity measure model that caters for its requirements. Given the asymmetric nature of comparing source schemas with a domain schema, similarity-preserving rules and an algorithm are designed to filter out uninteresting elements in source schemas in order to optimize the similarity computation. Based on the model, a basic algorithm and an improved algorithm are developed for structural similarity computation. The improved algorithm makes full use of a new coding scheme devised to reduce the number of comparisons. The complexities of both algorithms are analyzed, and extensive experiments show the significant performance gain achieved by the improved algorithm.

Original language	English
Pages (from-to)	155-164
Number of pages	10
Journal	Conferences in Research and Practice in Information Technology Series
Volume	75
Publication status	Published - 2008
Externally published	Yes
Event	19th Australasian Database Conference, ADC 2008 - Wollongong, NSW, Australia. Duration: 1 Jan 2008 → 1 Jan 2008

Cite this

@article{41e831496ff24c9090c2e6c3a714b8c7,

title = "Computing structural similarity of source XML schemas against domain XML schema",

keywords = "Structural similarity, XML schema",

author = "Jianxin Li and Chengfei Liu and Yu, {Jeffrey Xu} and Jixue Liu and Guoren Wang and Chi Yang",

year = "2008",

language = "English",

volume = "75",

pages = "155--164",

journal = "Conferences in Research and Practice in Information Technology Series",

issn = "1445-1336",

note = "19th Australasian Database Conference, ADC 2008 ; Conference date: 01-01-2008 Through 01-01-2008",

}

TY - JOUR

T1 - Computing structural similarity of source XML schemas against domain XML schema

AU - Li, Jianxin

AU - Liu, Chengfei

AU - Yu, Jeffrey Xu

AU - Liu, Jixue

AU - Wang, Guoren

AU - Yang, Chi

PY - 2008

Y1 - 2008

KW - Structural similarity

KW - XML schema

UR - http://www.scopus.com/inward/record.url?scp=84873287266&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84873287266

SN - 1445-1336

VL - 75

SP - 155

EP - 164

JO - Conferences in Research and Practice in Information Technology Series

JF - Conferences in Research and Practice in Information Technology Series

T2 - 19th Australasian Database Conference, ADC 2008

Y2 - 1 January 2008 through 1 January 2008

ER -
