TY - GEN
T1 - Matching top-k answers of twig patterns in probabilistic XML
AU - Ning, Bo
AU - Liu, Chengfei
AU - Yu, Jeffrey Xu
AU - Wang, Guoren
AU - Li, Jianxin
PY - 2010
Y1 - 2010
N2 - The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. The top-k matching of a twig pattern against probabilistic XML data is essential. Some classical twig pattern algorithms can be adjusted to process the probabilistic XML. However, as far as finding answers of the top-k probabilities is concerned, the existing algorithms suffer in performance, because many unnecessary intermediate path results, with small probabilities, need to be processed. To cope with this problem, we propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.
AB - The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. The top-k matching of a twig pattern against probabilistic XML data is essential. Some classical twig pattern algorithms can be adjusted to process the probabilistic XML. However, as far as finding answers of the top-k probabilities is concerned, the existing algorithms suffer in performance, because many unnecessary intermediate path results, with small probabilities, need to be processed. To cope with this problem, we propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.
UR - https://www.scopus.com/pages/publications/78650473643
U2 - 10.1007/978-3-642-12026-8_12
DO - 10.1007/978-3-642-12026-8_12
M3 - Conference contribution
AN - SCOPUS:78650473643
SN - 3642120253
SN - 9783642120251
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 125
EP - 139
BT - Database Systems for Advanced Applications - 15th International Conference, DASFAA 2010, Proceedings
T2 - 15th International Conference on Database Systems for Advanced Applications, DASFAA 2010
Y2 - 1 April 2010 through 4 April 2010
ER -