Abstract
Holistic twig query processing techniques based on region encoding have been developed to minimize intermediate results, namely, root-to-leaf path matches that do not appear in the final twig results. These algorithms must scan the streams of all tags in the query pattern, yet they cannot completely avoid useless path matches. TJFast, which is based on the Extended Dewey labeling scheme, has been proposed to avoid useless intermediate results, and it needs to access only the labels of the leaf query nodes. However, it does not take into account the characteristics of elements that share the same parent, and it has to merge-join all the intermediate results produced during the first phase. We propose a new labeling scheme that compresses XML elements sharing the same characteristics. Based on the compressed path-labeled streams, a novel holistic twig query algorithm named CPJoin is designed. Finally, implementation results are provided to show that CPJoin performs well on both real and synthetic data.
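The two families of labels contrasted in the abstract can be illustrated with a minimal sketch. It assumes the textbook form of region encoding, a `(start, end, level)` triple per element, and plain Dewey labels (a tuple of child ordinals from the root). It does not implement Extended Dewey, which additionally encodes tag names, nor the compressed path labeling or CPJoin proposed in the paper. The names `Region`, `region_is_ancestor`, `dewey_is_ancestor`, and the related helpers are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    """Region label: document-order start/end positions plus depth."""
    start: int
    end: int
    level: int


def region_is_ancestor(a: Region, d: Region) -> bool:
    # a is an ancestor of d when a's interval strictly contains d's interval.
    return a.start < d.start and d.end < a.end


def region_is_parent(a: Region, d: Region) -> bool:
    # A parent is an ancestor exactly one level above.
    return region_is_ancestor(a, d) and a.level + 1 == d.level


# Plain Dewey label: the sequence of child ordinals on the path from the root.
Dewey = Tuple[int, ...]


def dewey_is_ancestor(a: Dewey, d: Dewey) -> bool:
    # a is an ancestor of d when a is a proper prefix of d.
    return len(a) < len(d) and d[:len(a)] == a


def dewey_is_parent(a: Dewey, d: Dewey) -> bool:
    # a is the parent of d when dropping d's last component yields a.
    return len(d) >= 1 and d[:-1] == a


def dewey_ancestors(d: Dewey) -> list:
    # Every proper prefix of a Dewey label is the label of an ancestor,
    # so the whole root-to-node path is recoverable from one leaf label.
    return [d[:i] for i in range(len(d))]
```

With region labels, a structural check needs the labels of both elements, which is why the streams of every query tag are read. With prefix-based labels, the ancestors of a node can be derived from the node's own label, which is the property that lets a leaf-driven approach such as TJFast read only the leaf streams.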
Original language | English |
---|---|
Pages (from-to) | 850-854 |
Number of pages | 5 |
Journal | Wuhan University Journal of Natural Sciences |
Volume | 12 |
Issue number | 5 |
Publication status | Published - Sept 2007 |
Externally published | Yes |
Keywords
- Compressed path labeling
- Twig pattern
- XML