TY - GEN
T1 - Graph-based AJAX crawl
T2 - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
AU - Peng, Zhaomeng
AU - He, Nengqiang
AU - Jiang, Chunxiao
AU - Li, Zhihua
AU - Xu, Lei
AU - Li, Yipeng
AU - Ren, Yong
PY - 2012
Y1 - 2012
N2 - AJAX (Asynchronous JavaScript and XML) is becoming increasingly popular with the rise of Web 2.0. However, traditional crawlers fail to retrieve information from AJAX applications because of complex JavaScript operations. Moreover, a single AJAX application with one URL may have different page states, which violates the rule that one URL corresponds to one unique page. An AJAX application can be modeled as a state transition graph, and crawling it amounts to traversing the graph without prior knowledge of its structure. In this paper, we distinguish different AJAX events which are not well defined in previous work and propose a Graph-based AJAX State Traversal (GAST) algorithm to crawl AJAX with minimal edge visits. If the topology of the graph is given, this optimization problem turns into a Directed Rural Postman Problem (DRPP) and the optimal lower bound can be obtained. Experimental results show that the proposed algorithm approaches the optimum and outperforms existing work.
AB - AJAX (Asynchronous JavaScript and XML) is becoming increasingly popular with the rise of Web 2.0. However, traditional crawlers fail to retrieve information from AJAX applications because of complex JavaScript operations. Moreover, a single AJAX application with one URL may have different page states, which violates the rule that one URL corresponds to one unique page. An AJAX application can be modeled as a state transition graph, and crawling it amounts to traversing the graph without prior knowledge of its structure. In this paper, we distinguish different AJAX events which are not well defined in previous work and propose a Graph-based AJAX State Traversal (GAST) algorithm to crawl AJAX with minimal edge visits. If the topology of the graph is given, this optimization problem turns into a Directed Rural Postman Problem (DRPP) and the optimal lower bound can be obtained. Experimental results show that the proposed algorithm approaches the optimum and outperforms existing work.
KW - AJAX Crawl
KW - Directed Rural Postman Problem
KW - State Transition Graph
KW - State Traversal
UR - https://www.scopus.com/pages/publications/84861067377
U2 - 10.1109/ICCSEE.2012.38
DO - 10.1109/ICCSEE.2012.38
M3 - Conference contribution
AN - SCOPUS:84861067377
SN - 9780769546476
T3 - Proceedings - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
SP - 590
EP - 594
BT - Proceedings - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
Y2 - 23 March 2012 through 25 March 2012
ER -