
Graph-based AJAX crawl: Mining data from rich internet applications

  • Zhaomeng Peng*
  • Nengqiang He
  • Chunxiao Jiang
  • Zhihua Li
  • Lei Xu
  • Yipeng Li
  • Yong Ren
  • *Corresponding author for this work
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

AJAX (Asynchronous JavaScript and XML) is becoming increasingly popular with the growth of Web 2.0. However, traditional crawlers fail to retrieve information from AJAX applications because of complex JavaScript operations. Moreover, a single AJAX application with one URL may have different page states, which violates the rule that one URL corresponds to one unique page. An AJAX application can be modeled as a state transition graph, and crawling it amounts to traversing this graph without prior knowledge of its structure. In this paper, we distinguish between different AJAX events, which are not well defined in previous work, and propose a Graph-based AJAX State Traversal (GAST) algorithm that crawls AJAX applications with a minimal number of edge visits. If the topology of the graph is known in advance, this optimization problem becomes a Directed Rural Postman Problem (DRPP), and the optimal lower bound can then be obtained. Experimental results show that the proposed algorithm approaches the optimum and outperforms existing work.
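The abstract frames crawling as exploring a state transition graph whose edges (events) are only discovered by firing them. The following is a minimal Python sketch of that setting under stated assumptions; it is not the authors' GAST algorithm but a simple greedy heuristic that always walks, along already-discovered transitions, to the nearest state that still has an unfired event. `SimulatedApp`, `crawl`, and the example transitions are hypothetical, and the sketch assumes that events are deterministic and that reloading the URL returns the crawler to the initial state.

```python
from collections import deque


class SimulatedApp:
    """Hypothetical stand-in for a browser driving an AJAX page.

    `transitions` maps state -> {event: next_state}. The crawler can only list
    the events enabled in the current state and fire one; it cannot see the
    target state before firing, so the graph is unknown in advance.
    """

    def __init__(self, transitions, initial):
        self._transitions = transitions
        self.initial = initial
        self.state = initial

    def reset(self):
        # Assumption: reloading the URL always returns to the initial state.
        self.state = self.initial

    def events(self):
        return sorted(self._transitions[self.state])

    def fire(self, event):
        self.state = self._transitions[self.state][event]
        return self.state


def _path_to_pending(start, known, pending):
    """BFS over discovered transitions to the nearest state with an unfired event."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if pending.get(s):
            path = []
            while parent[s] is not None:
                prev, event = parent[s]
                path.append(event)
                s = prev
            return path[::-1]
        for event, nxt in known.get(s, {}).items():
            if nxt not in parent:
                parent[nxt] = (s, event)
                queue.append(nxt)
    return None


def crawl(app):
    """Fire every reachable event at least once (greedy, not GAST).

    Returns (known, edge_visits, resets): the discovered graph, the number of
    events fired including replays, and the number of page reloads.
    """
    app.reset()
    known, pending = {}, {}
    edge_visits = resets = 0

    def observe(state):
        # Record a newly seen state and the events it exposes.
        if state not in known:
            known[state] = {}
            pending[state] = set(app.events())

    observe(app.state)
    while any(pending.values()):
        path = _path_to_pending(app.state, known, pending)
        if path is None:
            # Dead end for the known graph: reload. Every discovered state is
            # reachable from the initial one, so a path exists afterwards.
            app.reset()
            resets += 1
            path = _path_to_pending(app.state, known, pending)
        for event in path:  # replay already-known events to reach the target
            app.fire(event)
            edge_visits += 1
        src = app.state
        event = min(pending[src])  # deterministic choice among unfired events
        pending[src].discard(event)
        known[src][event] = app.fire(event)
        edge_visits += 1
        observe(app.state)
    return known, edge_visits, resets


# Hypothetical four-state application with five events.
transitions = {
    "home": {"open_menu": "menu", "next_page": "page2"},
    "menu": {"close_menu": "home"},
    "page2": {"prev_page": "home", "open_menu": "menu2"},
    "menu2": {},
}
graph, visits, resets = crawl(SimulatedApp(transitions, "home"))
print(visits, resets)  # 6 1
```

In this toy graph the crawler fires 6 events (the 5 distinct transitions plus one replay of `next_page`) and reloads once. Because every event must be fired at least once, the number of transitions is a trivial lower bound on firings; when the topology is known, the DRPP formulation described in the abstract yields the optimal value against which an online strategy like this one can be compared.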

Original language: English
Title of host publication: Proceedings - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
Pages: 590-594
Number of pages: 5
Publication status: Published - 2012
Externally published: Yes
Event: 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012 - Hangzhou, Zhejiang, China
Duration: 23 Mar 2012 to 25 Mar 2012

Publication series

Name: Proceedings - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
Volume: 3

Conference

Conference: 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
Country/Territory: China
City: Hangzhou, Zhejiang
Period: 23/03/12 to 25/03/12

Keywords

  • AJAX Crawl
  • Directed Rural Postman Problem
  • State Transition Graph
  • State Traversal
