
Graph-based AJAX crawl: Mining data from rich internet applications

  • Zhaomeng Peng*
  • Nengqiang He
  • Chunxiao Jiang
  • Zhihua Li
  • Lei Xu
  • Yipeng Li
  • Yong Ren
  • *Corresponding author for this work
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Peer-reviewed

Abstract

AJAX (Asynchronous JavaScript and XML) is becoming increasingly popular with the rise of Web 2.0. However, traditional crawlers fail to retrieve information from AJAX applications because of their complex JavaScript operations. Moreover, a single AJAX application with one URL may have several page states, which violates the assumption that one URL corresponds to one unique page. An AJAX application can be modeled as a state transition graph, and crawling it amounts to traversing that graph without prior knowledge of its structure. In this paper, we distinguish different AJAX events, which are not well defined in previous work, and propose a Graph-based AJAX State Traversal (GAST) algorithm to crawl AJAX with minimal edge visits. If the topology of the graph is given, this optimization problem becomes a Directed Rural Postman Problem (DRPP), and the optimal lower bound can be obtained. Experimental results show that the proposed algorithm approaches the optimum and outperforms existing work.
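
The state-transition-graph view described in the abstract can be illustrated with a small sketch. This is not the GAST algorithm, whose details are not given in this record; it is a simple greedy explorer written only to show what "traversing the graph without prior knowledge of its structure" and "edge visits" mean. The toy application `app`, its state and event names, and the function `crawl` are all hypothetical. The crawler learns a state's events only on arriving there, fires each event once, and walks over already-known edges (or reloads the start URL) to reach states that still have unfired events.

```python
from collections import deque

def crawl(app, start):
    """Greedy sketch (not GAST): fire every event once, counting edge visits."""
    known = {}                          # state -> {event: target} for fired events
    pending = {start: set(app[start])}  # state -> events discovered but not yet fired
    current, visits, resets = start, 0, 0

    def path_to_pending(src):
        # Breadth-first search over the part of the graph discovered so far.
        queue, seen = deque([(src, [])]), {src}
        while queue:
            state, path = queue.popleft()
            if pending.get(state):
                return path
            for event, target in known.get(state, {}).items():
                if target not in seen:
                    seen.add(target)
                    queue.append((target, path + [event]))
        return None

    while any(pending.values()):
        path = path_to_pending(current)
        if path is None:
            # No known route from here: reload the URL. Every discovered state
            # was first reached from `start`, so a route exists after a reset.
            current, resets = start, resets + 1
            continue
        for event in path:              # re-traverse known edges
            current = known[current][event]
            visits += 1
        event = min(pending[current])   # deterministic choice for the example
        pending[current].discard(event)
        target = app[current][event]    # stands in for firing a real DOM event
        known.setdefault(current, {})[event] = target
        if target not in pending:       # first arrival: discover its events
            pending[target] = set(app[target])
        current = target
        visits += 1
    return known, visits, resets

# Hypothetical three-state application: one URL, several page states.
app = {
    "home":  {"open_menu": "menu", "next": "page2"},
    "menu":  {"close": "home"},
    "page2": {"prev": "home", "open_menu": "menu"},
}
known, visits, resets = crawl(app, "home")
print(visits, resets)
```

The gap between `visits` and the number of edges in `app` is the overhead of re-traversing known edges; the abstract's DRPP formulation gives the lower bound on such a tour when the full topology is known in advance.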

Original language: English
Title of host publication: Proceedings - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
Pages: 590-594
Number of pages: 5
DOI
Publication status: Published - 2012
Externally published
Event: 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012 - Hangzhou, Zhejiang, China
Duration: 23 Mar 2012 to 25 Mar 2012

Publication series

Name: Proceedings - 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
3

Conference

Conference: 2012 International Conference on Computer Science and Electronics Engineering, ICCSEE 2012
Country/Territory: China
Hangzhou, Zhejiang
Period: 23/03/12 to 25/03/12
