Technology of web page knowledge acquisition

Si Kang Hu*, Yuan Da Cao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

A technique for automatically acquiring knowledge from Web text is described, based on pseudo-natural language understanding. Web page texts are first represented by domain grammars. The domain grammars are then transformed into rules that describe sentence information and conform to regular-expression conventions. Using those rules, the Web page texts are transformed into semantic triples that represent Web knowledge, and the triples form the domain knowledge base. Test data showed that, across different kinds of Web page data, the domain knowledge base built with this technique achieved an average recall of 71.5% and an average precision of 79.1%.
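
As an illustration only, the pipeline outlined in the abstract (regular-expression rules applied to sentences, yielding semantic triples, then scored by precision and recall) might be sketched as follows. This is not the authors' implementation: the rule patterns, the predicate names `locatedIn` and `foundedIn`, the helper names `extract_triples` and `precision_recall`, and the sample text are all assumptions made up for the sketch, and the step of deriving rules from domain grammars is not modelled.

```python
import re

# Hypothetical rules: each pairs a regular expression with a predicate name.
# Neither the patterns nor the predicates come from the paper.
RULES = [
    (re.compile(r"(?P<subj>[A-Z][\w ]*?) is located in (?P<obj>[A-Z][\w ]*)"), "locatedIn"),
    (re.compile(r"(?P<subj>[A-Z][\w ]*?) was founded in (?P<obj>\d{4})"), "foundedIn"),
]


def extract_triples(text):
    """Return (subject, predicate, object) triples for every rule match in text."""
    triples = []
    # Naive sentence split: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for pattern, predicate in RULES:
            for match in pattern.finditer(sentence):
                triples.append(
                    (match.group("subj").strip(), predicate, match.group("obj").strip())
                )
    return triples


def precision_recall(extracted, gold):
    """Compare extracted triples with a hand-labelled gold set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented sample text, used only to exercise the sketch.
    sample = "Acme Press is located in Springfield. Acme Press was founded in 1999."
    knowledge_base = extract_triples(sample)
    for triple in knowledge_base:
        print(triple)
```

Precision here is the share of extracted triples that appear in the gold set, and recall is the share of gold triples that were extracted, which is the usual reading of the two figures reported in the abstract.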

Original language: English
Pages (from-to): 1065-1068
Number of pages: 4
Journal: Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
Volume: 26
Issue number: 12
Publication status: Published - Dec 2006

Keywords

  • Pseudo-natural language understanding
  • Semantic triple
  • Web page grammar
