Technology of web page knowledge acquisition

Si Kang Hu*, Yuan Da Cao

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

A technology for automatic knowledge acquisition from Web text, based on pseudo-natural language understanding, is described. Web page texts are first represented by domain grammars. The domain grammars are then transformed into rules that describe sentence information and conform to regular-expression syntax. Using these rules, the Web page texts are transformed into semantic triples that represent Web knowledge, and the semantic triples form the domain knowledge base. Tests on domain knowledge bases built with this technology from different kinds of Web page data showed an average recall of 71.5% and an average precision of 79.1%.
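The pipeline in the abstract (grammar-derived regular-expression rules, sentence matching, semantic triples, knowledge base, recall/precision) can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the authors' method: the abstract does not give the domain grammars or the rule format, so the two patterns, the predicate names `founded_in` and `located_in`, the subject-predicate-object layout of a triple, and the naive sentence splitter are all hypothetical. Recall and precision use their standard definitions (correct extractions over gold triples, and correct extractions over all extractions).

```python
import re
from typing import List, NamedTuple, Pattern, Set, Tuple


class Triple(NamedTuple):
    """A semantic triple, assumed here to be (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical rules: each regex stands in for one domain-grammar production
# and is paired with the predicate of the triple it produces.
RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"(?P<subj>[A-Z][\w ]*?) was founded in (?P<obj>\d{4})"), "founded_in"),
    (re.compile(r"(?P<subj>[A-Z][\w ]*?) is located in (?P<obj>[A-Z][\w ]*)"), "located_in"),
]


def extract_triples(text: str) -> Set[Triple]:
    """Apply every rule to every sentence and collect the matches as triples."""
    kb: Set[Triple] = set()
    # Naive sentence split on terminal punctuation (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for pattern, predicate in RULES:
            for m in pattern.finditer(sentence):
                kb.add(Triple(m.group("subj").strip(), predicate, m.group("obj").strip()))
    return kb


def recall_precision(extracted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float]:
    """Standard recall and precision of extracted triples against a gold set."""
    correct = len(extracted & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(extracted) if extracted else 0.0
    return recall, precision


if __name__ == "__main__":
    page = ("Beijing Institute of Technology was founded in 1940. "
            "Beijing Institute of Technology is located in Beijing.")
    for t in sorted(extract_triples(page)):
        print(t)
```

Keeping the rules as data (pattern plus predicate) mirrors the abstract's idea that grammars are compiled into a rule set separate from the matching engine, so a new domain would only need new rules.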

Original language	English
Pages (from-to)	1065-1068
Number of pages	4
Journal	Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
Volume	26
Issue number	12
Publication status	Published - Dec 2006

Keywords

Pseudo-natural language understanding
Semantic triple
Web page grammar

Cite this

@article{bc65cfa73b5a43159534d288e9aa394a,
  title = "Technology of web page knowledge acquisition",
  abstract = "Technology of automatic Web text knowledge acquisition is described, based on pseudo-natural language understanding. Web page texts are represented first by domain grammars. The domain grammars are transformed into rules that are used to describe the sentence information and are up to regular expression regulations. Then the Web page texts are transformed into semantic triples that represent Web knowledge by those rules. The semantic triples then form the domain knowledge base. Test data showed that the average recall rate and precision rate of different kinds of Web page data in domain knowledge base is 71.5% and 79.1% separately, as have been formed by the above technology.",
  keywords = "Pseudo-nature language understanding, Semantic triple, Web page grammar",
  author = "Hu, {Si Kang} and Cao, {Yuan Da}",
  year = "2006",
  month = dec,
  language = "English",
  volume = "26",
  pages = "1065--1068",
  journal = "Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology",
  issn = "1001-0645",
  publisher = "Beijing Institute of Technology",
  number = "12",
}

TY  - JOUR
T1  - Technology of web page knowledge acquisition
AU  - Hu, Si Kang
AU  - Cao, Yuan Da
PY  - 2006/12
Y1  - 2006/12
N2  - Technology of automatic Web text knowledge acquisition is described, based on pseudo-natural language understanding. Web page texts are represented first by domain grammars. The domain grammars are transformed into rules that are used to describe the sentence information and are up to regular expression regulations. Then the Web page texts are transformed into semantic triples that represent Web knowledge by those rules. The semantic triples then form the domain knowledge base. Test data showed that the average recall rate and precision rate of different kinds of Web page data in domain knowledge base is 71.5% and 79.1% separately, as have been formed by the above technology.
AB  - Technology of automatic Web text knowledge acquisition is described, based on pseudo-natural language understanding. Web page texts are represented first by domain grammars. The domain grammars are transformed into rules that are used to describe the sentence information and are up to regular expression regulations. Then the Web page texts are transformed into semantic triples that represent Web knowledge by those rules. The semantic triples then form the domain knowledge base. Test data showed that the average recall rate and precision rate of different kinds of Web page data in domain knowledge base is 71.5% and 79.1% separately, as have been formed by the above technology.
KW  - Pseudo-nature language understanding
KW  - Semantic triple
KW  - Web page grammar
UR  - http://www.scopus.com/inward/record.url?scp=33846925693&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:33846925693
SN  - 1001-0645
VL  - 26
SP  - 1065
EP  - 1068
JO  - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
JF  - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
IS  - 12
ER  -
