Abstract
A technique for automatically acquiring knowledge from Web text is described, based on pseudo-natural language understanding. Web page texts are first described by domain grammars. The domain grammars are then transformed into rules that describe sentence-level information and conform to regular expression conventions. Using those rules, the Web page texts are converted into semantic triples that represent Web knowledge, and the triples together form the domain knowledge base. Test data showed that, for domain knowledge bases built with this technique, the average recall and precision across different kinds of Web page data were 71.5% and 79.1%, respectively.
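The abstract does not give the grammars or rules themselves, so the following is only a minimal sketch of the general idea of regular-expression rules that turn sentences into semantic triples. The rule patterns, the predicate names, the `Triple` type, and the `extract_triples` function are all hypothetical illustrations, not the authors' implementation.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A (subject, predicate, object) semantic triple."""
    subject: str
    predicate: str
    obj: str


# Hypothetical rules: each pairs a regular expression (with named groups
# for subject and object) with the predicate it asserts. A real system
# would derive such rules from a domain grammar; these are made up.
RULES = [
    (re.compile(r"^(?P<subject>.+?) is located in (?P<object>.+?)\.?$"), "located_in"),
    (re.compile(r"^(?P<subject>.+?) was founded in (?P<object>\d{4})\.?$"), "founded_in"),
]


def extract_triples(sentences: List[str]) -> List[Triple]:
    """Apply the first matching rule to each sentence and collect triples."""
    triples = []
    for sentence in sentences:
        text = sentence.strip()
        for pattern, predicate in RULES:
            match = pattern.match(text)
            if match:
                triples.append(
                    Triple(match.group("subject"), predicate, match.group("object"))
                )
                break  # at most one triple per sentence in this sketch
    return triples


# Toy sentences, invented for illustration only.
knowledge_base = extract_triples([
    "Acme Corp is located in Springfield.",
    "Acme Corp was founded in 1999.",
    "No rule covers this sentence.",
])
```

Sentences that match no rule are skipped in this sketch. Missed sentences of that kind are one way a rule-based extractor can end up with recall below precision.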
Original language | English |
---|---|
Pages (from-to) | 1065-1068 |
Number of pages | 4 |
Journal | Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology |
Volume | 26 |
Issue number | 12 |
Publication status | Published - Dec 2006 |
Keywords
- Pseudo-natural language understanding
- Semantic triple
- Web page grammar