Vertical classification of web pages for structured data extraction

Long Li, Dandan Song*, Lejian Liao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Citation (Scopus)

Abstract

We propose a general hierarchical vertical classification framework that automatically discovers the inherent hierarchical structure of relationships among verticals from flat datasets and then builds a hierarchical classifier. We conducted a set of comparison experiments to evaluate its performance, comparing flat and hierarchical structures of relationships as well as different feature selection and classification methods. Experimental results show that the hierarchical classifiers built with the proposed framework improve considerably on the flat classifiers when classifying unseen web pages. Among the configurations tested, the Support Vector Machine using Odds Ratio to select discriminative features performs best.
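
The abstract names Odds Ratio as the best-performing feature selection method but does not give its formula. As a rough illustration only, the sketch below scores terms with a common textbook form of the log odds ratio, using add-0.5 smoothing to avoid division by zero. The function names, the smoothing constant, and the toy pages are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def odds_ratio_scores(docs, labels, target, smoothing=0.5):
    """Score each term by its log odds ratio for `target` vs. all other labels.

    docs   : list of token lists, one per web page (hypothetical input format)
    labels : list of vertical names, same length as docs
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in zip(docs, labels):
        terms = set(tokens)  # document frequency: count a term once per page
        if label == target:
            n_pos += 1
            pos_df.update(terms)
        else:
            n_neg += 1
            neg_df.update(terms)

    scores = {}
    for term in set(pos_df) | set(neg_df):
        # Smoothed estimates of P(term | target) and P(term | not target).
        p_pos = (pos_df[term] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg_df[term] + smoothing) / (n_neg + 2 * smoothing)
        scores[term] = math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))
    return scores


def top_k_features(docs, labels, target, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = odds_ratio_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy pages from two made-up verticals.
docs = [
    ["price", "cart", "buy"],
    ["price", "buy", "shipping"],
    ["recipe", "cook", "price"],
    ["recipe", "cook", "oven"],
]
labels = ["shopping", "shopping", "recipes", "recipes"]
best = top_k_features(docs, labels, "shopping", k=1)
```

In a pipeline like the one the abstract describes, the selected terms would presumably become the feature set passed to a classifier such as an SVM. How the paper combines this step with the discovered hierarchy is not stated in the abstract.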

Original language: English
Title of host publication: Information Retrieval Technology - 8th Asia Information Retrieval Societies Conference, AIRS 2012, Proceedings
Pages: 486-495
Number of pages: 10
Publication status: Published - 2012
Event: 8th Asia Information Retrieval Societies Conference, AIRS 2012 - Tianjin, China
Duration: 17 Dec 2012 to 19 Dec 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7675 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Asia Information Retrieval Societies Conference, AIRS 2012
Country/Territory: China
City: Tianjin
Period: 17/12/12 to 19/12/12

Keywords

  • Automatic hierarchy
  • Hierarchical classifiers
  • Structured data extracting
  • Vertical classification
