A sequential latent topic-based readability model for domain-specific information retrieval

  • Wenya Zhang
  • Dawei Song
  • Peng Zhang*
  • Xiaozhao Zhao
  • Yuexian Hou
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In domain-specific information retrieval (IR), an emerging problem is how to provide different users, especially lay users, with documents that are both relevant and readable. In this paper, we propose a novel document readability model to enhance domain-specific IR. Our model incorporates the coverage and sequential dependency of latent topics in a document. Accordingly, two topical readability indicators, namely Topic Scope and Topic Trace, are developed. These indicators, combined with the classical Surface-level indicator, can be used to rerank the initial list of documents returned by a conventional search engine. To extract the structured latent topics without supervision, hierarchical Latent Dirichlet Allocation (hLDA) is used. We have evaluated our model from user-oriented and system-oriented perspectives in the medical domain. The user-oriented evaluation shows a good correlation between the readability scores given by our model and human judgments. Furthermore, our model achieves a significant improvement in the system-oriented evaluation over one of the state-of-the-art readability methods.
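
The reranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the three indicators are combined with the initial ranking, so everything here is an assumption for illustration: the `Doc` fields, the equal indicator weights, the linear interpolation with the hypothetical parameter `alpha`, and the premise that all scores are already normalized to [0, 1] with higher meaning more readable. It is not the authors' formulation, and it does not compute Topic Scope or Topic Trace from hLDA.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    relevance: float    # score from the initial search engine (assumed in [0, 1])
    surface: float      # Surface-level indicator (assumed in [0, 1])
    topic_scope: float  # Topic Scope indicator (assumed precomputed, in [0, 1])
    topic_trace: float  # Topic Trace indicator (assumed precomputed, in [0, 1])


def readability(doc, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three indicators (equal weights are an assumption)."""
    w_surface, w_scope, w_trace = weights
    return (w_surface * doc.surface
            + w_scope * doc.topic_scope
            + w_trace * doc.topic_trace)


def rerank(docs, alpha=0.5):
    """Sort documents by an interpolation of relevance and readability.

    alpha = 1.0 keeps the ordering by relevance alone;
    alpha = 0.0 orders by readability alone.
    """
    def combined(doc):
        return alpha * doc.relevance + (1 - alpha) * readability(doc)

    return sorted(docs, key=combined, reverse=True)


if __name__ == "__main__":
    initial = [
        Doc("a", relevance=0.9, surface=0.1, topic_scope=0.1, topic_trace=0.1),
        Doc("b", relevance=0.8, surface=0.9, topic_scope=0.9, topic_trace=0.9),
    ]
    for doc in rerank(initial, alpha=0.5):
        print(doc.doc_id)
```

With these made-up scores, the slightly less relevant but far more readable document moves ahead of the other once readability receives half of the weight, which is the kind of effect a readability-aware reranker aims for.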

Original language: English
Title of host publication: Information Retrieval Technology - 11th Asia Information Retrieval Societies Conference, AIRS 2015, Proceedings
Editors: Falk Scholer, Guido Zuccon, Shlomo Geva, Aixin Sun, Hideo Joho, Peng Zhang
Publisher: Springer Verlag
Pages: 241-252
Number of pages: 12
ISBN (Print): 9783319289397
DOIs
Publication status: Published - 2015
Externally published: Yes
Event: 11th Asia Information Retrieval Societies Conference, AIRS 2015 - Brisbane, Australia
Duration: 2 Dec 2015 → 4 Dec 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9460
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th Asia Information Retrieval Societies Conference, AIRS 2015
Country/Territory: Australia
City: Brisbane
Period: 2/12/15 → 4/12/15

Keywords

  • Documents reranking
  • Domain-specific retrieval
  • Readability
