TY - GEN
T1 - A sequential latent topic-based readability model for domain-specific information retrieval
AU - Zhang, Wenya
AU - Song, Dawei
AU - Zhang, Peng
AU - Zhao, Xiaozhao
AU - Hou, Yuexian
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
AB - In domain-specific information retrieval (IR), an emerging problem is how to provide different users, especially lay users, with documents that are both relevant and readable. In this paper, we propose a novel document readability model to enhance domain-specific IR. Our model incorporates the coverage and sequential dependency of latent topics in a document. Accordingly, two topical readability indicators, namely Topic Scope and Topic Trace, are developed. These indicators, combined with the classical Surface-level indicator, can be used to rerank the initial list of documents returned by a conventional search engine. To extract the structured latent topics without supervision, we use hierarchical Latent Dirichlet Allocation (hLDA). We have evaluated our model in the medical domain from both user-oriented and system-oriented perspectives. The user-oriented evaluation shows a good correlation between the readability scores given by our model and human judgments. Furthermore, our model achieves a significant improvement in the system-oriented evaluation compared with one of the state-of-the-art readability methods.
KW - Documents reranking
KW - Domain-specific retrieval
KW - Readability
UR - http://www.scopus.com/inward/record.url?scp=84958037604&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-28940-3_19
DO - 10.1007/978-3-319-28940-3_19
M3 - Conference contribution
AN - SCOPUS:84958037604
SN - 9783319289397
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 241
EP - 252
BT - Information Retrieval Technology - 11th Asia Information Retrieval Societies Conference, AIRS 2015, Proceedings
A2 - Scholer, Falk
A2 - Zuccon, Guido
A2 - Geva, Shlomo
A2 - Sun, Aixin
A2 - Joho, Hideo
A2 - Zhang, Peng
PB - Springer Verlag
T2 - 11th Asia Information Retrieval Societies Conference, AIRS 2015
Y2 - 2 December 2015 through 4 December 2015
ER -