MS-Ranker: Accumulating evidence from potentially correct candidates via reinforcement learning for answer selection

Yingxue Zhang; Fandong Meng; Peng Li; Ping Jian; Jie Zhou

doi:10.1016/j.neucom.2021.03.083

Yingxue Zhang^*, Fandong Meng, Peng Li, Ping Jian, Jie Zhou

^*Corresponding author for this work

School of Computer Science and Technology

Tencent

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Answer selection (AS) aims to select correct answers for a question from an answer candidate set. Conventional AS methods generally address this task by independently matching the question and each candidate. However, since the matching information between the question and a single candidate is usually limited, it is not enough to use the question as the only evidence to estimate the correctness of each candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates evidence from potentially correct candidates. Specifically, we explicitly consider the potential correctness of candidates when accumulating information and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall performance. Experiments on three benchmarks, namely WikiQA, SemEval-2016 CQA and SelQA, show that our model significantly outperforms existing methods that do not rely on external resources.
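The abstract names two ingredients, a gated evidence update and a listwise ranking reward, without giving their formulas. The Python sketch below is only an illustration of what such components might look like, not the paper's formulation: the function names, the use of a scalar correctness estimate as the gate, and the choice of average precision as the listwise reward are all assumptions made here.

```python
def gated_update(evidence, candidate, confidence):
    """Blend a candidate vector into the accumulated evidence vector.

    `confidence` is a hypothetical scalar in [0, 1] estimating how likely
    the candidate is to be correct; it acts as the gate (an assumption,
    not the paper's definition).
    """
    return [
        (1.0 - confidence) * e + confidence * c
        for e, c in zip(evidence, candidate)
    ]


def average_precision(ranked_labels):
    """Listwise score of a whole ranking: mean precision at each correct answer.

    `ranked_labels` lists 1 (correct) or 0 (incorrect) in ranked order.
    Average precision is used here as a stand-in listwise reward.
    """
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# A likely-correct candidate shifts the evidence more than an unlikely one.
evidence = gated_update([0.0, 1.0], [1.0, 0.0], confidence=0.25)
reward = average_precision([1, 0, 1])
```

Because the reward is computed over the full ranked list rather than per candidate, a policy trained against it is pushed toward overall ranking quality, which matches the motivation the abstract gives for a listwise reward.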

Original language	English
Pages (from-to)	270-279
Number of pages	10
Journal	Neurocomputing
Volume	449
DOIs	https://doi.org/10.1016/j.neucom.2021.03.083
Publication status	Published - 18 Aug 2021

Keywords

Answer selection
Gating mechanism
Listwise ranking reward
MS-Ranker
Reinforcement learning

Cite this

@article{5505a5de5fc94aaebaac2c98934cf1a9,

title = "MS-Ranker: Accumulating evidence from potentially correct candidates via reinforcement learning for answer selection",

abstract = "Answer selection (AS) aims to select correct answers for a question from an answer candidate set. Conventional AS methods generally address this task by independently matching the question and each candidate. However, since the matching information between the question and a single candidate is usually limited, it is not enough to use the question as the only evidence to estimate the correctness of each candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates evidence from potentially correct candidates. Specifically, we explicitly consider the potential correctness of candidates when accumulating information and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall performance. Experiments on three benchmarks, namely WikiQA, SemEval-2016 CQA and SelQA, show that our model significantly outperforms existing methods that do not rely on external resources.",

keywords = "Answer selection, Gating mechanism, Listwise ranking reward, MS-Ranker, Reinforcement learning",

author = "Yingxue Zhang and Fandong Meng and Peng Li and Ping Jian and Jie Zhou",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier B.V.",

year = "2021",

month = aug,

day = "18",

doi = "10.1016/j.neucom.2021.03.083",

language = "English",

volume = "449",

pages = "270--279",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - MS-Ranker

T2 - Accumulating evidence from potentially correct candidates via reinforcement learning for answer selection

AU - Zhang, Yingxue

AU - Meng, Fandong

AU - Li, Peng

AU - Jian, Ping

AU - Zhou, Jie

PY - 2021/8/18

Y1 - 2021/8/18

N2 - Answer selection (AS) aims to select correct answers for a question from an answer candidate set. Conventional AS methods generally address this task by independently matching the question and each candidate. However, since the matching information between the question and a single candidate is usually limited, it is not enough to use the question as the only evidence to estimate the correctness of each candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates evidence from potentially correct candidates. Specifically, we explicitly consider the potential correctness of candidates when accumulating information and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall performance. Experiments on three benchmarks, namely WikiQA, SemEval-2016 CQA and SelQA, show that our model significantly outperforms existing methods that do not rely on external resources.

AB - Answer selection (AS) aims to select correct answers for a question from an answer candidate set. Conventional AS methods generally address this task by independently matching the question and each candidate. However, since the matching information between the question and a single candidate is usually limited, it is not enough to use the question as the only evidence to estimate the correctness of each candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates evidence from potentially correct candidates. Specifically, we explicitly consider the potential correctness of candidates when accumulating information and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall performance. Experiments on three benchmarks, namely WikiQA, SemEval-2016 CQA and SelQA, show that our model significantly outperforms existing methods that do not rely on external resources.

KW - Answer selection

KW - Gating mechanism

KW - Listwise ranking reward

KW - MS-Ranker

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85104711276&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2021.03.083

DO - 10.1016/j.neucom.2021.03.083

M3 - Article

AN - SCOPUS:85104711276

SN - 0925-2312

VL - 449

SP - 270

EP - 279

JO - Neurocomputing

JF - Neurocomputing

ER -
