Abstract
Answer selection (AS) aims to select correct answers for a question from a set of candidate answers. Conventional AS methods generally address this task by independently matching the question against each candidate. However, since the matching information between the question and a single candidate is usually limited, the question alone is often insufficient evidence for estimating the correctness of each candidate. To address this problem, we propose a novel reinforcement learning (RL) based multi-step ranking model, named MS-Ranker, which accumulates information from other candidates as extra evidence for matching the question with each candidate. Specifically, we explicitly consider the potential correctness of candidates when accumulating information and update the evidence with a gating mechanism. Moreover, as we use a listwise ranking reward, our model learns to pay more attention to the overall ranking performance. Experiments on three benchmarks, namely WikiQA, SemEval-2016 CQA and SelQA, show that our model significantly outperforms existing methods that do not rely on external resources.
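The abstract mentions a gating mechanism for updating the accumulated evidence across ranking steps. The snippet below is a minimal illustrative sketch of one possible gated evidence update in PyTorch; it assumes fixed-size vector representations for the evidence and for each candidate, and the class and parameter names (`GatedEvidenceUpdate`, `dim`) are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GatedEvidenceUpdate(nn.Module):
    """Sketch of a gated update of an evidence vector with one candidate's encoding."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate decides how much of the previous evidence to keep.
        self.gate = nn.Linear(2 * dim, dim)
        # Candidate-conditioned transformation producing the new information.
        self.transform = nn.Linear(2 * dim, dim)

    def forward(self, evidence: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # evidence, candidate: (batch, dim)
        combined = torch.cat([evidence, candidate], dim=-1)
        g = torch.sigmoid(self.gate(combined))           # element-wise gate in (0, 1)
        new_info = torch.tanh(self.transform(combined))  # update derived from the candidate
        # Interpolate between the old evidence and the new information.
        return g * evidence + (1.0 - g) * new_info


if __name__ == "__main__":
    update = GatedEvidenceUpdate(dim=128)
    evidence = torch.zeros(4, 128)    # initial evidence for a batch of 4 questions
    candidate = torch.randn(4, 128)   # encoding of one answer candidate per question
    evidence = update(evidence, candidate)
    print(evidence.shape)             # torch.Size([4, 128])
```

In such a scheme the same update would be applied once per ranking step, so the evidence vector gradually aggregates information from the candidates inspected so far.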
| Original language | English |
| --- | --- |
| Pages (from-to) | 270-279 |
| Number of pages | 10 |
| Journal | Neurocomputing |
| Volume | 449 |
| DOI | |
| Publication status | Published - 18 Aug 2021 |