TY - JOUR
T1 - Trustworthy machine reading comprehension with conditional adversarial calibration
AU - Wu, Zhijing
AU - Xu, Hua
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/6
Y1 - 2023/6
N2 - Machine Reading Comprehension (MRC) has achieved impressive answer inference performance in recent years, but the trustworthiness and reliability of deployed systems are rarely considered. In real-world applications, it is crucial to estimate predictive uncertainty in order to measure how likely a prediction is to be wrong, so that the system can abstain from low-confidence predictions and thereby behave in a trustworthy manner. Prior studies measure predictive uncertainty in a post-processing fashion, for example by using heuristic softmax probabilities or by training a calibrator on top of an already trained MRC model; however, they only calibrate confidence and do not take the domain adaptation relationship into account. To address these limitations, this paper presents TrustMRC, a non-postprocessing trustworthy MRC system that leverages (1) a conditional calibration strategy to obtain reliable uncertainty estimates and (2) a conditional adversarial learning strategy to learn transferable representations under domain shift. On the one hand, to estimate predictive uncertainty, a conditional calibration module is proposed to predict whether the output of the answer prediction module is correct, combined with an additional Expected Calibration Error (ECE) constraint that makes the confidence estimates more reliable. On the other hand, to handle domain shift, TrustMRC designs a conditional adversarial learning strategy that learns transferable representations through a domain discriminator with uncertainty constraints, taking both input alignment and uncertainty alignment into account. In addition, TrustMRC is a non-postprocessing model that performs answer prediction and uncertainty prediction in an end-to-end framework, so the two sub-tasks can benefit from each other via multi-task learning. Instead of the traditional EM and F1 metrics, EM-coverage and F1-coverage curves are used for trustworthiness-aware MRC evaluation. Experimental results on the SQuAD 1.1, Natural Questions, and NewsQA datasets indicate that TrustMRC makes reliable predictions under domain shift.
AB - Machine Reading Comprehension (MRC) has achieved impressive answer inference performance in recent years, but the trustworthiness and reliability of deployed systems are rarely considered. In real-world applications, it is crucial to estimate predictive uncertainty in order to measure how likely a prediction is to be wrong, so that the system can abstain from low-confidence predictions and thereby behave in a trustworthy manner. Prior studies measure predictive uncertainty in a post-processing fashion, for example by using heuristic softmax probabilities or by training a calibrator on top of an already trained MRC model; however, they only calibrate confidence and do not take the domain adaptation relationship into account. To address these limitations, this paper presents TrustMRC, a non-postprocessing trustworthy MRC system that leverages (1) a conditional calibration strategy to obtain reliable uncertainty estimates and (2) a conditional adversarial learning strategy to learn transferable representations under domain shift. On the one hand, to estimate predictive uncertainty, a conditional calibration module is proposed to predict whether the output of the answer prediction module is correct, combined with an additional Expected Calibration Error (ECE) constraint that makes the confidence estimates more reliable. On the other hand, to handle domain shift, TrustMRC designs a conditional adversarial learning strategy that learns transferable representations through a domain discriminator with uncertainty constraints, taking both input alignment and uncertainty alignment into account. In addition, TrustMRC is a non-postprocessing model that performs answer prediction and uncertainty prediction in an end-to-end framework, so the two sub-tasks can benefit from each other via multi-task learning. Instead of the traditional EM and F1 metrics, EM-coverage and F1-coverage curves are used for trustworthiness-aware MRC evaluation. Experimental results on the SQuAD 1.1, Natural Questions, and NewsQA datasets indicate that TrustMRC makes reliable predictions under domain shift.
KW - Adversarial learning
KW - Domain adaptation
KW - Model uncertainty
KW - Trustworthy machine reading comprehension
UR - http://www.scopus.com/inward/record.url?scp=85141050207&partnerID=8YFLogxK
U2 - 10.1007/s10489-022-04235-3
DO - 10.1007/s10489-022-04235-3
M3 - Article
AN - SCOPUS:85141050207
SN - 0924-669X
VL - 53
SP - 14298
EP - 14315
JO - Applied Intelligence
JF - Applied Intelligence
IS - 11
ER -