TY - JOUR
T1 - Adverse drug reaction detection on social media with deep linguistic features
AU - Zhang, Ying
AU - Cui, Shaoze
AU - Gao, Huiying
N1 - Publisher Copyright: © 2020 Elsevier Inc.
PY - 2020/6
Y1 - 2020/6
N2 - Adverse reactions caused by drugs are one of the most important public health problems. Social media has encouraged more patients to share their drug use experiences and has become a major source for the detection of professionally unreported adverse drug reactions (ADRs). Since a large number of user posts do not mention any ADR, accurate detection of the presence of ADRs in each user post is necessary before further research can be conducted. Previous feature-based methods focus on extracting more shallow linguistic features that are unable to capture deep and subtle information in the context, ultimately failing to provide satisfactory accuracy. To overcome the limitations of previous studies, this paper proposes a novel method that can extract deep linguistic features and then combine them with shallow linguistic features for ADR detection. We first extract predicate-ADR pairs under the guidance of extended syntactic dependencies and ADR lexicon. Then, we extract semantic and part-of-speech (POS) features for each pair and pool the features of different pairs to generate a holistic representation of deep linguistic features. Finally, we use the collection of deep features and several shallow features to train the predictive models. A series of experiments are performed on data sets collected from DailyStrength and Twitter. Our approach can achieve AUCs of 94.44% and 88.97% on the two data sets, respectively, outperforming other state-of-the-art methods. The results demonstrate the potential benefits of deep linguistic features for ADR detection on social data. This method can be applied to multiple other healthcare and text analysis tasks and can be used to support pharmacovigilance research.
KW - Adverse drug reactions
KW - Deep linguistic features
KW - Feature-based method
KW - Pharmacovigilance
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85084927040&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2020.103437
DO - 10.1016/j.jbi.2020.103437
M3 - Article
C2 - 32360987
AN - SCOPUS:85084927040
SN - 1532-0464
VL - 106
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 103437
ER -