Adverse drug reaction detection on social media with deep linguistic features

Ying Zhang; Shaoze Cui; Huiying Gao

doi:10.1016/j.jbi.2020.103437

Adverse drug reaction detection on social media with deep linguistic features

Ying Zhang^*, Shaoze Cui, Huiying Gao

^*此作品的通讯作者

管理学院

科研成果: 期刊稿件 › 文章 › 同行评审

31 引用（Scopus）

摘要

Adverse reactions caused by drugs are one of the most important public health problems. Social media has encouraged more patients to share their drug use experiences and has become a major source for the detection of professionally unreported adverse drug reactions (ADRs). Since a large number of user posts do not mention any ADR, accurate detection of the presence of ADRs in each user post is necessary before further research can be conducted. Previous feature-based methods focus on extracting more shallow linguistic features that are unable to capture deep and subtle information in the context, ultimately failing to provide satisfactory accuracy. To overcome the limitations of previous studies, this paper proposes a novel method that can extract deep linguistic features and then combine them with shallow linguistic features for ADR detection. We first extract predicate-ADR pairs under the guidance of extended syntactic dependencies and ADR lexicon. Then, we extract semantic and part-of-speech (POS) features for each pair and pool the features of different pairs to generate a holistic representation of deep linguistic features. Finally, we use the collection of deep features and several shallow features to train the predictive models. A series of experiments are performed on data sets collected from DailyStrength and Twitter. Our approach can achieve AUCs of 94.44% and 88.97% on the two data sets, respectively, outperforming other state-of-the-art methods. The results demonstrate the potential benefits of deep linguistic features for ADR detection on social data. This method can be applied to multiple other healthcare and text analysis tasks and can be used to support pharmacovigilance research.

源语言	英语
文章编号	103437
期刊	Journal of Biomedical Informatics
卷	106
DOI	https://doi.org/10.1016/j.jbi.2020.103437
出版状态	已出版 - 6月 2020

联合国可持续发展目标

此成果有助于实现下列可持续发展目标：

访问文件

10.1016/j.jbi.2020.103437

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{7bbbe56f3cd2469db26a7d312d11f0a5,

title = "Adverse drug reaction detection on social media with deep linguistic features",

abstract = "Adverse reactions caused by drugs are one of the most important public health problems. Social media has encouraged more patients to share their drug use experiences and has become a major source for the detection of professionally unreported adverse drug reactions (ADRs). Since a large number of user posts do not mention any ADR, accurate detection of the presence of ADRs in each user post is necessary before further research can be conducted. Previous feature-based methods focus on extracting more shallow linguistic features that are unable to capture deep and subtle information in the context, ultimately failing to provide satisfactory accuracy. To overcome the limitations of previous studies, this paper proposes a novel method that can extract deep linguistic features and then combine them with shallow linguistic features for ADR detection. We first extract predicate-ADR pairs under the guidance of extended syntactic dependencies and ADR lexicon. Then, we extract semantic and part-of-speech (POS) features for each pair and pool the features of different pairs to generate a holistic representation of deep linguistic features. Finally, we use the collection of deep features and several shallow features to train the predictive models. A series of experiments are performed on data sets collected from DailyStrength and Twitter. Our approach can achieve AUCs of 94.44% and 88.97% on the two data sets, respectively, outperforming other state-of-the-art methods. The results demonstrate the potential benefits of deep linguistic features for ADR detection on social data. This method can be applied to multiple other healthcare and text analysis tasks and can be used to support pharmacovigilance research.",

keywords = "Adverse drug reactions, Deep linguistic features, Feature-based method, Pharmacovigilance, Social media",

author = "Ying Zhang and Shaoze Cui and Huiying Gao",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier Inc.",

year = "2020",

month = jun,

doi = "10.1016/j.jbi.2020.103437",

language = "English",

volume = "106",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Adverse drug reaction detection on social media with deep linguistic features

AU - Zhang, Ying

AU - Cui, Shaoze

AU - Gao, Huiying

PY - 2020/6

Y1 - 2020/6

N2 - Adverse reactions caused by drugs are one of the most important public health problems. Social media has encouraged more patients to share their drug use experiences and has become a major source for the detection of professionally unreported adverse drug reactions (ADRs). Since a large number of user posts do not mention any ADR, accurate detection of the presence of ADRs in each user post is necessary before further research can be conducted. Previous feature-based methods focus on extracting more shallow linguistic features that are unable to capture deep and subtle information in the context, ultimately failing to provide satisfactory accuracy. To overcome the limitations of previous studies, this paper proposes a novel method that can extract deep linguistic features and then combine them with shallow linguistic features for ADR detection. We first extract predicate-ADR pairs under the guidance of extended syntactic dependencies and ADR lexicon. Then, we extract semantic and part-of-speech (POS) features for each pair and pool the features of different pairs to generate a holistic representation of deep linguistic features. Finally, we use the collection of deep features and several shallow features to train the predictive models. A series of experiments are performed on data sets collected from DailyStrength and Twitter. Our approach can achieve AUCs of 94.44% and 88.97% on the two data sets, respectively, outperforming other state-of-the-art methods. The results demonstrate the potential benefits of deep linguistic features for ADR detection on social data. This method can be applied to multiple other healthcare and text analysis tasks and can be used to support pharmacovigilance research.

AB - Adverse reactions caused by drugs are one of the most important public health problems. Social media has encouraged more patients to share their drug use experiences and has become a major source for the detection of professionally unreported adverse drug reactions (ADRs). Since a large number of user posts do not mention any ADR, accurate detection of the presence of ADRs in each user post is necessary before further research can be conducted. Previous feature-based methods focus on extracting more shallow linguistic features that are unable to capture deep and subtle information in the context, ultimately failing to provide satisfactory accuracy. To overcome the limitations of previous studies, this paper proposes a novel method that can extract deep linguistic features and then combine them with shallow linguistic features for ADR detection. We first extract predicate-ADR pairs under the guidance of extended syntactic dependencies and ADR lexicon. Then, we extract semantic and part-of-speech (POS) features for each pair and pool the features of different pairs to generate a holistic representation of deep linguistic features. Finally, we use the collection of deep features and several shallow features to train the predictive models. A series of experiments are performed on data sets collected from DailyStrength and Twitter. Our approach can achieve AUCs of 94.44% and 88.97% on the two data sets, respectively, outperforming other state-of-the-art methods. The results demonstrate the potential benefits of deep linguistic features for ADR detection on social data. This method can be applied to multiple other healthcare and text analysis tasks and can be used to support pharmacovigilance research.

KW - Adverse drug reactions

KW - Deep linguistic features

KW - Feature-based method

KW - Pharmacovigilance

KW - Social media

UR - http://www.scopus.com/inward/record.url?scp=85084927040&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2020.103437

DO - 10.1016/j.jbi.2020.103437

M3 - Article

C2 - 32360987

AN - SCOPUS:85084927040

SN - 1532-0464

VL - 106

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

M1 - 103437

ER -

Adverse drug reaction detection on social media with deep linguistic features

摘要

联合国可持续发展目标

访问文件

其它文件与链接

指纹

引用此