Mining medical related temporal information from patients’ self-description

Lichao Zhu; Hangzhou Yang; Zhijun Yan

doi:10.1108/IJCS-08-2017-0018

Lichao Zhu, Hangzhou Yang, Zhijun Yan^*

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Purpose: The purpose of this paper is to develop a new method to extract medical temporal information from online health communities. Design/methodology/approach: The authors trained a conditional random field model for the extraction of temporal expressions. The temporal relation identification is considered as a classification task and several support vector machine classifiers are built in the proposed method. For the model training, the authors extracted some high-level semantic features including co-reference relationship of medical concepts and the semantic similarity among words. Findings: For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is the relative TIMEX such as “three days after onset”. It also shows the same difficulty for normalization of absolute date or well-formatted duration, whereas frequency is easier to be normalized. For the identification of DocTimeRel, the result is fairly well, and the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept. Originality/value: The authors proposed a new method to extract temporal information from the online clinical data and evaluated the usefulness of different level of syntactic features in this task.
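The abstract singles out relative TIMEX such as “three days after onset” as the main challenge. As a rough illustration of what normalizing such an expression involves, the sketch below resolves it against a known onset date. It is a hand-written, rule-based toy and not the authors' conditional random field or support vector machine pipeline; the function name `normalize_relative_timex`, the lookup tables, and the pattern are all assumptions made for this example.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup tables for this sketch; they are not taken from the paper.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}
UNIT_DAYS = {"day": 1, "week": 7}

# Matches phrases of the form "<count> <unit>(s) after|before onset".
PATTERN = re.compile(
    r"(?P<count>\d+|[a-z]+)\s+(?P<unit>day|week)s?\s+"
    r"(?P<direction>after|before)\s+onset",
    re.IGNORECASE,
)


def normalize_relative_timex(text, onset):
    """Return the calendar date a relative TIMEX refers to, or None.

    `onset` is the anchor date the expression is relative to; in real
    patient text this anchor would itself have to be inferred.
    """
    match = PATTERN.search(text)
    if match is None:
        return None

    raw_count = match.group("count").lower()
    if raw_count.isdigit():
        count = int(raw_count)
    elif raw_count in NUMBER_WORDS:
        count = NUMBER_WORDS[raw_count]
    else:
        return None  # count word not covered by this toy table

    days = count * UNIT_DAYS[match.group("unit").lower()]
    sign = 1 if match.group("direction").lower() == "after" else -1
    return onset + timedelta(days=sign * days)


if __name__ == "__main__":
    onset_date = date(2017, 6, 12)  # arbitrary anchor chosen for the demo
    print(normalize_relative_timex("three days after onset", onset_date))
```

Even this toy shows why relative expressions are harder than well-formatted ones: the result depends on an anchor date that is not part of the expression itself.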

Original language	English
Pages (from-to)	110-120
Number of pages	11
Journal	International Journal of Crowd Science
Volume	1
Issue number	2
DOI	https://doi.org/10.1108/IJCS-08-2017-0018
Publication status	Published - 12 Jun 2017

Cite this

@article{c7823e40d38b48589388d1cd8bebb444,

title = "Mining medical related temporal information from patients{\textquoteright} self-description",

abstract = "Purpose: The purpose of this paper is to develop a new method to extract medical temporal information from online health communities. Design/methodology/approach: The authors trained a conditional random field model for the extraction of temporal expressions. The temporal relation identification is considered as a classification task and several support vector machine classifiers are built in the proposed method. For the model training, the authors extracted some high-level semantic features including co-reference relationship of medical concepts and the semantic similarity among words. Findings: For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is the relative TIMEX such as “three days after onset”. It also shows the same difficulty for normalization of absolute date or well-formatted duration, whereas frequency is easier to be normalized. For the identification of DocTimeRel, the result is fairly well, and the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept. Originality/value: The authors proposed a new method to extract temporal information from the online clinical data and evaluated the usefulness of different level of syntactic features in this task.",

keywords = "Co-reference, Conditional random field, Support vector machine, Temporal information extraction, Word embedding",

author = "Lichao Zhu and Hangzhou Yang and Zhijun Yan",

note = "Publisher Copyright: {\textcopyright} 2017, Lichao Zhu, Hangzhou Yang and Zhijun Yan.",

year = "2017",

month = jun,

day = "12",

doi = "10.1108/IJCS-08-2017-0018",

language = "English",

volume = "1",

pages = "110--120",

journal = "International Journal of Crowd Science",

issn = "2398-7294",

publisher = "Emerald Group Publishing Ltd.",

number = "2",

}

TY - JOUR

T1 - Mining medical related temporal information from patients’ self-description

AU - Zhu, Lichao

AU - Yang, Hangzhou

AU - Yan, Zhijun

PY - 2017/6/12

Y1 - 2017/6/12

N2 - Purpose: The purpose of this paper is to develop a new method to extract medical temporal information from online health communities. Design/methodology/approach: The authors trained a conditional random field model for the extraction of temporal expressions. The temporal relation identification is considered as a classification task and several support vector machine classifiers are built in the proposed method. For the model training, the authors extracted some high-level semantic features including co-reference relationship of medical concepts and the semantic similarity among words. Findings: For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is the relative TIMEX such as “three days after onset”. It also shows the same difficulty for normalization of absolute date or well-formatted duration, whereas frequency is easier to be normalized. For the identification of DocTimeRel, the result is fairly well, and the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept. Originality/value: The authors proposed a new method to extract temporal information from the online clinical data and evaluated the usefulness of different level of syntactic features in this task.

AB - Purpose: The purpose of this paper is to develop a new method to extract medical temporal information from online health communities. Design/methodology/approach: The authors trained a conditional random field model for the extraction of temporal expressions. The temporal relation identification is considered as a classification task and several support vector machine classifiers are built in the proposed method. For the model training, the authors extracted some high-level semantic features including co-reference relationship of medical concepts and the semantic similarity among words. Findings: For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is the relative TIMEX such as “three days after onset”. It also shows the same difficulty for normalization of absolute date or well-formatted duration, whereas frequency is easier to be normalized. For the identification of DocTimeRel, the result is fairly well, and the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept. Originality/value: The authors proposed a new method to extract temporal information from the online clinical data and evaluated the usefulness of different level of syntactic features in this task.

KW - Co-reference

KW - Conditional random field

KW - Support vector machine

KW - Temporal information extraction

KW - Word embedding

UR - http://www.scopus.com/inward/record.url?scp=85123059725&partnerID=8YFLogxK

U2 - 10.1108/IJCS-08-2017-0018

DO - 10.1108/IJCS-08-2017-0018

M3 - Article

AN - SCOPUS:85123059725

SN - 2398-7294

VL - 1

SP - 110

EP - 120

JO - International Journal of Crowd Science

JF - International Journal of Crowd Science

IS - 2

ER -
