TY - JOUR
T1 - Mining medical related temporal information from patients’ self-description
AU - Zhu, Lichao
AU - Yang, Hangzhou
AU - Yan, Zhijun
N1 - Publisher Copyright:
© 2017, Lichao Zhu, Hangzhou Yang and Zhijun Yan.
PY - 2017/6/12
Y1 - 2017/6/12
N2 - Purpose: The purpose of this paper is to develop a new method to extract medical temporal information from online health communities. Design/methodology/approach: The authors trained a conditional random field model for the extraction of temporal expressions. Temporal relation identification is treated as a classification task, and several support vector machine classifiers are built in the proposed method. For model training, the authors extracted high-level semantic features, including co-reference relationships among medical concepts and the semantic similarity among words. Findings: For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is relative TIMEX such as “three days after onset”. The same difficulty appears in the normalization of absolute dates or well-formatted durations, whereas frequency is easier to normalize. For the identification of DocTimeRel, the results are fairly good, but the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept. Originality/value: The authors proposed a new method to extract temporal information from online clinical data and evaluated the usefulness of different levels of syntactic features in this task.
AB - Purpose: The purpose of this paper is to develop a new method to extract medical temporal information from online health communities. Design/methodology/approach: The authors trained a conditional random field model for the extraction of temporal expressions. Temporal relation identification is treated as a classification task, and several support vector machine classifiers are built in the proposed method. For model training, the authors extracted high-level semantic features, including co-reference relationships among medical concepts and the semantic similarity among words. Findings: For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is relative TIMEX such as “three days after onset”. The same difficulty appears in the normalization of absolute dates or well-formatted durations, whereas frequency is easier to normalize. For the identification of DocTimeRel, the results are fairly good, but the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept. Originality/value: The authors proposed a new method to extract temporal information from online clinical data and evaluated the usefulness of different levels of syntactic features in this task.
KW - Co-reference
KW - Conditional random field
KW - Support vector machine
KW - Temporal information extraction
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=85123059725&partnerID=8YFLogxK
U2 - 10.1108/IJCS-08-2017-0018
DO - 10.1108/IJCS-08-2017-0018
M3 - Article
AN - SCOPUS:85123059725
SN - 2398-7294
VL - 1
SP - 110
EP - 120
JO - International Journal of Crowd Science
JF - International Journal of Crowd Science
IS - 2
ER -