TY - JOUR
T1 - XTime
T2 - A general rule-based method for time expression recognition and normalization
AU - Zhong, Xiaoshi
AU - Jin, Chenyu
AU - An, Mengyu
AU - Cambria, Erik
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/8/3
Y1 - 2024/8/3
N2 - Time expression (a.k.a., timex) recognition and normalization (TERN) is a crucial task for downstream research. However, previous studies have overlooked the critical characteristics of timexes that significantly impact the task. To gain deeper insights, we conduct an analysis across four diverse English datasets to examine the key attributes of timex constituents. Our analysis reveals several noteworthy observations, such as: timexes tend to be very short; the majority of timexes contain time tokens; there exist strong mapping relationships between time tokens and timex types; there exists a priority relationship among timex types; and timex values exhibit only some standard formats. Based on these insights, we propose a novel general rule-based method termed XTime to recognize timexes from free text and normalize them into standard formats. Notably, XTime's rules are designed in a general and heuristic manner, enabling its independence of diverse domains and text types. Experimental evaluations conducted on both in-domain and out-of-domain English datasets demonstrate that XTime consistently outperforms or performs comparably to representative state-of-the-art methods.
AB - Time expression (a.k.a., timex) recognition and normalization (TERN) is a crucial task for downstream research. However, previous studies have overlooked the critical characteristics of timexes that significantly impact the task. To gain deeper insights, we conduct an analysis across four diverse English datasets to examine the key attributes of timex constituents. Our analysis reveals several noteworthy observations, such as: timexes tend to be very short; the majority of timexes contain time tokens; there exist strong mapping relationships between time tokens and timex types; there exists a priority relationship among timex types; and timex values exhibit only some standard formats. Based on these insights, we propose a novel general rule-based method termed XTime to recognize timexes from free text and normalize them into standard formats. Notably, XTime's rules are designed in a general and heuristic manner, enabling its independence of diverse domains and text types. Experimental evaluations conducted on both in-domain and out-of-domain English datasets demonstrate that XTime consistently outperforms or performs comparably to representative state-of-the-art methods.
KW - Mapping relations
KW - Priority relationship
KW - Time expression (timex)
KW - Time expression recognition and normalization (TERN)
KW - Token triples
KW - Token types
UR - http://www.scopus.com/inward/record.url?scp=85193630483&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.111921
DO - 10.1016/j.knosys.2024.111921
M3 - Article
AN - SCOPUS:85193630483
SN - 0950-7051
VL - 297
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 111921
ER -