XTime: A general rule-based method for time expression recognition and normalization

Xiaoshi Zhong; Chenyu Jin; Mengyu An; Erik Cambria

doi:10.1016/j.knosys.2024.111921


Corresponding author: Xiaoshi Zhong

School of Computer Science

Research output: Contribution to journal › Article › Peer-review

Abstract

Time expression (a.k.a. timex) recognition and normalization (TERN) is a crucial task for downstream research. However, previous studies have overlooked the critical characteristics of timexes that significantly impact the task. To gain deeper insights, we conduct an analysis across four diverse English datasets to examine the key attributes of timex constituents. Our analysis reveals several noteworthy observations: timexes tend to be very short; the majority of timexes contain time tokens; there exist strong mapping relationships between time tokens and timex types; there exists a priority relationship among timex types; and timex values follow only a few standard formats. Based on these insights, we propose a novel general rule-based method termed XTime to recognize timexes from free text and normalize them into standard formats. Notably, XTime's rules are designed in a general and heuristic manner, making it independent of particular domains and text types. Experimental evaluations on both in-domain and out-of-domain English datasets demonstrate that XTime consistently outperforms or performs comparably to representative state-of-the-art methods.

Original language	English
Article number	111921
Journal	Knowledge-Based Systems
Volume	297
DOI	https://doi.org/10.1016/j.knosys.2024.111921
Publication status	Published - 3 Aug 2024


Cite this

@article{928c95b178f14363b20457b8f9b1df8c,

title = "XTime: A general rule-based method for time expression recognition and normalization",

abstract = "Time expression (a.k.a. timex) recognition and normalization (TERN) is a crucial task for downstream research. However, previous studies have overlooked the critical characteristics of timexes that significantly impact the task. To gain deeper insights, we conduct an analysis across four diverse English datasets to examine the key attributes of timex constituents. Our analysis reveals several noteworthy observations: timexes tend to be very short; the majority of timexes contain time tokens; there exist strong mapping relationships between time tokens and timex types; there exists a priority relationship among timex types; and timex values follow only a few standard formats. Based on these insights, we propose a novel general rule-based method termed XTime to recognize timexes from free text and normalize them into standard formats. Notably, XTime's rules are designed in a general and heuristic manner, making it independent of particular domains and text types. Experimental evaluations on both in-domain and out-of-domain English datasets demonstrate that XTime consistently outperforms or performs comparably to representative state-of-the-art methods.",

keywords = "Mapping relations, Priority relationship, Time expression (timex), Time expression recognition and normalization (TERN), Token triples, Token types",

author = "Xiaoshi Zhong and Chenyu Jin and Mengyu An and Erik Cambria",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2024",

month = aug,

day = "3",

doi = "10.1016/j.knosys.2024.111921",

language = "English",

volume = "297",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - XTime: A general rule-based method for time expression recognition and normalization

AU - Zhong, Xiaoshi

AU - Jin, Chenyu

AU - An, Mengyu

AU - Cambria, Erik

PY - 2024/8/3

Y1 - 2024/8/3

N2 - Time expression (a.k.a. timex) recognition and normalization (TERN) is a crucial task for downstream research. However, previous studies have overlooked the critical characteristics of timexes that significantly impact the task. To gain deeper insights, we conduct an analysis across four diverse English datasets to examine the key attributes of timex constituents. Our analysis reveals several noteworthy observations: timexes tend to be very short; the majority of timexes contain time tokens; there exist strong mapping relationships between time tokens and timex types; there exists a priority relationship among timex types; and timex values follow only a few standard formats. Based on these insights, we propose a novel general rule-based method termed XTime to recognize timexes from free text and normalize them into standard formats. Notably, XTime's rules are designed in a general and heuristic manner, making it independent of particular domains and text types. Experimental evaluations on both in-domain and out-of-domain English datasets demonstrate that XTime consistently outperforms or performs comparably to representative state-of-the-art methods.

AB - Time expression (a.k.a. timex) recognition and normalization (TERN) is a crucial task for downstream research. However, previous studies have overlooked the critical characteristics of timexes that significantly impact the task. To gain deeper insights, we conduct an analysis across four diverse English datasets to examine the key attributes of timex constituents. Our analysis reveals several noteworthy observations: timexes tend to be very short; the majority of timexes contain time tokens; there exist strong mapping relationships between time tokens and timex types; there exists a priority relationship among timex types; and timex values follow only a few standard formats. Based on these insights, we propose a novel general rule-based method termed XTime to recognize timexes from free text and normalize them into standard formats. Notably, XTime's rules are designed in a general and heuristic manner, making it independent of particular domains and text types. Experimental evaluations on both in-domain and out-of-domain English datasets demonstrate that XTime consistently outperforms or performs comparably to representative state-of-the-art methods.

KW - Mapping relations

KW - Priority relationship

KW - Time expression (timex)

KW - Time expression recognition and normalization (TERN)

KW - Token triples

KW - Token types

UR - http://www.scopus.com/inward/record.url?scp=85193630483&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2024.111921

DO - 10.1016/j.knosys.2024.111921

M3 - Article

AN - SCOPUS:85193630483

SN - 0950-7051

VL - 297

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

M1 - 111921

ER -
