Time expression analysis and recognition using syntactic token types and general heuristic rules

Xiaoshi Zhong; Aixin Sun; Erik Cambria

doi:10.18653/v1/P17-1039

Nanyang Technological University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

58 Citations (Scopus)

Abstract

Extracting time expressions from free text is a fundamental task for many applications. We analyze time expressions from four different datasets and find that only a small group of words are used to express time information and that the words in time expressions demonstrate similar syntactic behaviour. Based on the findings, we propose a type-based approach named SynTime for time expression recognition. Specifically, we define three main syntactic token types, namely time token, modifier, and numeral, to group time-related token regular expressions. On the types we design general heuristic rules to recognize time expressions. In recognition, SynTime first identifies time tokens from raw text, then searches their surroundings for modifiers and numerals to form time segments, and finally merges the time segments to time expressions. As a lightweight rule-based tagger, SynTime runs in real time, and can be easily expanded by simply adding keywords for the text from different domains and different text types. Experiments on benchmark datasets and tweets data show that SynTime outperforms state-of-the-art methods.
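
The three recognition steps named in the abstract (identify time tokens, search their surroundings for modifiers and numerals, merge segments) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword sets `TIME`, `MODIFIER`, and `NUMERAL`, the tokenizer, and the function names `token_type` and `recognize` are all hypothetical and heavily reduced, and the paper's actual heuristic rules are likely more elaborate.

```python
import re

# Hypothetical, heavily reduced lexicons (assumptions for illustration only).
TIME = {
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
    "january", "february", "march", "april", "may", "june", "july", "august",
    "september", "october", "november", "december",
    "day", "days", "week", "weeks", "month", "months", "year", "years",
    "today", "tomorrow", "yesterday",
}
MODIFIER = {"last", "next", "this", "past", "ago", "early", "late"}
NUMERAL = re.compile(r"^(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)$")


def token_type(token):
    """Map a token to 'T' (time token), 'M' (modifier), 'N' (numeral), or None."""
    low = token.lower()
    if low in TIME:
        return "T"
    if low in MODIFIER:
        return "M"
    if NUMERAL.match(low):
        return "N"
    return None


def recognize(text):
    """Return the time expressions found in text as a list of strings."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    types = [token_type(t) for t in tokens]

    # Steps 1 and 2: for each time token, grow a segment over adjacent
    # modifiers and numerals on both sides.
    segments = []
    for i, kind in enumerate(types):
        if kind != "T":
            continue
        start = end = i
        while start > 0 and types[start - 1] in ("M", "N"):
            start -= 1
        while end + 1 < len(tokens) and types[end + 1] in ("M", "N"):
            end += 1
        segments.append((start, end))

    # Step 3: merge segments that overlap or touch into one expression.
    merged = []
    for start, end in segments:
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    return [" ".join(tokens[s:e + 1]) for s, e in merged]


print(recognize("We met two weeks ago and will meet again next Monday."))
```

In this sketch, extending coverage to a new domain amounts to adding entries to the keyword sets, which mirrors the abstract's remark that the tagger can be expanded by adding keywords.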

Original language	English
Title of host publication	ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Publisher	Association for Computational Linguistics (ACL)
Pages	420-429
Number of pages	10
ISBN (Electronic)	9781945626753
DOI	https://doi.org/10.18653/v1/P17-1039
Publication status	Published - 2017
Externally published	Yes
Event	55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada. Duration: 30 Jul 2017 → 4 Aug 2017

Publication series

Name	ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume	1

Conference

Conference	55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Country/Territory	Canada
City	Vancouver
Period	30/07/17 → 4/08/17

Access to document

10.18653/v1/P17-1039

Other files and links

Link to publication in Scopus

Cite this

Zhong, X., Sun, A., & Cambria, E. (2017). Time expression analysis and recognition using syntactic token types and general heuristic rules. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers) (pp. 420-429). (ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers); Vol. 1). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/P17-1039

Zhong, Xiaoshi ; Sun, Aixin ; Cambria, Erik. / Time expression analysis and recognition using syntactic token types and general heuristic rules. ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics (ACL), 2017. pp. 420-429 (ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)).

@inproceedings{531b61a15fcd4736a75ddcdb151ae0fe,

title = "Time expression analysis and recognition using syntactic token types and general heuristic rules",

abstract = "Extracting time expressions from free text is a fundamental task for many applications. We analyze time expressions from four different datasets and find that only a small group of words are used to express time information and that the words in time expressions demonstrate similar syntactic behaviour. Based on the findings, we propose a type-based approach named SynTime for time expression recognition. Specifically, we define three main syntactic token types, namely time token, modifier, and numeral, to group time-related token regular expressions. On the types we design general heuristic rules to recognize time expressions. In recognition, SynTime first identifies time tokens from raw text, then searches their surroundings for modifiers and numerals to form time segments, and finally merges the time segments to time expressions. As a lightweight rule-based tagger, SynTime runs in real time, and can be easily expanded by simply adding keywords for the text from different domains and different text types. Experiments on benchmark datasets and tweets data show that SynTime outperforms state-of-the-art methods.",

author = "Xiaoshi Zhong and Aixin Sun and Erik Cambria",

note = "Publisher Copyright: {\textcopyright} 2017 Association for Computational Linguistics.; 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 ; Conference date: 30-07-2017 Through 04-08-2017",

year = "2017",

doi = "10.18653/v1/P17-1039",

language = "English",

series = "ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)",

publisher = "Association for Computational Linguistics (ACL)",

pages = "420--429",

booktitle = "ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)",

address = "United States",

}

Zhong, X, Sun, A & Cambria, E 2017, Time expression analysis and recognition using syntactic token types and general heuristic rules. in ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), vol. 1, Association for Computational Linguistics (ACL), pp. 420-429, 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, 30/07/17. https://doi.org/10.18653/v1/P17-1039

Time expression analysis and recognition using syntactic token types and general heuristic rules. / Zhong, Xiaoshi; Sun, Aixin; Cambria, Erik.
ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics (ACL), 2017. pp. 420-429 (ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers); Vol. 1).

TY - GEN

T1 - Time expression analysis and recognition using syntactic token types and general heuristic rules

AU - Zhong, Xiaoshi

AU - Sun, Aixin

AU - Cambria, Erik

PY - 2017

Y1 - 2017

N2 - Extracting time expressions from free text is a fundamental task for many applications. We analyze time expressions from four different datasets and find that only a small group of words are used to express time information and that the words in time expressions demonstrate similar syntactic behaviour. Based on the findings, we propose a type-based approach named SynTime for time expression recognition. Specifically, we define three main syntactic token types, namely time token, modifier, and numeral, to group time-related token regular expressions. On the types we design general heuristic rules to recognize time expressions. In recognition, SynTime first identifies time tokens from raw text, then searches their surroundings for modifiers and numerals to form time segments, and finally merges the time segments to time expressions. As a lightweight rule-based tagger, SynTime runs in real time, and can be easily expanded by simply adding keywords for the text from different domains and different text types. Experiments on benchmark datasets and tweets data show that SynTime outperforms state-of-the-art methods.

AB - Extracting time expressions from free text is a fundamental task for many applications. We analyze time expressions from four different datasets and find that only a small group of words are used to express time information and that the words in time expressions demonstrate similar syntactic behaviour. Based on the findings, we propose a type-based approach named SynTime for time expression recognition. Specifically, we define three main syntactic token types, namely time token, modifier, and numeral, to group time-related token regular expressions. On the types we design general heuristic rules to recognize time expressions. In recognition, SynTime first identifies time tokens from raw text, then searches their surroundings for modifiers and numerals to form time segments, and finally merges the time segments to time expressions. As a lightweight rule-based tagger, SynTime runs in real time, and can be easily expanded by simply adding keywords for the text from different domains and different text types. Experiments on benchmark datasets and tweets data show that SynTime outperforms state-of-the-art methods.

UR - http://www.scopus.com/inward/record.url?scp=85038556598&partnerID=8YFLogxK

U2 - 10.18653/v1/P17-1039

DO - 10.18653/v1/P17-1039

M3 - Conference contribution

AN - SCOPUS:85038556598

T3 - ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)

SP - 420

EP - 429

BT - ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)

PB - Association for Computational Linguistics (ACL)

T2 - 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017

Y2 - 30 July 2017 through 4 August 2017

ER -

Zhong X, Sun A, Cambria E. Time expression analysis and recognition using syntactic token types and general heuristic rules. In ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics (ACL). 2017. p. 420-429. (ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)). doi: 10.18653/v1/P17-1039
