TY - GEN
T1 - Time expression recognition using a constituent-based tagging scheme
AU - Zhong, Xiaoshi
AU - Cambria, Erik
N1 - Publisher Copyright:
© 2018 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC BY 4.0 License.
PY - 2018/4/10
Y1 - 2018/4/10
AB - We find from four datasets that time expressions have a loose structure and that the words used to express time information can differentiate time expressions from common text. These findings drive us to design a learning method named TOMN to model time expressions. TOMN defines a time-related tagging scheme, the TOMN scheme, with four tags, namely T, O, M, and N, which indicate the constituents of time expressions: Time token, Modifier, Numeral, and the words Outside time expressions. In modeling, TOMN assigns each word a TOMN tag using conditional random fields with minimal features. Essentially, our constituent-based TOMN scheme overcomes the problem of inconsistent tag assignment caused by conventional position-based tagging schemes (e.g., the BIO and BILOU schemes). Experiments show that TOMN is as effective as or more effective than state-of-the-art methods on various datasets, and much more robust in cross-dataset evaluation. Moreover, our analysis can explain many empirical observations reported in other work on time expression recognition and named entity recognition.
KW - Constituent-based tagging scheme
KW - Inconsistent tag assignment
KW - Named entity recognition
KW - Position-based tagging scheme
KW - Time expression recognition
UR - http://www.scopus.com/inward/record.url?scp=85071036794&partnerID=8YFLogxK
U2 - 10.1145/3178876.3185997
DO - 10.1145/3178876.3185997
M3 - Conference contribution
AN - SCOPUS:85071036794
T3 - The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018
SP - 983
EP - 992
BT - The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018
PB - Association for Computing Machinery, Inc
T2 - 27th International World Wide Web Conference, WWW 2018
Y2 - 23 April 2018 through 27 April 2018
ER -