TY - JOUR
T1 - Geotechnical Engineering Entity Recognition Based on the BERT-BiGRU-CRF Model
AU - Wang, Quanyu
AU - Li, Zhenhua
AU - Tu, Zhipeng
AU - Chen, Guanyu
AU - Hu, Jun
AU - Chen, Jiaqi
AU - Chen, Jianjun
AU - Lv, Guobin
N1 - Publisher Copyright:
© 2023 China University of Geosciences. All rights reserved.
PY - 2023/8
Y1 - 2023/8
AB - Named entity recognition for geotechnical engineering is an important prerequisite and foundation for geotechnical information mining and knowledge graphs. To recognize and classify named entities in geotechnical texts, this article first designs and constructs a geotechnical engineering named entity corpus according to the Standard for Fundamental Terms of Geotechnical Engineering (GB/T 50279-2014) and other national industry standards. It then proposes GENER, a deep learning model for named entity recognition and classification in geotechnical engineering text. In GENER, the BERT pretrained language model learns distributed representations of geotechnical engineering text features; the BiGRU context encoding layer encodes the contextual features of the text; and the CRF label decoding layer decodes those contextual features to generate the label sequence of geotechnical engineering named entities. Finally, the GENER model is evaluated experimentally on the geotechnical engineering corpus. Compared with other deep learning models for named entity recognition based on pretrained language models, the GENER model performs better: precision reaches 90.94%, recall reaches 92.88%, the F1-score reaches 91.89%, and model training speed increases by 4.735%. Experiments show that, compared with the BiLSTM-CRF and CNN-BiLSTM-CRF models, this model is more effective for geotechnical engineering entity recognition on a small-scale corpus.
KW - corpus
KW - deep learning
KW - geological big data
KW - geotechnical engineering
KW - named entity recognition
UR - http://www.scopus.com/inward/record.url?scp=85171573554&partnerID=8YFLogxK
U2 - 10.3799/dqkx.2022.462
DO - 10.3799/dqkx.2022.462
M3 - Article
AN - SCOPUS:85171573554
SN - 1000-2383
VL - 48
SP - 3137
EP - 3150
JO - Diqiu Kexue - Zhongguo Dizhi Daxue Xuebao/Earth Science - Journal of China University of Geosciences
JF - Diqiu Kexue - Zhongguo Dizhi Daxue Xuebao/Earth Science - Journal of China University of Geosciences
IS - 8
ER -