中文文本蕴含气象灾害事件信息多模型融合抽取方法

Duanmu Hu, Wu Yuan, Fangqu Niu, Wen Yuan*, Aiai Han

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

4 引用 (Scopus)

摘要

With global warming, the frequency of extreme weather events and major meteorological disasters is increasing globally. It is important to study the relationship between climate change and the frequency of meteorological disasters for disaster prevention and mitigation in the context of climate change. In this paper, a method is proposed for automatic extraction of spatial and temporal events of meteorological disasters based on natural language processing technology. Because there is a huge amount of spatial and temporal information of meteorological disasters available in literature and web data. Specifically, (1) A coarse-to-fine method was proposed to build a training corpus of meteorological disaster annotations based on professional literature. Firstly, a unified meteorological disaster knowledge system oriented to textual events is constructed to address the problems of ambiguity and incompatibility of different literature materials. Then a coarse annotation method based on chapter structure was constructed, and a Labeled LDA model-based and a fine-grained annotated corpus screening method based on TF-IDF and N-gram models were developed for long texts (modern texts) and short texts (literary texts), respectively, solving the problem of rapid corpus construction; (2) A method for automatic classification of spatiotemporal events of meteorological disasters based on the BERT-CNN model, which integrates contextual semantic features and local semantic features at multiple granularities, was developed for the integrated processing of short and long texts; (3) Using this method, the spatiotemporal events of meteorological disasters were automatically extracted from the textual and web data, and their macro F1 values reached 89.09% and 80.06%, respectively. The spatiotemporal distributions of major events of meteorological disasters were highly correlated with professional statistics; (4) Based on the above results, the spatiotemporal evolution of disasters in various historical periods in China was also reconstructed. We found that the overall volume of disaster data in each period showed a gradual increasing trend, with heavy rainfall disasters, floods, and droughts being the main types of disasters in China. Our method enables both the automatic extraction of long text events from the web and the automatic detection of short text events from literatures, providing a new technique for application of text data to meteorological disaster research and monitoring.

投稿的翻译标题Multi-model Fusion Extraction Method for Chinese Text Implicative Meteorological Disasters Event Information
源语言繁体中文
页(从-至)2342-2355
页数14
期刊Journal of Geo-Information Science
24
12
DOI
出版状态已出版 - 25 12月 2022

关键词

  • BERT-CNN models
  • Corpora
  • Event extraction
  • Knowledge systems
  • Meteorological disasters
  • Spatial and temporal events
  • Text classification

指纹

探究 '中文文本蕴含气象灾害事件信息多模型融合抽取方法' 的科研主题。它们共同构成独一无二的指纹。

引用此