TY - JOUR
T1 - Automated Construction and Mining of Text-Based Modern Chinese Character Databases
T2 - A Case Study of Fujian
AU - Jian, Xueyan
AU - Yuan, Wen
AU - Yuan, Wu
AU - Gao, Xinqi
AU - Wang, Rong
N1 - Publisher Copyright: © 2025 by the authors.
PY - 2025/4
Y1 - 2025/4
N2 - Historical figures are crucial for understanding historical processes and social changes. However, existing databases of historical figures primarily focus on ancient Chinese individuals and are limited by a simplistic organization of textual information that lacks structured processing. Therefore, this study proposes an automatic method for constructing a spatio-temporal database of modern Chinese figures. The character state transition matrix reveals the spatio-temporal evolution of historical figures, while the random walk algorithm identifies their primary migration patterns. Using historical figures from Fujian Province (1840–2009) as a case study, the results demonstrate that this method effectively constructs the spatio-temporal chain of figures, encompassing time, space, and events. The character state transition matrix indicates a fluctuating trend of state change from 1840 to 2009, initially increasing and then decreasing. By applying keyword extraction and the random walk method, this study finds that the state transitions and their causes align with historical trends. The four-dimensional analytical framework of “character-time-space-event” established in this study holds significant value for the field of digital humanities.
KW - character information
KW - data mining
KW - digital humanities
KW - spatio-temporal data
UR - http://www.scopus.com/inward/record.url?scp=105003539029&partnerID=8YFLogxK
U2 - 10.3390/info16040324
DO - 10.3390/info16040324
M3 - Article
AN - SCOPUS:105003539029
SN - 2078-2489
VL - 16
JO - Information (Switzerland)
JF - Information (Switzerland)
IS - 4
M1 - 324
ER -