TY - JOUR
T1 - Unraveling Scientific Evolutionary Paths
T2 - An Embedding-Based Topic Analysis
AU - Jin, Qianqian
AU - Chen, Hongshu
AU - Zhang, Yi
AU - Wang, Xuefeng
AU - Zhu, Donghua
N1 - Publisher Copyright: © 1988-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Understanding the evolution of knowledge has been and will continue to be the key task of science, technology, and innovation management. Existing research on evolutionary path identification relies primarily on traditional co-occurrence analysis and bag-of-words (BOW)-based models for topic extraction. However, these approaches have limitations in effectively capturing the underlying semantics and linkages of the topics. In this article, we propose a novel embedding-based methodology for scientific evolution analysis, in which word embedding, document embedding, clustering, and network analysis are applied to extract topics, measure topical semantic similarities, and quantitatively distinguish topics' evolutionary states. We first perform benchmark experiments to demonstrate that doc2vec generally outperforms the BOW-based models in topic extraction before evolution analysis. We then consider topic consistency in vector spaces to identify evolutionary states including newborn, convergence, inheritance, and extinction. Scientific evolutionary paths are finally unraveled based on topic similarity matrixes and evolutionary states. We conduct a case study on object detection research to validate the effectiveness of our methodology. The empirical results, validated by domain experts, demonstrate that the proposed methodology is capable of effectively revealing patterns of knowledge inheritance and integration. Consequently, this methodology can be used to improve decision-making processes in future innovation management.
KW - Doc2vec
KW - embedding
KW - evolution analysis
KW - evolutionary paths
KW - topic extraction
KW - word2vec
UR - http://www.scopus.com/inward/record.url?scp=85174835965&partnerID=8YFLogxK
U2 - 10.1109/TEM.2023.3312923
DO - 10.1109/TEM.2023.3312923
M3 - Article
AN - SCOPUS:85174835965
SN - 0018-9391
VL - 71
SP - 8964
EP - 8978
JO - IEEE Transactions on Engineering Management
JF - IEEE Transactions on Engineering Management
ER -