TY - JOUR
T1 - SAO semantic information identification for text mining
AU - Yang, Chao
AU - Zhu, Donghua
AU - Wang, Xuefeng
N1 - Publisher Copyright: © 2017, the Authors.
PY - 2017/1
Y1 - 2017/1
N2 - A Subject-Action-Object (SAO) is a triple structure that can be used both to describe topics in detail and to explore the relationships between them. SAO analysis has become popular in bibliometrics; however, two challenges arise in identifying SAO structures: the low relevance of SAOs to domain topics, and synonyms within SAOs. These problems make SAO identification heavily dependent on domain experts, which limits the wider use of SAOs and hampers further mining of SAO characteristics. This paper proposes a parse tree-based SAO identification method that includes (1) a model to identify the core components (candidate terms for subject and object) of SAO structures, using term clumping processes and co-word analysis; (2) a parse tree-based hierarchical SAO extraction model that divides the extraction of entire SAO structures into a collection of simpler sub-tasks for separate subject, action, and object identification; and (3) an SAO weighting model to rank SAO structures for result selection. The proposed method is applied to publications in the Journal of Scientometrics (SCIM) to identify and rank significant SAO structures. Our experimental results demonstrate the validity and feasibility of the proposed method.
AB - A Subject-Action-Object (SAO) is a triple structure that can be used both to describe topics in detail and to explore the relationships between them. SAO analysis has become popular in bibliometrics; however, two challenges arise in identifying SAO structures: the low relevance of SAOs to domain topics, and synonyms within SAOs. These problems make SAO identification heavily dependent on domain experts, which limits the wider use of SAOs and hampers further mining of SAO characteristics. This paper proposes a parse tree-based SAO identification method that includes (1) a model to identify the core components (candidate terms for subject and object) of SAO structures, using term clumping processes and co-word analysis; (2) a parse tree-based hierarchical SAO extraction model that divides the extraction of entire SAO structures into a collection of simpler sub-tasks for separate subject, action, and object identification; and (3) an SAO weighting model to rank SAO structures for result selection. The proposed method is applied to publications in the Journal of Scientometrics (SCIM) to identify and rank significant SAO structures. Our experimental results demonstrate the validity and feasibility of the proposed method.
KW - Computational intelligence
KW - Semantic analysis
KW - Subject-Action-Object
KW - Technology intelligence
KW - Topic model
UR - http://www.scopus.com/inward/record.url?scp=85018733807&partnerID=8YFLogxK
U2 - 10.2991/ijcis.2017.10.1.40
DO - 10.2991/ijcis.2017.10.1.40
M3 - Article
AN - SCOPUS:85018733807
SN - 1875-6891
VL - 10
SP - 593
EP - 604
JO - International Journal of Computational Intelligence Systems
JF - International Journal of Computational Intelligence Systems
IS - 1
ER -