TY - GEN
T1 - An approach for identifying author profiles of blogs
AU - Zhang, Chunxia
AU - Guo, Yu
AU - Wu, Jiayu
AU - Wang, Shuliang
AU - Niu, Zhendong
AU - Cheng, Wen
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Author profile identification is an important research problem in web mining, network public opinion monitoring and social network analysis. Its aim is to identify characteristics or traits of the authors of textual content such as blogs, microblogs or reviews on social network or commercial platforms. Author profile identification can be applied in many areas, including cyberspace forensics, electronic commerce and information security. In this paper, we propose a hybrid framework to solve the author profile identification problem. In this framework, we design a distributed integrated representation of blogs based on Doc2vec and term frequency-inverse document frequency, and apply a convolutional neural network to predict the age, gender and education status of blog authors. The benefits of our technique are that it predicts three different author traits in a uniform way, learns representation vectors of blog posts from unlabeled data in an unsupervised manner, and does not need any syntactic or semantic parsing of sentences. Experimental results on blogs show that our approach achieves promising performance.
AB - Author profile identification is an important research problem in web mining, network public opinion monitoring and social network analysis. Its aim is to identify characteristics or traits of the authors of textual content such as blogs, microblogs or reviews on social network or commercial platforms. Author profile identification can be applied in many areas, including cyberspace forensics, electronic commerce and information security. In this paper, we propose a hybrid framework to solve the author profile identification problem. In this framework, we design a distributed integrated representation of blogs based on Doc2vec and term frequency-inverse document frequency, and apply a convolutional neural network to predict the age, gender and education status of blog authors. The benefits of our technique are that it predicts three different author traits in a uniform way, learns representation vectors of blog posts from unlabeled data in an unsupervised manner, and does not need any syntactic or semantic parsing of sentences. Experimental results on blogs show that our approach achieves promising performance.
KW - Age prediction
KW - Author profile identification
KW - Convolutional neural network
KW - Doc2vec
KW - Education status prediction
KW - Gender prediction
UR - http://www.scopus.com/inward/record.url?scp=85033687300&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-69179-4_33
DO - 10.1007/978-3-319-69179-4_33
M3 - Conference contribution
AN - SCOPUS:85033687300
SN - 9783319691787
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 475
EP - 487
BT - Advanced Data Mining and Applications - 13th International Conference, ADMA 2017, Proceedings
A2 - Peng, Wen-Chih
A2 - Zhang, Wei Emma
A2 - Cong, Gao
A2 - Sun, Aixin
A2 - Li, Chengliang
PB - Springer Verlag
T2 - 13th International Conference on Advanced Data Mining and Applications, ADMA 2017
Y2 - 5 November 2017 through 6 November 2017
ER -