
Research on Classification of Commodity Ultra-Short Text Based on Deep Random Forest (基于深度随机森林的商品类超短文本分类研究)

  • Zhendong Niu
  • Pengfei Shi*
  • Yifan Zhu
  • Sifan Zhang
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In recent years, with the development of mobile communication and information technology, a growing volume of ultra-short text (no more than 20 words long and without auxiliary tag information) has to be processed on the network and in practical application scenarios. Ultra-short text is inherently ambiguous and feature-sparse, clearly lacks context, and is semantically hard to distinguish, so the field of text categorization needs an effective classification method for it. To address the performance problems of classifiers based on the traditional short-text classification methods KNN and decision tree, a new method based on deep random forest is proposed for classifying commodity ultra-short texts. Using a "diversion" strategy assisted by an external knowledge base, the method directly assigns a category to any commodity name whose category is clearly defined in the knowledge base, and uses the Word2vec tool to vectorize descriptions from which no commodity name can be extracted. The vectors in the external knowledge base are then classified with deep random forest. Finally, the classifier is optimized continually until the training set reaches a size threshold. Experimental results show that, compared with the traditional KNN and decision tree classifiers, the proposed method improves average accuracy by 22.78% and 17.22%, and average recall by 22.85% and 15.23%, respectively.
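
The "diversion" strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the knowledge-base entries, the function names `route` and `placeholder_model`, and the substring-matching rule are all assumptions made for illustration, and the fallback is only a stand-in for the Word2vec plus deep random forest stage, which the abstract does not specify in detail.

```python
from typing import Callable, Dict, Tuple

# Hypothetical knowledge base: commodity name -> category.
# The entries are illustrative only and do not come from the paper.
KNOWLEDGE_BASE: Dict[str, str] = {
    "toothpaste": "personal care",
    "notebook": "stationery",
}


def route(text: str,
          knowledge_base: Dict[str, str],
          fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Return (category, path), where path is 'knowledge_base' or 'model'."""
    lowered = text.lower()
    # Try longer names first so a short name cannot shadow a longer one.
    for name in sorted(knowledge_base, key=len, reverse=True):
        if name in lowered:
            # Diversion: a known commodity name decides the category directly.
            return knowledge_base[name], "knowledge_base"
    # No known name found: hand the text to the learned classifier.
    return fallback(text), "model"


def placeholder_model(text: str) -> str:
    # Stand-in for "vectorize with Word2vec, classify with deep random forest".
    # A real system would embed the text and call a trained classifier here.
    return "unknown"


print(route("Mint Toothpaste 120g", KNOWLEDGE_BASE, placeholder_model))
print(route("blue widget", KNOWLEDGE_BASE, placeholder_model))
```

Passing the classifier in as a callable keeps the routing logic independent of whichever embedding and forest implementation is actually used.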

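Translated title of the contribution: Research on Classification of Commodity Ultra-Short Text Based on Deep Random Forest
Original language: Traditional Chinese
Pages (from-to): 1277-1285
Number of pages: 9
Journal: Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
Volume: 41
Issue number: 12
DOI
Publication status: Published - Dec 2021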

Keywords

  • Commodity
  • Deep random forest
  • Ultra-short text classification
