Abstract
This paper proposes an improved TFIDF algorithm for calculating question similarity in community question answering systems. First, questions are divided into categories according to the user's retrieval intention, and the weight of each feature word is adjusted based on its distribution across those categories. Then, topic words are incorporated into the feature words used by the TFIDF algorithm. Experimental results show that P@3 increases by 7.66% compared with traditional TFIDF and by 5.31% compared with TFIDF-IG. Improvements of varying size are also obtained in P@5 and P@10. The new algorithm shows better search performance.
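For orientation, the following is a minimal sketch of the traditional TFIDF baseline and the P@k metric that the abstract compares against. It is not the paper's improved algorithm: the abstract does not give the category-based weight adjustment or the topic-word step, so neither is reproduced here. The function names (`tfidf_vectors`, `cosine`, `precision_at_k`), the tokenized input format, and the particular TF and IDF formulas are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} dict per tokenized question.

    Assumed weighting: tf = count / question length, idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: number of questions containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def precision_at_k(ranked_ids, relevant_ids, k):
    """P@k: fraction of the top-k ranked results that are relevant."""
    return sum(1 for i in ranked_ids[:k] if i in relevant_ids) / k
```

A category-aware variant would replace the weight inside `tfidf_vectors` with one that also depends on how a term is distributed across question categories; the exact form of that adjustment is defined in the paper itself.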
Original language | English |
---|---|
Pages (from-to) | 982-985 |
Number of pages | 4 |
Journal | Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology |
Volume | 37 |
Issue | 9 |
DOI | |
Publication status | Published - 1 Sep 2017 |