Abstract
This paper proposes an improved TFIDF algorithm for calculating question similarity in community interlocution systems. First, the questions are divided into categories according to the users' retrieval intention, and the weight of each feature word is adjusted according to its distribution across the categories. Then, topic words are incorporated into the feature words used by the TFIDF algorithm. The experimental results show that P@3 increases by 7.66% compared with traditional TFIDF and by 5.31% compared with TFIDF-IG, and that improvements of varying size are also obtained in P@5 and P@10. The new algorithm therefore shows better search performance.
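For orientation, the baseline that the abstract compares against and the P@k metric it reports can be sketched as follows. This is a minimal illustration only, assuming tokenized questions, a plain TF times log-IDF weighting, and cosine similarity in a vector space model. The function names (`tfidf_vectors`, `cosine`, `precision_at_k`) are hypothetical, and the paper's category-based weight adjustment and topic-word features are not reproduced here because the abstract does not give their formulas.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: weight} dict per tokenized question.

    Assumed weighting: (term count / question length) * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of questions containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def precision_at_k(ranked_ids, relevant_ids, k):
    """P@k: fraction of the top-k ranked results that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for i in top_k if i in relevant_ids) / k
```

Under these assumptions, a query question would be ranked against the archived questions by `cosine` score, and `precision_at_k` would then be averaged over the test queries for k = 3, 5, and 10.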
Original language | English |
---|---|
Pages (from-to) | 982-985 |
Number of pages | 4 |
Journal | Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology |
Volume | 37 |
Issue number | 9 |
Publication status | Published - 1 Sept 2017 |
Keywords
- Community interlocution system
- Question similarity
- TFIDF algorithm
- Vector space model