BIT at SemEval-2016 task 1: Sentence similarity based on alignments and vector with the weight of information content

Hao Wu, Heyan Huang, Wenpeng Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

This paper describes three unsupervised systems, submitted to SemEval 2016 Task 1, for determining the semantic similarity between two short texts or sentences. All three use only off-the-shelf software and data, which makes them easy to replicate. Two of the systems achieved similar Pearson correlation coefficients (0.64661 for the simple vector system, 0.65319 for the word alignment system). We also report experiments applying our alignment-based system to evaluation data from the 2014 and 2015 STS shared tasks. The results suggest that, beyond the core similarity algorithm, other factors such as data preprocessing and the use of domain-specific knowledge are also important to similarity prediction performance.
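The Pearson correlation coefficients quoted in the abstract compare system similarity scores against human gold-standard scores. As a minimal sketch of that metric only (not the authors' evaluation script), the following computes Pearson's r in plain Python. The function name `pearson` and the sample scores are illustrative assumptions and do not come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalised (the factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up gold and system scores on a 0-5 similarity scale (illustrative only).
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
system = [0.4, 1.0, 3.3, 3.9, 4.8]
print(round(pearson(gold, system), 4))
```

A value near 1 means the system ranks and spaces sentence pairs much as the human annotators do; a value near 0 means no linear relationship.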

Original language: English
Title of host publication: SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 686-690
Number of pages: 5
ISBN (Electronic): 9781941643952
Publication status: Published - 2016
Event: 10th International Workshop on Semantic Evaluation, SemEval 2016 - San Diego, United States
Duration: 16 Jun 2016 to 17 Jun 2016

