BIT at SemEval-2016 task 1: Sentence similarity based on alignments and vector with the weight of information content

Hao Wu; Heyan Huang; Wenpeng Lu

doi:10.18653/v1/s16-1105


Qilu University of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

3 citations (Scopus)

Abstract

This paper describes three unsupervised systems, submitted to SemEval-2016 Task 1, for determining the semantic similarity between two short texts or sentences. All of them use only off-the-shelf software and data, which makes them easy to replicate. Two of the systems achieved similar Pearson correlation coefficients: 0.64661 for the simple vector system and 0.65319 for the word-alignment system. We also report experiments applying our alignment-based system to the evaluation data from the 2014 and 2015 STS shared tasks. The results suggest that, beyond the core similarity algorithm, other factors such as data preprocessing and the use of domain-specific knowledge also matter for similarity prediction performance.
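The title indicates that one system builds a sentence vector in which each word is weighted by its information content (IC), and the abstract reports Pearson correlation as the evaluation measure. The Python sketch below illustrates that general idea only; it is not the authors' implementation, and their exact weighting, preprocessing and resources may differ. Assumptions: IC is estimated as -log p(w) from corpus counts with add-one smoothing, a sentence vector is the IC-weighted sum of its word embeddings, and similarity is the cosine between the two sentence vectors. The helper names (`make_ic`, `sentence_vector`, `cosine`, `pearson`) and all toy data (embeddings, corpus, sentence pairs, gold scores) are hypothetical.

```python
import math
from collections import Counter


def make_ic(corpus_tokens):
    # Hypothetical helper: IC(w) = -log p(w), with p(w) estimated from corpus
    # counts using add-one smoothing (an assumption, not the paper's estimator).
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts)

    def ic(word):
        # Counter returns 0 for unseen words, so they get the highest IC.
        return -math.log((counts[word] + 1) / (total + vocab + 1))

    return ic


def sentence_vector(tokens, embeddings, ic):
    # Sum the word vectors, each scaled by the word's IC, so rare
    # (informative) words contribute more than frequent function words.
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for tok in tokens:
        if tok in embeddings:  # skip out-of-vocabulary tokens
            weight = ic(tok)
            for i, x in enumerate(embeddings[tok]):
                vec[i] += weight * x
    return vec


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pearson(xs, ys):
    # Pearson correlation between system scores and gold scores.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data: 3-dimensional embeddings and a tiny corpus.
    embeddings = {
        "a": [0.1, 0.1, 0.1],
        "cat": [0.9, 0.1, 0.0],
        "dog": [0.8, 0.2, 0.1],
        "sits": [0.1, 0.9, 0.2],
        "runs": [0.2, 0.8, 0.3],
        "car": [0.0, 0.1, 0.9],
    }
    ic = make_ic("a cat sits a dog runs a car a cat".split())
    pairs = [("a cat sits", "a dog sits"),
             ("a cat sits", "a dog runs"),
             ("a cat sits", "a car")]
    gold = [4.5, 3.0, 0.5]  # hypothetical gold similarity scores (0-5 scale)
    preds = [cosine(sentence_vector(s1.split(), embeddings, ic),
                    sentence_vector(s2.split(), embeddings, ic))
             for s1, s2 in pairs]
    print("predictions:", [round(p, 3) for p in preds])
    print("Pearson r:", round(pearson(preds, gold), 3))
```

Because cosine similarity is scale-invariant, summing the weighted word vectors gives the same score as averaging them, so no length normalization is needed in this sketch.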

Original language	English
Host publication title	SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings
Publisher	Association for Computational Linguistics (ACL)
Pages	686-690
Number of pages	5
ISBN (Electronic)	9781941643952
DOI	https://doi.org/10.18653/v1/s16-1105
Publication status	Published - 2016
Event	10th International Workshop on Semantic Evaluation, SemEval 2016 - San Diego, United States. Duration: 16 Jun 2016 → 17 Jun 2016

Publication series

Name	SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings

Conference

Conference	10th International Workshop on Semantic Evaluation, SemEval 2016
Country/Territory	United States
City	San Diego
Period	16/06/16 → 17/06/16


Cite this

Wu, H., Huang, H., & Lu, W. (2016). BIT at SemEval-2016 task 1: Sentence similarity based on alignments and vector with the weight of information content. In SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings (pp. 686-690). (SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/s16-1105


@inproceedings{c904feae375643eeb8ec7a3931db5e78,

title = "BIT at SemEval-2016 task 1: Sentence similarity based on alignments and vector with the weight of information content",

author = "Hao Wu and Heyan Huang and Wenpeng Lu",

note = "Publisher Copyright: {\textcopyright} 2016 Association for Computational Linguistics.; 10th International Workshop on Semantic Evaluation, SemEval 2016 ; Conference date: 16-06-2016 Through 17-06-2016",

year = "2016",

doi = "10.18653/v1/s16-1105",

language = "English",

series = "SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings",

publisher = "Association for Computational Linguistics (ACL)",

pages = "686--690",

booktitle = "SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings",

address = "United States",

}

