STEP: Generating Semantic Text Embeddings with Prompt

Wenqiang Cao, Qing Li, Siying Zhang, Rixin Xu*, Youqi Li

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

Abstract

In recent years, semantic text embeddings have played a growing role in natural language processing (NLP) and have shown great potential in real-life applications such as search and recommendation systems. Models for generating semantic text embeddings have therefore been studied extensively. State-of-the-art solutions have evolved from traditional methods (such as Word2Vec and GloVe) to deep neural network based solutions (such as LSTM, Transformer, and pre-trained models like BERT and RoBERTa). In addition, frameworks like Sentence Transformers have lowered the barrier to training models for semantic text representation with customized models and datasets. In this paper, we investigate several well-trained models selected according to the Massive Text Embedding Benchmark (MTEB) on the Hugging Face website. Inspired by the extensive use of prompt engineering in large language models such as Llama and GPT-3, we propose STEP, a novel method that uses prompts to improve the performance of text embeddings on downstream tasks; it is applicable to almost any pre-trained language model for text embeddings. Moreover, STEP does not require modifying the structure of the base model. In our experiments, we applied STEP to five pre-trained models chosen from MTEB and trained and evaluated our approach on two separate datasets. The results indicate that our approach can improve performance on tasks related to semantic text similarity.
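The abstract does not say how the prompt is applied, so the following is only a minimal sketch of one plausible reading: the input text is wrapped in a template before encoding, and the encoder itself is left untouched. The template wording, the function names, and the toy hashed bag-of-words encoder are all assumptions made for illustration. They are not the authors' implementation, and the training step mentioned in the abstract is not covered.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical template: the abstract does not give the prompt wording used by STEP.
PROMPT_TEMPLATE = "Represent this sentence for semantic similarity: {text}"


def apply_prompt(text: str, template: str = PROMPT_TEMPLATE) -> str:
    """Wrap the raw text in a prompt template before it reaches the encoder."""
    return template.format(text=text)


def toy_encode(text: str, dim: int = 64) -> List[float]:
    """Stand-in encoder (assumption): hashed bag-of-words, L2-normalised.

    A real setup would call a pre-trained embedding model here instead.
    """
    vec = [0.0] * dim
    for token, count in Counter(re.findall(r"\w+", text.lower())).items():
        # Deterministic bucket from character codes (avoids Python's salted hash()).
        vec[sum(map(ord, token)) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def similarity(a: str, b: str, use_prompt: bool = True) -> float:
    """Cosine similarity of two texts; the encoder is the same with or without the prompt."""
    if use_prompt:
        a, b = apply_prompt(a), apply_prompt(b)
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(toy_encode(a), toy_encode(b)))


if __name__ == "__main__":
    print(similarity("a small cat", "a tiny cat", use_prompt=False))
    print(similarity("a small cat", "a tiny cat", use_prompt=True))
```

Because the prompt only changes the input string, the same wrapper could in principle sit in front of any encoder, which is in line with the abstract's claim that the base model structure stays unmodified.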

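Original language: English
Title of host publication: Proceedings - 2023 11th International Conference on Advanced Cloud and Big Data, CBD 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 180-185
Number of pages: 6
ISBN (electronic): 9798350345346
DOI: https://doi.org/10.1109/CBD63341.2023.00040
Publication status: Published - 2023
Event: 11th International Conference on Advanced Cloud and Big Data, CBD 2023 - Hainan, China
Duration: 18 Dec 2023 to 19 Dec 2023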

Publication series

Name: Proceedings - 2023 11th International Conference on Advanced Cloud and Big Data, CBD 2023

Conference

Conference: 11th International Conference on Advanced Cloud and Big Data, CBD 2023
Country/Territory: China
Location: Hainan
Period: 18/12/23 to 19/12/23

Cite this

Cao, W., Li, Q., Zhang, S., Xu, R., & Li, Y. (2023). STEP: Generating Semantic Text Embeddings with Prompt. In Proceedings - 2023 11th International Conference on Advanced Cloud and Big Data, CBD 2023 (pp. 180-185). (Proceedings - 2023 11th International Conference on Advanced Cloud and Big Data, CBD 2023). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CBD63341.2023.00040