STEP: Generating Semantic Text Embeddings with Prompt

Wenqiang Cao, Qing Li, Siying Zhang, Rixin Xu*, Youqi Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, semantic text embeddings have played a growing role in natural language processing (NLP) and have shown great potential in real-life applications such as search and recommendation systems. Models for generating semantic text embeddings have therefore been studied extensively. State-of-the-art solutions have evolved from traditional methods (such as Word2Vec and GloVe) to deep neural network based solutions (such as LSTM, Transformer, and pre-trained models like BERT and RoBERTa). In addition, frameworks like Sentence Transformers have lowered the barrier to training models for semantic text representation with customized models and datasets. In this paper, we investigate several well-trained models listed on the Massive Text Embedding Benchmark (MTEB) on the Hugging Face website. Inspired by the extensive use of prompt engineering in large language models such as Llama and GPT-3, we propose STEP, a novel method that uses prompts to improve the performance of text embeddings on downstream tasks and is applicable to almost any pre-trained language model for text embeddings. STEP does not require modifying the structure of the base model. In our experiments, we applied STEP to five pre-trained models chosen from MTEB and trained and evaluated our approach on two separate datasets. The results indicate that our approach can improve performance on tasks related to semantic text similarity.

Original language: English
Title of host publication: Proceedings - 2023 11th International Conference on Advanced Cloud and Big Data, CBD 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 180-185
Number of pages: 6
ISBN (Electronic): 9798350345346
DOIs
Publication status: Published - 2023
Event: 11th International Conference on Advanced Cloud and Big Data, CBD 2023 - Hainan, China
Duration: 18 Dec 2023 → 19 Dec 2023

Publication series

Name: Proceedings - 2023 11th International Conference on Advanced Cloud and Big Data, CBD 2023

Conference

Conference: 11th International Conference on Advanced Cloud and Big Data, CBD 2023
Country/Territory: China
City: Hainan
Period: 18/12/23 → 19/12/23

Keywords

  • NLP
  • embedding
  • prompt
  • semantic
