Bypassing LLM Safeguards: The In-Context Tense Attack Approach

Shaohuang Wang, Ruijing Geng, Shuai Lei, Yanfei Lv, Huaping Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In the field of cybersecurity, the emergence of Large Language Models (LLMs) has opened up a new domain of potential risks. Although these models demonstrate impressive capability across a range of applications, they can still generate content that harms both social and digital safety. This paper examines the implications of In-Context Learning based command attacks, a growing threat to the security and ethical integrity of LLMs. We introduce the In-Context Tense Attack (ITA) framework, a novel approach that employs harmful examples to undermine the integrity of LLMs. Our theoretical analysis elucidates how a constrained set of in-context examples can significantly influence the security mechanisms of LLMs. Through rigorous experimentation, we substantiate the potency of ITA in elevating the success rate of jailbreaking prompts. On the PKU-Alignment/SafeRLHF dataset, ITA achieved a 92.99% increase in Accuracy, a 73.36% improvement in Rouge-L, and a 27.01% enhancement in Bleu-4 scores. Similarly, on the NVIDIA/Aegis-Safety dataset, ITA demonstrated a 72.03% increase in Accuracy, an 80.87% rise in Rouge-L, and a 40.24% boost in Bleu-4 scores. These results underscore the effectiveness of ITA in manipulating LLMs to generate harmful outputs, highlighting the need for more robust security measures.
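The abstract describes ITA only at a high level: a small set of in-context demonstration pairs followed by the target request, with the title suggesting a tense-based rephrasing of that request. As a minimal sketch of how such a prompt might be assembled (the actual template, function names, and rewriting rule are not given in this record and are purely illustrative assumptions, not the authors' method):

```python
# Hypothetical sketch of an ITA-style prompt builder. All names and the
# past-tense rewriting rule below are illustrative assumptions; the paper's
# actual template and tense transformation are not described in this record.

def to_past_tense(request: str) -> str:
    """Placeholder tense rewrite: a real implementation would rephrase the
    request into past tense (e.g. 'How do people X' -> 'How did people X')."""
    return request.replace("How do", "How did", 1)

def build_ita_prompt(target_request: str, in_context_examples: list) -> str:
    """Assemble a prompt from (question, answer) demonstration pairs,
    followed by the target request rephrased in the past tense."""
    parts = []
    for question, answer in in_context_examples:
        parts.append(f"Q: {question}\nA: {answer}")
    # The final turn is left open for the model to complete.
    parts.append(f"Q: {to_past_tense(target_request)}\nA:")
    return "\n\n".join(parts)
```

In this sketch the in-context pairs play the role of the "harmful examples" the abstract mentions, conditioning the model's continuation of the final open-ended turn.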

Original language: English
Title of host publication: Proceedings of the 14th International Conference on Computer Engineering and Networks - Volume III
Editors: Guangqiang Yin, Xiaodong Liu, Jian Su, Yangzhao Yang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 266-277
Number of pages: 12
ISBN (Print): 9789819642441
DOIs
Publication status: Published - 2025
Event: 14th International Conference on Computer Engineering and Networks, CENet 2024 - Kashi, China
Duration: 18 Oct 2024 - 21 Oct 2024

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 1387 LNEE
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 14th International Conference on Computer Engineering and Networks, CENet 2024
Country/Territory: China
City: Kashi
Period: 18/10/24 - 21/10/24

Keywords

  • Cyber-Security
  • In-Context Learning
  • Jailbreak
  • LLM Safety
