TokenFree: A Tokenization-Free Generative Linguistic Steganographic Approach with Enhanced Imperceptibility

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Since tokenization serves as a fundamental preprocessing step in numerous language models, tokens naturally constitute the basic embedding units for generative linguistic steganography. However, tokenization-based methods face challenges including limited embedding capacity and possible segmentation ambiguity. Although character-level linguistic steganographic approaches (one tokenization-free type) exist, they face the problem of generating unknown or out-of-vocabulary words, potentially compromising steganographic imperceptibility. In this paper, we focus on both the embedding capacity and the imperceptibility of tokenization-free linguistic steganography. First, we suggest that unknown words mainly result from low-entropy distributions and the rigid coding rules used in candidate pools; we therefore propose an entropy-based selection approach to flexibly construct candidate pools. Further, we present a lexical emphasis approach that prioritizes characters within candidate pools capable of forming in-vocabulary words. Experiments show that, across a range of high embedding rates, our approaches achieve considerably higher imperceptibility and text fluency, increase anti-steganalysis capacity by 14.4% on average, and in particular reduce the out-of-vocabulary rate by 88.7% on average, compared to the existing state-of-the-art character-level steganographic methods.
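
The two ideas named in the abstract, entropy-based candidate pool construction and lexical emphasis, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract gives no formulas, so the pool-size rule (embed floor(H) bits, capped at `max_bits`), the `boost` factor, the prefix-matching test, and every function name here are assumptions chosen only to make the concepts concrete.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a next-character distribution {char: prob}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def pool_bits(probs, max_bits=3):
    """Assumed rule: embed floor(H) bits, capped by max_bits and pool size."""
    h = shannon_entropy(probs)
    limit = len(probs).bit_length() - 1  # floor(log2(number of characters))
    return max(0, min(int(h), max_bits, limit))


def lexical_emphasis(probs, prefix, vocab, boost=2.0):
    """Assumed rule: upweight characters that keep the partial word in-vocabulary."""
    scored = {}
    for ch, p in probs.items():
        if ch == " ":
            ok = prefix in vocab  # a space closes the word
        else:
            ok = any(w.startswith(prefix + ch) for w in vocab)
        scored[ch] = p * (boost if ok else 1.0)
    total = sum(scored.values())
    return {ch: s / total for ch, s in scored.items()}


def build_pool(probs, prefix, vocab, max_bits=3):
    """Return (candidate pool, number of secret bits embeddable at this step)."""
    bits = pool_bits(probs, max_bits)
    weights = lexical_emphasis(probs, prefix, vocab)
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights, key=lambda ch: (-weights[ch], ch))
    return ranked[: 2 ** bits], bits


vocab = {"the", "then", "that"}
uniform = {"e": 0.25, "a": 0.25, "x": 0.25, "q": 0.25}
peaked = {"e": 0.97, "a": 0.01, "x": 0.01, "q": 0.01}
print(build_pool(uniform, "th", vocab))
print(build_pool(peaked, "th", vocab))
```

Under these assumptions, a low-entropy step yields a pool of one character and embeds no bits, so the generator is never forced into an unlikely character, while a high-entropy step opens a larger pool whose top ranks favor characters that extend the current prefix toward a known word.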

Original language: English
Title of host publication: 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 449-455
Number of pages: 7
ISBN (Electronic): 9781665410205
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024 - Kuching, Malaysia
Duration: 6 Oct 2024 to 10 Oct 2024

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
ISSN (Print): 1062-922X

Conference

Conference: 2024 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2024
Country/Territory: Malaysia
City: Kuching
Period: 6/10/24 to 10/10/24
