Context Length Extension via Generalized Extrapolation Scale

Linhan Li, Huaping Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Context length expansion of transformer models is considered a key challenge, especially when handling context beyond the training length during inference stage. In this paper, we propose Generalized extrapolatioN scalE (GeNE), a straightforward and effective method applied to the interpolate function of positional embeddings to achieve training short, test long. Experimental results show that GeNE notably improves long context language modeling. By randomly scaling the extrapolation ratio during the finetuning, GeNE achieves stable extrapolation on 64k contexts by training on 16k length. Further, the instruction following Llama2 model based on GeNE achieved competitive results compared with other open-source models of the same parameter scale. Our code is available at https://github.com/LhLi-QED/GeNE.

Original languageEnglish
Title of host publicationThe 62nd Annual Meeting of the Association for Computational Linguistics
Subtitle of host publicationFindings of the Association for Computational Linguistics, ACL 2024
EditorsLun-Wei Ku, Andre Martins, Vivek Srikumar
PublisherAssociation for Computational Linguistics (ACL)
Pages4211-4218
Number of pages8
ISBN (Electronic)9798891760998
DOIs
Publication statusPublished - 2024
Externally publishedYes
EventFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 11 Aug 202416 Aug 2024

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

ConferenceFindings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/TerritoryThailand
CityHybrid, Bangkok
Period11/08/2416/08/24

Fingerprint

Dive into the research topics of 'Context Length Extension via Generalized Extrapolation Scale'. Together they form a unique fingerprint.

Cite this