TY - GEN
T1 - Context Length Extension via Generalized Extrapolation Scale
AU - Li, Linhan
AU - Zhang, Huaping
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Context length expansion of transformer models is considered a key challenge, especially when handling context beyond the training length during the inference stage. In this paper, we propose Generalized extrapolatioN scalE (GeNE), a straightforward and effective method applied to the interpolation function of positional embeddings to train short and test long. Experimental results show that GeNE notably improves long-context language modeling. By randomly scaling the extrapolation ratio during fine-tuning, GeNE achieves stable extrapolation on 64k contexts while training on 16k-length contexts. Further, the instruction-following Llama2 model based on GeNE achieves competitive results compared with other open-source models of the same parameter scale. Our code is available at https://github.com/LhLi-QED/GeNE.
UR - http://www.scopus.com/inward/record.url?scp=85205303256&partnerID=8YFLogxK
U2 - 10.18653/v1/2024.findings-acl.249
DO - 10.18653/v1/2024.findings-acl.249
M3 - Conference contribution
AN - SCOPUS:85205303256
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 4211
EP - 4218
BT - The 62nd Annual Meeting of the Association for Computational Linguistics
A2 - Ku, Lun-Wei
A2 - Martins, Andre
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Y2 - 11 August 2024 through 16 August 2024
ER -