Design prokaryotic cis-regulatory elements using language model

Research output: Contribution to journal › Article › peer-review

Abstract

Deep learning has successfully been applied to design cis-regulatory elements (CREs) for a few species, but a broadly applicable platform for generating functional promoters for thousands of prokaryotes remains lacking. In this study, we introduce a language model for prokaryotic CREs, referred to as PromoGen2, to design CREs without prior experimental data. PromoGen2 was pretrained on CREs derived from 17,000 prokaryotic genomes. It achieved the highest zero-shot prediction correlation of promoter strength across species, improving the average Spearman correlation from 0.27 to 0.50 compared to the best baseline, while reducing the number of parameters by a factor of 10^3. Artificial CREs designed with PromoGen2 demonstrated a 100% success rate in Escherichia coli, Bacillus subtilis, Bacillus licheniformis, and Agrobacterium tumefaciens. Based on PromoGen2, we developed the Promoter-Factory framework to design promoters from unannotated genomes. Experimental validation showed that most of the promoters designed for Jejubacter sp. L23, a newly isolated halophilic bacterium with no available CREs, were active and capable of driving lycopene overproduction. Additionally, we introduced PromoGen2-proka, a taxonomy-aware model for CRE design based on prokaryotic genera. Experimental validation confirmed its reliable success rate. The combined use of PromoGen2-proka and Promoter-Factory offers a broadly applicable tool for designing CREs for prokaryotes, fulfilling the needs of synthetic biology and microbiology research.

Original language: English
Article number: gkag122
Journal: Nucleic Acids Research
Volume: 54
Issue number: 4
Publication status: Published - 27 Feb 2026
