Abstract
Inspired by the success of large language models (LLMs), we develop a long-context generative model for genomes. Our multiscale transformer model, megaDNA, is pre-trained on unannotated bacteriophage genomes with nucleotide-level tokenization. We demonstrate the model's foundational capabilities, including the prediction of essential genes, genetic variant effects, regulatory element activity, and the taxonomy of unannotated sequences. Furthermore, it generates de novo sequences of up to 96 kilobase pairs (kb) that contain potential regulatory elements and annotated proteins with phage-related functions.
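The abstract names nucleotide-level tokenization and a multiscale transformer without giving details. The minimal sketch below illustrates both ideas under stated assumptions. It uses a hypothetical five-symbol vocabulary (one token per base plus padding) and a MEGABYTE-style grouping of tokens into fixed-size patches. The vocabulary, token IDs, function names (`tokenize`, `to_patches`), and patch size are all illustrative and are not taken from megaDNA.

```python
# Hypothetical vocabulary: these token IDs are illustrative, not megaDNA's actual mapping.
NUCLEOTIDE_VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def tokenize(sequence: str) -> list[int]:
    """Nucleotide-level tokenization: map each base to exactly one integer token."""
    return [NUCLEOTIDE_VOCAB[base] for base in sequence.upper()]


def to_patches(tokens: list[int], patch_size: int) -> list[list[int]]:
    """Split the token stream into fixed-size patches, right-padding the last one.

    Assumption: in MEGABYTE-style multiscale designs, a global model attends over
    one position per patch and a local model predicts the tokens inside each
    patch, which keeps very long sequences tractable.
    """
    pad = NUCLEOTIDE_VOCAB["<pad>"]
    remainder = len(tokens) % patch_size
    if remainder:
        tokens = tokens + [pad] * (patch_size - remainder)
    return [tokens[i:i + patch_size] for i in range(0, len(tokens), patch_size)]


if __name__ == "__main__":
    tokens = tokenize("ATGCGTAA")
    print(tokens)  # [1, 4, 3, 2, 3, 4, 1, 1]
    print(to_patches(tokens, patch_size=3))  # [[1, 4, 3], [2, 3, 4], [1, 1, 0]]
```

Tokenizing one base per token avoids committing to a k-mer or BPE vocabulary. The cost is a much longer token sequence, and hierarchical patching is one common way to offset that length.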
| Field | Value |
|---|---|
| Original language | English |
| Article number | 9392 |
| Journal | Nature Communications |
| Volume | 15 |
| Issue | 1 |
| DOI | |
| Publication status | Published - Dec 2024 |
| Externally published | Yes |
Fingerprint
Dive into the research topics of 'A long-context language model for deciphering and generating bacteriophage genomes'. Together they form a unique fingerprint.