A long-context language model for deciphering and generating bacteriophage genomes

  • Bin Shao*
  • Jiawei Yan
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Harvard University
  • Independent Researcher

Research output: Contribution to journal › Article › peer-review

Abstract

Inspired by the success of large language models (LLMs), we develop a long-context generative model for genomes. Our multiscale transformer model, megaDNA, is pre-trained on unannotated bacteriophage genomes with nucleotide-level tokenization. We demonstrate the foundational capabilities of our model including the prediction of essential genes, genetic variant effects, regulatory element activity and taxonomy of unannotated sequences. Furthermore, it generates de novo sequences up to 96 K base pairs, which contain potential regulatory elements and annotated proteins with phage-related functions.
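The abstract notes that megaDNA is pre-trained with nucleotide-level tokenization, i.e. each base is a single token. The sketch below illustrates that idea in Python; the specific vocabulary, token IDs, and special tokens are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of nucleotide-level tokenization for a genome language
# model. The vocabulary below (including the <pad> token and ID values)
# is an assumption for illustration, not megaDNA's actual tokenizer.

VOCAB = {"<pad>": 0, "A": 1, "C": 2, "G": 3, "T": 4}

def tokenize(sequence: str) -> list[int]:
    """Map each nucleotide to one integer token ID (one token per base)."""
    return [VOCAB[base] for base in sequence.upper()]

def detokenize(token_ids: list[int]) -> str:
    """Invert the mapping to recover the DNA sequence."""
    inverse = {v: k for k, v in VOCAB.items()}
    return "".join(inverse[t] for t in token_ids)

ids = tokenize("ATGC")
print(ids)              # [1, 4, 3, 2]
print(detokenize(ids))  # ATGC
```

Because every base becomes its own token, a 96 K base-pair genome corresponds to a sequence of ~96,000 tokens, which is why a long-context (here, multiscale transformer) architecture is needed.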

Original language: English
Article number: 9392
Journal: Nature Communications
Volume: 15
Issue: 1
DOI
Publication status: Published - December 2024
Externally published: Yes
