摘要
The never-ending emergence of SARS-CoV-2 variations of concern (VOCs) has challenged the whole world for pandemic control. In order to develop effective drugs and vaccines, one needs to efficiently simulate SARS-CoV-2 spike receptor-binding domain (RBD) mutations and identify high-risk variants. We pretrain a large protein language model with approximately 408 million protein sequences and construct a high-throughput screening for the prediction of binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and 493.9× speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (reaching 34.9% theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably take place. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
| 源语言 | 英语 |
|---|---|
| 页(从-至) | 650-665 |
| 页数 | 16 |
| 期刊 | International Journal of High Performance Computing Applications |
| 卷 | 37 |
| 期 | 6 |
| DOI | |
| 出版状态 | 已出版 - 11月 2023 |
| 已对外发布 | 是 |
联合国可持续发展目标
此成果有助于实现下列可持续发展目标:
-
可持续发展目标 3 良好健康与福祉
指纹
探究 'Running ahead of evolution—AI-based simulation for predicting future high-risk SARS-CoV-2 variants' 的科研主题。它们共同构成独一无二的指纹。引用此
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver