Gene2vec: gene subsequence embedding for prediction of mammalian N6-methyladenosine sites from mRNA

  • Quan Zou*
  • Pengwei Xing
  • Leyi Wei
  • Bin Liu

* Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

N6-Methyladenosine (m6A) is the methylation of adenosine at the nitrogen-6 position. Many conventional computational methods for identifying m6A sites are limited by the small amount of data available. With thousands of m6A sites now detected by high-throughput sequencing, it has become possible to learn the characteristics of m6A sequences using deep learning. To the best of our knowledge, this work is the first attempt to use word embedding and deep neural networks for m6A prediction from mRNA sequences. Using four deep neural networks, we developed a model inferred from a larger sequence shifting window that can predict m6A accurately and robustly. Four prediction schemes were built with various RNA sequence representations and optimized convolutional neural networks. On a rigorous independent test data set, the soft voting results from the four deep networks outperformed the state-of-the-art predictors. The training, independent, and cross-species testing data sets are much larger than in previous studies, which could help to avoid overfitting. Furthermore, an online prediction web server implementing the four proposed predictors has been built and is available at http://server.malab.cn/Gene2vec/.
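
The two ideas the abstract names, treating subsequences as "words" for embedding and combining several networks by soft voting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmer_tokens` and `soft_vote`, the k-mer length of 3, the stride of 1, and the 0.5 decision threshold are all assumptions chosen for illustration, and the abstract does not state the values actually used.

```python
from typing import List, Sequence, Tuple


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split an RNA sequence into overlapping k-mer "words" with a sliding window.

    k and stride are illustrative defaults, not values taken from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    # Normalise case and write the sequence in the RNA alphabet (T -> U).
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def soft_vote(probabilities: Sequence[float], threshold: float = 0.5) -> Tuple[float, bool]:
    """Average the positive-class probabilities of several models.

    Returns the mean probability and whether it reaches the (assumed) threshold.
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    mean = sum(probabilities) / len(probabilities)
    return mean, mean >= threshold


if __name__ == "__main__":
    # Tokens like these could be fed to a word-embedding model.
    print(kmer_tokens("GGACU"))
    # Hypothetical outputs of four predictors for one candidate site.
    print(soft_vote([0.9, 0.6, 0.4, 0.7]))
```

In such a scheme each candidate site would be scored by every network, and only the averaged probability decides the final call, so a single uncertain model does not dominate the prediction.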

Original language: English
Pages (from-to): 205-218
Number of pages: 14
Journal: RNA
Volume: 25
Issue number: 2
DOIs
Publication status: Published - Feb 2019
Externally published: Yes

Keywords

  • Deep learning
  • Machine learning
  • N6-methyladenosine
  • RNA word embedding
  • mRNA
