PreHom-PCLM: protein remote homology detection by combing motifs and protein cubic language model

Jiangyi Shao, Qi Zhang, Ke Yan*, Bin Liu*

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

Abstract

Protein remote homology detection is essential for structure prediction, function prediction, and understanding disease mechanisms, among other applications. Remote homology relationships depend on multiple protein properties, such as structural information and local sequence patterns. Previous studies have shown that predicting remote homology relationships from sequence-level protein features (e.g. the position-specific score matrix) is challenging. Protein motifs have been used in structure and function analysis because of their distinctive sequence patterns and the structural information they imply. An architecture that fuses multiple protein properties on the basis of motifs is therefore urgently needed to improve remote homology detection performance. To make full use of the characteristics of motifs, we employed a language model called the protein cubic language model (PCLM), which combines multiple properties by constructing a motif-based neural network. Based on the PCLM, we proposed a predictor called PreHom-PCLM, which extracts and fuses multiple motif features for protein remote homology detection. PreHom-PCLM outperforms other state-of-the-art methods on both the test set and the independent test set. Experimental results further confirm the effectiveness of the multiple features fused by PreHom-PCLM for remote homology detection. Furthermore, the protein features derived from PreHom-PCLM show strong discriminative power for proteins from different structural classes in the high-dimensional space. Availability and Implementation: http://bliulab.net/PreHom-PCLM.
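The abstract does not specify how the motif features are fused or how the fused features are used to score homology. Purely as a rough illustration of the general idea of "fusing multiple features" and then comparing proteins in a shared feature space, the sketch below normalizes several per-protein feature vectors, concatenates them, and ranks database proteins by cosine similarity to a query. The function names (`fuse_features`, `rank_candidates`), the normalize-and-concatenate fusion, the cosine ranking, and the toy vectors are all hypothetical assumptions of this sketch; it is not the PreHom-PCLM architecture, which the authors describe as a motif-based neural network.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(feature_views: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion: normalize each feature view, concatenate, renormalize."""
    fused: List[float] = []
    for view in feature_views:
        fused.extend(l2_normalize(view))
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_candidates(
    query: Sequence[float], database: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Rank database proteins by similarity to the query, most similar first."""
    scores = [(name, cosine(query, vec)) for name, vec in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy example: each protein has two hypothetical feature views of different sizes.
query = fuse_features([[1.0, 0.0], [0.5, 0.5, 0.0]])
database = {
    "protein_A": fuse_features([[0.9, 0.1], [0.4, 0.6, 0.0]]),
    "protein_B": fuse_features([[0.0, 1.0], [0.0, 0.0, 1.0]]),
}
ranking = rank_candidates(query, database)
print(ranking[0][0])  # protein_A
```

Normalizing each view before concatenation keeps one large-magnitude feature type from dominating the similarity score; a learned fusion, as in a neural network, would instead weight the views according to training data.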

Original language: English
Article number: bbad347
Journal: Briefings in Bioinformatics
Volume: 24
Issue number: 6
Publication status: Published, 1 Nov 2023

Keywords

  • protein cubic language model
  • protein remote homology detection
  • protein structural motif
