TY - JOUR
T1 - Parameterized BLOSUM matrices for protein alignment
AU - Song, Dandan
AU - Chen, Jiaxing
AU - Chen, Guang
AU - Li, Ning
AU - Li, Jin
AU - Fan, Jun
AU - Bu, Dongbo
AU - Li, Shuai Cheng
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/5/1
Y1 - 2015/5/1
N2 - Protein alignment is a basic step in much molecular biology research. The BLOSUM matrices, especially BLOSUM62, are the de facto standard matrices for protein alignment. However, after the matrices had been widely used for 15 years, programming errors were found in the initial version of the source code used to generate them. Surprisingly, after the bugs were corrected, the "intended" BLOSUM62 matrix performs consistently worse than the "miscalculated" one. In this paper, we find linear relationships among the eigenvalues of the matrices and propose an algorithm to find optimal unified eigenvectors. With them, we can parameterize the matrix BLOSUMx for any given value of x, which can vary continuously. We compare the effectiveness of our parameterized isentropic matrix with that of BLOSUM62. Furthermore, we propose an iterative alignment and matrix selection process that adaptively finds the best parameter and globally aligns two sequences. Experiments on aligning 13,667 families from the Pfam database and on clustering MHC II protein sequences show improved accuracy, demonstrating the effectiveness of the proposed method.
AB - Protein alignment is a basic step in much molecular biology research. The BLOSUM matrices, especially BLOSUM62, are the de facto standard matrices for protein alignment. However, after the matrices had been widely used for 15 years, programming errors were found in the initial version of the source code used to generate them. Surprisingly, after the bugs were corrected, the "intended" BLOSUM62 matrix performs consistently worse than the "miscalculated" one. In this paper, we find linear relationships among the eigenvalues of the matrices and propose an algorithm to find optimal unified eigenvectors. With them, we can parameterize the matrix BLOSUMx for any given value of x, which can vary continuously. We compare the effectiveness of our parameterized isentropic matrix with that of BLOSUM62. Furthermore, we propose an iterative alignment and matrix selection process that adaptively finds the best parameter and globally aligns two sequences. Experiments on aligning 13,667 families from the Pfam database and on clustering MHC II protein sequences show improved accuracy, demonstrating the effectiveness of the proposed method.
KW - Parameterized BLOSUM matrices
KW - Protein alignment
KW - Substitution matrix
UR - http://www.scopus.com/inward/record.url?scp=84940370058&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2014.2366126
DO - 10.1109/TCBB.2014.2366126
M3 - Article
C2 - 26357279
AN - SCOPUS:84940370058
SN - 1545-5963
VL - 12
SP - 686
EP - 694
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 3
M1 - 2366126
ER -