TY - GEN
T1 - Clustering orthologs based on sequence and domain similarities
AU - Zhang, Fa
AU - Feng, Sheng Zhong
AU - Ozer, Hatice
AU - Yuan, Bo
PY - 2005
Y1 - 2005
N2 - In this paper, we present a fully automatic computational method to cluster orthologs and in-paralogs from multiple species. We use the program Blastp to generate a pairwise distance matrix, which is then normalized for each homologous group between and within the species included. We also use protein domains and their organization in protein sequences as an additional criterion for filtering false relationships. Ortholog clusters are first seeded with multiple reciprocal best pairwise matches, after which the Markov graph-flow algorithm is applied to include in-paralogs. Classification parameters such as the inflation index are optimized according to the functional consistency within each cluster, which is inferred by comparing the ontological annotations available for the sequences belonging to the same cluster. We apply our program to six completely sequenced eukaryotic genomes and assign confidence values for both orthologs and in-paralogs. Comparing our results with similar efforts at NCBI and TIGR, we note significant improvement in the clustering of orthologs with recent paralogs. This provides an automatic and robust method to cluster orthologous genes of multiple genomes.
AB - In this paper, we present a fully automatic computational method to cluster orthologs and in-paralogs from multiple species. We use the program Blastp to generate a pairwise distance matrix, which is then normalized for each homologous group between and within the species included. We also use protein domains and their organization in protein sequences as an additional criterion for filtering false relationships. Ortholog clusters are first seeded with multiple reciprocal best pairwise matches, after which the Markov graph-flow algorithm is applied to include in-paralogs. Classification parameters such as the inflation index are optimized according to the functional consistency within each cluster, which is inferred by comparing the ontological annotations available for the sequences belonging to the same cluster. We apply our program to six completely sequenced eukaryotic genomes and assign confidence values for both orthologs and in-paralogs. Comparing our results with similar efforts at NCBI and TIGR, we note significant improvement in the clustering of orthologs with recent paralogs. This provides an automatic and robust method to cluster orthologous genes of multiple genomes.
UR - http://www.scopus.com/inward/record.url?scp=33847114503&partnerID=8YFLogxK
U2 - 10.1109/HPCASIA.2005.27
DO - 10.1109/HPCASIA.2005.27
M3 - Conference contribution
AN - SCOPUS:33847114503
SN - 0769524869
SN - 9780769524863
T3 - Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
SP - 645
EP - 651
BT - Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
T2 - 8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
Y2 - 30 November 2005 through 3 December 2005
ER -