Clustering orthologs based on sequence and domain similarities

doi:10.1109/HPCASIA.2005.27

Fa Zhang^*, Sheng Zhong Feng, Hatice Ozer, Bo Yuan

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

In this paper, we present a fully automatic computational method for clustering orthologs and in-paralogs from multiple species. We use the program Blastp to generate a pairwise distance matrix, which is then normalized for each homologous group between and within the included species. We also use protein domains and their organization in protein sequences as an additional criterion for filtering out false relationships. Ortholog clusters are first seeded with multiple reciprocal best pairwise matches, after which the Markov graph-flow algorithm is applied to include in-paralogs. Classification parameters such as the inflation index are optimized according to the functional consistency within each cluster, which is inferred by comparing the ontological annotations available for the sequences belonging to the same cluster. We apply our program to six completely sequenced eukaryotic genomes and assign confidence values to both orthologs and in-paralogs. Compared with similar efforts at NCBI and TIGR, we note a significant improvement in the clustering of orthologs with recent paralogs. This provides an automatic and robust method for clustering orthologous genes across multiple genomes.
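The two core steps named in the abstract, seeding with reciprocal best matches and Markov graph-flow clustering controlled by an inflation parameter, can be illustrated with a small sketch. This is not the authors' implementation. The `species|gene` identifier format, the score dictionary, the function names, the self-loop weight, the iteration count, and the `eps` threshold are all assumptions made for illustration. The domain-based filter, the distance normalization, and the annotation-based tuning of the inflation value are omitted.

```python
def species_of(seq_id):
    """Return the species prefix of a hypothetical 'species|gene' identifier."""
    return seq_id.split("|", 1)[0]


def reciprocal_best_hits(scores):
    """Find cross-species pairs that are each other's best-scoring match.

    scores maps (query, subject) -> similarity score (higher is better).
    """
    best = {}  # (query, subject species) -> (best subject, its score)
    for (query, subject), score in scores.items():
        if species_of(query) == species_of(subject):
            continue  # within-species hits cannot seed an ortholog pair
        key = (query, species_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, species_of(query)))
        if back is not None and back[0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


def _normalize_columns(m):
    """Scale every column of a square matrix in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total


def markov_cluster(nodes, weights, inflation=2.0, iterations=50, eps=1e-6):
    """Basic Markov clustering: alternate expansion and inflation.

    weights maps (node_a, node_b) -> edge weight and is treated as symmetric.
    Returns a set of frozensets, one per cluster.
    """
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops (an assumed choice of weight)
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w
    _normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix, which spreads flow along longer paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, which favours strong flows.
        m = [[x ** inflation for x in row] for row in m]
        _normalize_columns(m)
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > eps)
        if members:  # only attractor rows keep non-zero entries
            clusters.add(members)
    return clusters


if __name__ == "__main__":
    # Made-up scores for two species, "hs" and "mm"; not data from the paper.
    demo_scores = {
        ("hs|A", "mm|a"): 100.0, ("mm|a", "hs|A"): 95.0,
        ("hs|A2", "mm|a"): 80.0, ("mm|a", "hs|A2"): 70.0,
    }
    print(reciprocal_best_hits(demo_scores))
    demo_edges = {("hs|A", "mm|a"): 1.0, ("hs|A", "hs|A2"): 0.9,
                  ("hs|A2", "mm|a"): 0.8, ("hs|B", "mm|b"): 1.0}
    print(markov_cluster(["hs|A", "hs|A2", "mm|a", "hs|B", "mm|b"], demo_edges))
```

In this sketch a larger `inflation` value tends to split the graph into more, smaller clusters, which is why such a parameter needs tuning against an external criterion such as the functional consistency mentioned in the abstract.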

Original language	English
Title of host publication	Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
Pages	645-651
Number of pages	7
DOIs	https://doi.org/10.1109/HPCASIA.2005.27
Publication status	Published - 2005
Externally published	Yes
Event	8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005 - Beijing, China
Duration	30 Nov 2005 → 3 Dec 2005

Publication series

Name	Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
Volume	2005

Conference

Conference	8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
Country/Territory	China
City	Beijing
Period	30/11/05 → 3/12/05

Access to Document

10.1109/HPCASIA.2005.27

Cite this

Zhang, F., Feng, S. Z., Ozer, H., & Yuan, B. (2005). Clustering orthologs based on sequence and domain similarities. In Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005 (pp. 645-651). Article 1592336 (Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005; Vol. 2005). https://doi.org/10.1109/HPCASIA.2005.27

@inproceedings{fb75294eec114a32ac8787326c5ef6fe,
  title = "Clustering orthologs based on sequence and domain similarities",
  author = "Fa Zhang and Feng, {Sheng Zhong} and Hatice Ozer and Bo Yuan",
  year = "2005",
  doi = "10.1109/HPCASIA.2005.27",
  language = "English",
  isbn = "0769524869",
  series = "Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005",
  pages = "645--651",
  booktitle = "Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005",
  note = "8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005 ; Conference date: 30-11-2005 Through 03-12-2005",
}

Zhang, F, Feng, SZ, Ozer, H & Yuan, B 2005, Clustering orthologs based on sequence and domain similarities. in Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005., 1592336, Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005, vol. 2005, pp. 645-651, 8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005, Beijing, China, 30/11/05. https://doi.org/10.1109/HPCASIA.2005.27

TY  - GEN
T1  - Clustering orthologs based on sequence and domain similarities
AU  - Zhang, Fa
AU  - Feng, Sheng Zhong
AU  - Ozer, Hatice
AU  - Yuan, Bo
PY  - 2005
Y1  - 2005
UR  - http://www.scopus.com/inward/record.url?scp=33847114503&partnerID=8YFLogxK
U2  - 10.1109/HPCASIA.2005.27
DO  - 10.1109/HPCASIA.2005.27
M3  - Conference contribution
AN  - SCOPUS:33847114503
SN  - 0769524869
SN  - 9780769524863
T3  - Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
SP  - 645
EP  - 651
BT  - Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
T2  - 8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005
Y2  - 30 November 2005 through 3 December 2005
ER  -

Zhang F, Feng SZ, Ozer H, Yuan B. Clustering orthologs based on sequence and domain similarities. In Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005. 2005. p. 645-651. 1592336. (Proceedings - Eighth International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005). doi: 10.1109/HPCASIA.2005.27
