TY - GEN
T1 - CIS
T2 - 11th Joint International Computer Conference, JICC 2005
AU - Zhao, Yuhai
AU - Yin, Ying
AU - Wang, Guoren
AU - Mao, Keming
PY - 2005
Y1 - 2005
N2 - The rapid development of microarray technology poses a great challenge to conventional clustering methods. The sparsity of the data (samples), the high dimensionality of the feature (gene) space, and the many irrelevant or redundant features all make it difficult to find correct clusters in gene expression data by applying conventional clustering methods directly. In this paper, we present CIS, an algorithm for clustering biological samples using gene expression microarray data. Unlike other approaches, CIS iterates between two processes: reclustering genes (rather than filtering them) and clustering samples. Following a strategy of progressive refinement, CIS repeatedly partitions the initial set of genes using the newly generated sample clusters as features, and then repartitions the samples using the newly generated gene clusters as features, in order to identify the significant sample clusters and the relevant genes. The method is applied to two gene microarray data sets, on colon cancer and leukemia. The experimental results show that CIS works well on both data sets. We partition the two sample sets using eight and twenty-nine genes, respectively, and achieve an accuracy of about 90% on both. These results indicate that CIS may be a promising approach for gene expression data analysis when domain knowledge is absent.
AB - The rapid development of microarray technology poses a great challenge to conventional clustering methods. The sparsity of the data (samples), the high dimensionality of the feature (gene) space, and the many irrelevant or redundant features all make it difficult to find correct clusters in gene expression data by applying conventional clustering methods directly. In this paper, we present CIS, an algorithm for clustering biological samples using gene expression microarray data. Unlike other approaches, CIS iterates between two processes: reclustering genes (rather than filtering them) and clustering samples. Following a strategy of progressive refinement, CIS repeatedly partitions the initial set of genes using the newly generated sample clusters as features, and then repartitions the samples using the newly generated gene clusters as features, in order to identify the significant sample clusters and the relevant genes. The method is applied to two gene microarray data sets, on colon cancer and leukemia. The experimental results show that CIS works well on both data sets. We partition the two sample sets using eight and twenty-nine genes, respectively, and achieve an accuracy of about 90% on both. These results indicate that CIS may be a promising approach for gene expression data analysis when domain knowledge is absent.
KW - clustering
KW - gene expression data
KW - microarray
KW - nonparametric clustering
UR - http://www.scopus.com/inward/record.url?scp=84903581513&partnerID=8YFLogxK
U2 - 10.1142/9789812701534_0147
DO - 10.1142/9789812701534_0147
M3 - Conference contribution
AN - SCOPUS:84903581513
SN - 9812565329
SN - 9789812565327
T3 - Proceedings of the 11th Joint International Computer Conference, JICC 2005
SP - 651
EP - 656
BT - Proceedings of the 11th Joint International Computer Conference, JICC 2005
PB - World Scientific Publishing Co. Pte Ltd
Y2 - 10 November 2005 through 12 November 2005
ER -