TY - JOUR
T1 - Efficiently mining local conserved clusters from gene expression data
AU - Wang, Guoren
AU - Zhao, Yuhai
AU - Zhao, Xiangguo
AU - Wang, Botao
AU - Qiao, Baiyou
PY - 2010/3
Y1 - 2010/3
N2 - Extensive studies have shown that mining gene expression data is important for both bioinformatics research and biomedical applications. However, most existing studies focus only on either co-regulated gene clusters or emerging patterns. In fact, another analysis scheme, namely the simultaneous mining of phenotypes and diagnostic genes, is also biologically significant but has received relatively little attention so far. In this paper, we explore a novel concept, the local conserved gene cluster (LC-Cluster), to address this problem. Specifically, an LC-Cluster contains a subset of genes and a subset of conditions such that the genes show steady expression values (instead of the coherent pattern of synchronous rising and falling defined by some previous work) only on that subset of conditions, not across all given conditions. To avoid the exponential growth in subspace search, we further present two efficient algorithms, namely FALCONER and E-FALCONER, which mine the complete set of maximal LC-Clusters from gene expression data sets based on an enumeration tree. Extensive experiments conducted on both real gene expression data sets and synthetic data sets show that (1) our approaches are efficient and effective, (2) our approaches outperform the existing enumeration-tree-based algorithms, and (3) our approaches can discover a number of LC-Clusters that are potentially of high biological significance.
AB - Extensive studies have shown that mining gene expression data is important for both bioinformatics research and biomedical applications. However, most existing studies focus only on either co-regulated gene clusters or emerging patterns. In fact, another analysis scheme, namely the simultaneous mining of phenotypes and diagnostic genes, is also biologically significant but has received relatively little attention so far. In this paper, we explore a novel concept, the local conserved gene cluster (LC-Cluster), to address this problem. Specifically, an LC-Cluster contains a subset of genes and a subset of conditions such that the genes show steady expression values (instead of the coherent pattern of synchronous rising and falling defined by some previous work) only on that subset of conditions, not across all given conditions. To avoid the exponential growth in subspace search, we further present two efficient algorithms, namely FALCONER and E-FALCONER, which mine the complete set of maximal LC-Clusters from gene expression data sets based on an enumeration tree. Extensive experiments conducted on both real gene expression data sets and synthetic data sets show that (1) our approaches are efficient and effective, (2) our approaches outperform the existing enumeration-tree-based algorithms, and (3) our approaches can discover a number of LC-Clusters that are potentially of high biological significance.
KW - Bioinformatics
KW - Clustering
KW - Gene expression data
UR - http://www.scopus.com/inward/record.url?scp=77649236301&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2009.11.009
DO - 10.1016/j.neucom.2009.11.009
M3 - Article
AN - SCOPUS:77649236301
SN - 0925-2312
VL - 73
SP - 1425
EP - 1437
JO - Neurocomputing
JF - Neurocomputing
IS - 7-9
ER -