TY - GEN
T1 - Finding novel diagnostic gene patterns based on interesting non-redundant contrast sequence rules
AU - Zhao, Yuhai
AU - Wang, Guoren
AU - Li, Yuan
AU - Wang, Zhanghui
PY - 2011
Y1 - 2011
AB - Diagnostic genes are genes closely related to a specific disease phenotype, and they often have high power to distinguish between different classes. Most methods for discovering powerful diagnostic genes are based on either singleton discriminability or combination discriminability. However, both ignore the abundant interactions among genes, which are widespread in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose the EWave model, which profitably exploits the ordered expressions among genes based on the defined equivalent dimension group sequences, taking into account the "noise" that is universal in real data; (2) we devise a novel sequence rule, the interesting non-redundant contrast sequence rule, which captures the difference between phenotypes with high accuracy using as few genes as possible; (3) we present an efficient algorithm called NRMINER to find such rules. Unlike conventional column enumeration and the more recent row enumeration, it performs a novel template-driven enumeration that makes use of the special characteristics of microarray data modeled by EWave. Extensive experiments conducted on various synthetic and real datasets show that (1) NRMINER is significantly faster than the competing algorithm, by up to about one order of magnitude, and (2) it provides higher accuracy using fewer genes. Many diagnostic genes discovered by NRMINER are shown to be biologically related to some diseases.
KW - Data mining
KW - Diagnostic gene
KW - Sequence rule
UR - https://www.scopus.com/pages/publications/84863168918
U2 - 10.1109/ICDM.2011.68
DO - 10.1109/ICDM.2011.68
M3 - Conference contribution
AN - SCOPUS:84863168918
SN - 9780769544083
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 972
EP - 981
BT - Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
T2 - 11th IEEE International Conference on Data Mining, ICDM 2011
Y2 - 11 December 2011 through 14 December 2011
ER -