Finding novel diagnostic gene patterns based on interesting non-redundant contrast sequence rules

  • Yuhai Zhao
  • Guoren Wang*
  • Yuan Li
  • Zhanghui Wang
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Diagnostic genes are genes closely related to a specific disease phenotype, and they often have a high power to distinguish between different classes. Most methods for discovering powerful diagnostic genes are based either on the discriminability of single genes or on the discriminability of gene combinations. However, both approaches ignore the abundant interactions among genes that widely exist in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose an EWave model, which exploits the ordered expressions among genes based on the defined equivalent dimension group sequences, taking into account the "noise" that is universal in real data; (2) we devise a novel sequence rule, namely the interesting non-redundant contrast sequence rule, which captures the difference between phenotypes with high accuracy using as few genes as possible; (3) we present an efficient algorithm called NRMINER to find such rules. Unlike the conventional column enumeration and the more recent row enumeration, NRMINER performs a novel template-driven enumeration that makes use of the special characteristics of microarray data modeled by EWave. Extensive experiments on various synthetic and real datasets show that: (1) NRMINER is significantly faster than the competing algorithm, by up to about one order of magnitude; (2) it provides higher accuracy using fewer genes. Many of the diagnostic genes discovered by NRMINER have been shown to be biologically related to specific diseases.
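The abstract does not spell out how EWave builds equivalent dimension group sequences or how a contrast sequence rule is scored. Purely as a hypothetical illustration, and not the authors' EWave model or NRMINER, the sketch below assumes that (a) genes in one sample are sorted by expression and neighbours whose values differ by at most a noise threshold `delta` are merged into one tied group, (b) a sample supports an ordered gene pattern when each gene sits in a strictly later group than the previous one, and (c) a rule's contrast is the fraction of target-class samples supporting it versus the fraction of other samples supporting it. All function names, the `delta` parameter and the toy data are invented for this example.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: one sample maps a gene name to its expression value.
Sample = Dict[str, float]
# Ordered list of groups; genes inside one group are treated as tied (noise).
GroupSeq = List[FrozenSet[str]]


def to_group_sequence(sample: Sample, delta: float) -> GroupSeq:
    """Sort genes by expression and merge neighbours whose values differ by at
    most `delta` into one group (an assumed noise-tolerance rule)."""
    ordered = sorted(sample.items(), key=lambda kv: kv[1])
    if not ordered:
        return []
    groups: GroupSeq = []
    current = [ordered[0][0]]
    prev_value = ordered[0][1]
    for gene, value in ordered[1:]:
        if value - prev_value <= delta:
            current.append(gene)  # within noise of the previous gene: same group
        else:
            groups.append(frozenset(current))
            current = [gene]  # clear gap: start a new, strictly higher group
        prev_value = value
    groups.append(frozenset(current))
    return groups


def supports(groups: GroupSeq, pattern: Sequence[str]) -> bool:
    """True if the genes in `pattern` sit in strictly increasing group positions."""
    position = {gene: i for i, group in enumerate(groups) for gene in group}
    if any(gene not in position for gene in pattern):
        return False
    return all(position[a] < position[b] for a, b in zip(pattern, pattern[1:]))


def contrast(pattern: Sequence[str], samples: Sequence[Sample],
             labels: Sequence[str], target: str, delta: float) -> Tuple[float, float]:
    """Fraction of `target` samples vs. other samples that support `pattern`."""
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for sample, label in zip(samples, labels):
        in_target = label == target
        totals[in_target] += 1
        if supports(to_group_sequence(sample, delta), pattern):
            hits[in_target] += 1
    # max(..., 1) avoids division by zero when a class has no samples.
    return (hits[True] / max(totals[True], 1), hits[False] / max(totals[False], 1))


if __name__ == "__main__":
    samples = [
        {"g1": 1.0, "g2": 2.0, "g3": 5.0},    # tumour
        {"g1": 0.5, "g2": 3.0, "g3": 3.05},   # tumour
        {"g1": 4.0, "g2": 1.0, "g3": 2.0},    # normal
        {"g1": 2.0, "g2": 2.05, "g3": 0.5},   # normal: g1 and g2 tie within delta
    ]
    labels = ["tumour", "tumour", "normal", "normal"]
    print(contrast(["g1", "g2"], samples, labels, "tumour", delta=0.1))  # (1.0, 0.0)
```

In the toy data the pattern g1 before g2 holds in both tumour samples and in neither normal sample; in the last sample g1 is numerically below g2, but the 0.05 gap falls within `delta`, so the two genes are tied rather than ordered. Gap-based merging chains transitively, so a run of values each within `delta` of its neighbour collapses into one group; a different tie rule would produce different groups. The sketch omits the "interesting" and "non-redundant" criteria and the template-driven enumeration, which the abstract names but does not define.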

Original language: English
Title of host publication: Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
Pages: 972-981
Number of pages: 10
Publication status: Published - 2011
Externally published: Yes
Event: 11th IEEE International Conference on Data Mining, ICDM 2011 - Vancouver, BC, Canada
Duration: 11 Dec 2011 to 14 Dec 2011

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 11th IEEE International Conference on Data Mining, ICDM 2011
Country/Territory: Canada
City: Vancouver, BC
Period: 11/12/11 to 14/12/11

Keywords

  • Data mining
  • Diagnostic gene
  • Sequence rule
