一种面向大规模序列数据的交互特征并行挖掘算法

Translated title of the contribution: A Parallel Algorithm for Mining Interactive Features from Large Scale Sequences

Research output: Contribution to journalArticlepeer-review

Abstract

Sequence is an important type of data which is widely existing in various domains, and thus feature selection from sequence data is of practical significance in extensive applications. Interactive features refer to a set of features, each of which is weakly correlated with the target, but the whole of which is strongly correlated with the target. It is of great challenge to mine interactive features from large scale sequence data for the combinatorial explosion problem of loci. To address the problem, against the background of high-throughput sequencing in biology, a parallel evolutionary algorithm for high-order interactive features mining is proposed in this paper. Instead of sequence-block based parallel strategy, the work is inspired by loci-based idea since the number of loci is the fundamental factor that restricts the efficiency. Further, we propose the conception of maximal allelic common subsequence (MACS) and MACS based strategy for feature region partition. According to the strategy, the search range of interactive features is narrowed to many fragged spaces and interactions are guaranteed not to exist among different fragments. Finally, a parallel ant algorithm based on substitution search is developed to conduct interactive feature selection. Extensive experiments on real and synthetic datasets show that the efficiency and effectiveness of the proposed PACOIFS algorithm is superior to that of competitive algorithms.

Translated title of the contributionA Parallel Algorithm for Mining Interactive Features from Large Scale Sequences
Original languageChinese (Traditional)
Pages (from-to)992-1006
Number of pages15
JournalJisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume56
Issue number5
DOIs
Publication statusPublished - 1 May 2019
Externally publishedYes

Fingerprint

Dive into the research topics of 'A Parallel Algorithm for Mining Interactive Features from Large Scale Sequences'. Together they form a unique fingerprint.

Cite this