Partition frequency distance based filter method for finding approximate repetitions in DNA sequences

Di Wang*, Guoren Wang, Qingquan Wu, Baichen Chen, Changyong Yu, Yi Zhao, Ge Yu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Citation (Scopus)

Abstract

Searching for approximate repetitions in a DNA sequence has been an important topic in gene analysis. One problem is that, because patterns vary in length, the similarity between patterns cannot be judged accurately using edit distance (ED) alone. In this paper we use the function Similar to compute similarity, which considers both the differences and the commonalities between patterns at the same time. Given the computational complexity of this function, we also propose a new distance, PFD (Partition Frequency Distance), and design a new filter based on PFD that selects the candidate set of approximate repetitions efficiently. We use SUA instead of a sliding window to obtain the fragments of a DNA sequence, so that the patterns of an approximate repetition are not limited in length. The results show that this technique finds more approximate repetitions than tandem repeat finder does.
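
The abstract does not define PFD or the function Similar, so the sketch below does not reproduce them. It only illustrates the general filter-then-verify idea with the classical frequency distance, which is a lower bound on edit distance: a pair of fragments whose frequency distance already exceeds the threshold can be discarded without running the quadratic edit-distance computation. The function names and the threshold parameter k are assumptions made for this illustration.

```python
from collections import Counter


def frequency_distance(u, v):
    """Classical frequency distance (not the paper's PFD).

    Compares per-symbol counts; the result never exceeds the edit
    distance, because one edit changes each of the two sums by at most 1.
    """
    fu, fv = Counter(u), Counter(v)
    alphabet = set(fu) | set(fv)
    surplus_u = sum(max(0, fu[c] - fv[c]) for c in alphabet)
    surplus_v = sum(max(0, fv[c] - fu[c]) for c in alphabet)
    return max(surplus_u, surplus_v)


def edit_distance(u, v):
    """Standard dynamic-programming edit distance, O(len(u) * len(v))."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, 1):
        cur = [i]
        for j, b in enumerate(v, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def within_distance(u, v, k):
    """Hypothetical helper: cheap filter first, exact check second."""
    if frequency_distance(u, v) > k:
        return False  # pruned by the lower bound, no edit distance needed
    return edit_distance(u, v) <= k
```

A pair such as "AAAA" and "TTTT" is rejected by the filter alone for any k below 4. A pair such as "ACGT" and "TGCA" has frequency distance 0, so it passes the filter and is rejected only by the exact check, which shows why a filter yields a candidate set and not a final answer.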

Original language: English
Title of host publication: Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
Pages: 45-52
Number of pages: 8
DOIs
Publication status: Published - 2006
Externally published: Yes
Event: 6th IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006 - Arlington, VA, United States
Duration: 16 Oct 2006 to 18 Oct 2006

Publication series

Name: Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006

Conference

Conference: 6th IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
Country/Territory: United States
City: Arlington, VA
Period: 16/10/06 to 18/10/06
