TY - GEN
T1 - Partition frequency distance based filter method for finding approximate repetitions in DNA sequences
AU - Wang, Di
AU - Wang, Guoren
AU - Wu, Qingquan
AU - Chen, Baichen
AU - Yu, Changyong
AU - Zhao, Yi
AU - Yu, Ge
PY - 2006
Y1 - 2006
N2 - Searching for approximate repetitions in a DNA sequence is an important topic in gene analysis. One problem is that, because patterns vary in length, the similarity between patterns cannot be judged accurately using ED (Edit Distance) alone. In this paper we use the function Similar to compute similarity, which considers both the differences and the commonalities between patterns at the same time. Given the computational complexity, we also propose a new distance, PFD (Partition Frequency Distance), and design a new filter based on PFD, with which a candidate set of approximate repetitions can be selected efficiently. We use SUA instead of a sliding window to obtain the fragments of a DNA sequence, so that the patterns of an approximate repetition have no limitation on length. The results show that this technique finds more approximate repetitions than tandem repeat finder does.
AB - Searching for approximate repetitions in a DNA sequence is an important topic in gene analysis. One problem is that, because patterns vary in length, the similarity between patterns cannot be judged accurately using ED (Edit Distance) alone. In this paper we use the function Similar to compute similarity, which considers both the differences and the commonalities between patterns at the same time. Given the computational complexity, we also propose a new distance, PFD (Partition Frequency Distance), and design a new filter based on PFD, with which a candidate set of approximate repetitions can be selected efficiently. We use SUA instead of a sliding window to obtain the fragments of a DNA sequence, so that the patterns of an approximate repetition have no limitation on length. The results show that this technique finds more approximate repetitions than tandem repeat finder does.
UR - https://www.scopus.com/pages/publications/34547412415
U2 - 10.1109/BIBE.2006.253314
DO - 10.1109/BIBE.2006.253314
M3 - Conference contribution
AN - SCOPUS:34547412415
SN - 0769527272
SN - 9780769527277
T3 - Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
SP - 45
EP - 52
BT - Proceedings - Sixth IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
T2 - 6th IEEE Symposium on BioInformatics and BioEngineering, BIBE 2006
Y2 - 16 October 2006 through 18 October 2006
ER -