Abstract
Local alignment is a common technique for finding pairs of highly similar substrings in two given sequences, and it is of great importance in bioinformatics. As data scale grows, state-of-the-art memory-based algorithms become unsuitable for answering local alignment queries over long text data. In this paper, we study the problem of top-k local alignment queries over an external suffix tree, which removes the bottleneck imposed by limited memory. To avoid unnecessary computation, we first employ a series of filtering strategies based on classic memory-based algorithms; properly adapted, these strategies effectively enhance the performance of our solution. We then propose a novel algorithm for answering top-k local alignment queries over an external suffix tree. It employs a heuristic strategy to avoid the shortcomings of the TA algorithm: on the one hand, it provides a powerful threshold for filtering; on the other hand, it efficiently reduces the cost of maintaining candidates. Finally, we study in depth the operating principles of the external suffix tree and the disk and, on that basis, propose several techniques for optimizing external memory access. Experimental results on real genetic data demonstrate the effectiveness of our algorithms.
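For readers unfamiliar with the underlying problem, the following is a minimal in-memory sketch of classic local alignment by dynamic programming (the Smith-Waterman recurrence with a linear gap penalty). It is not the external-suffix-tree algorithm proposed in the paper; the function name `smith_waterman` and the scoring parameters `match`, `mismatch`, and `gap` are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, substring_of_a, substring_of_b) for one best local alignment.

    Illustrative scoring values only; real applications typically use a
    substitution matrix and tuned gap penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it "local".
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

The full score matrix takes memory proportional to the product of the two sequence lengths, which illustrates why purely memory-based approaches become impractical for long texts.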
Original language | English |
---|---|
Pages (from-to) | 2061-2074 |
Number of pages | 14 |
Journal | Jisuanji Xuebao/Chinese Journal of Computers |
Volume | 39 |
Issue number | 10 |
Publication status | Published - 1 Oct 2016 |
Externally published | Yes |
Keywords
- External suffix tree
- Fork area
- Local alignment
- Top-k