TY - JOUR
T1 - Parallel divide and conquer bio-sequence comparison based on Smith-Waterman algorithm
AU - Zhang, Fa
AU - Qiao, Xiangzhen
AU - Liu, Zhiyong
PY - 2004/4
Y1 - 2004/4
N2 - Tools for pair-wise bio-sequence alignment have for long played a central role in computation biology. Several algorithms for bio-sequence alignment have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However the existing parallel Smith-Waterman algorithm needs large memory space, and this disadvantage limits the size of a sequence to be handled. As the data of biological sequences expand rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. For solving this problem, we develop a new parallel bio-sequence alignment algorithm, using the strategy of divide and conquer, named PSW-DC algorithm. In our algorithm, first, we partition the query sequence into several subsequences and distribute them to every processor respectively, then compare each subsequence with the whole subject sequence in parallel, using the Smith-Waterman algorithm, and get an interim result, finally obtain the optimal alignment between the query sequence and subject sequence, through the special combination and extension method. Memory space required in our algorithm is reduced significantly in comparison with existing ones. We also develop a key technique of combination and extension, named the C&E method, to manipulate the interim results and obtain the final sequences alignment. We implement the new parallel bio-sequences alignment algorithm, the PSW-DC, in a cluster parallel system. Copyright by Science in China Press 2004.
AB - Tools for pair-wise bio-sequence alignment have for long played a central role in computation biology. Several algorithms for bio-sequence alignment have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However the existing parallel Smith-Waterman algorithm needs large memory space, and this disadvantage limits the size of a sequence to be handled. As the data of biological sequences expand rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. For solving this problem, we develop a new parallel bio-sequence alignment algorithm, using the strategy of divide and conquer, named PSW-DC algorithm. In our algorithm, first, we partition the query sequence into several subsequences and distribute them to every processor respectively, then compare each subsequence with the whole subject sequence in parallel, using the Smith-Waterman algorithm, and get an interim result, finally obtain the optimal alignment between the query sequence and subject sequence, through the special combination and extension method. Memory space required in our algorithm is reduced significantly in comparison with existing ones. We also develop a key technique of combination and extension, named the C&E method, to manipulate the interim results and obtain the final sequences alignment. We implement the new parallel bio-sequences alignment algorithm, the PSW-DC, in a cluster parallel system. Copyright by Science in China Press 2004.
KW - Biological sequence alignment
KW - Divide and conquer
KW - Dynamic programming
KW - Parallel
UR - http://www.scopus.com/inward/record.url?scp=24944456320&partnerID=8YFLogxK
U2 - 10.1360/02yf0362
DO - 10.1360/02yf0362
M3 - Article
AN - SCOPUS:24944456320
SN - 1009-2757
VL - 47
SP - 221
EP - 231
JO - Science in China, Series F: Information Sciences
JF - Science in China, Series F: Information Sciences
IS - 2
ER -