TY - GEN
T1 - Evaluation of compression methods for genomic sequence
AU - Dai, Lin
AU - Wang, Li
AU - Wang, Jingru
AU - Zhang, Zhang
N1 - Publisher Copyright: © 2015 Taylor & Francis Group, London.
PY - 2015
Y1 - 2015
N2 - With the development of sequencing technology, the volume of genome datasets has increased rapidly. This surge of genome data causes storage problems for public and private databases and makes it difficult to upload or transmit genome data over the Internet. Data compression is an effective way to address these problems. However, various genome compression methods adopting different strategies have been presented in recent years, making it challenging to choose the optimal method for practical use. In this paper, we first review the state of the art in genome compression, then evaluate three excellent algorithms (GReEn, GDC and DELIMINATE) on real data and compare their performance with popular general-purpose compression algorithms, i.e., gzip, bzip2, xz and their parallel versions. Instead of declaring the best method, we give advice on choosing appropriate methods for specific genome datasets.
AB - With the development of sequencing technology, the volume of genome datasets has increased rapidly. This surge of genome data causes storage problems for public and private databases and makes it difficult to upload or transmit genome data over the Internet. Data compression is an effective way to address these problems. However, various genome compression methods adopting different strategies have been presented in recent years, making it challenging to choose the optimal method for practical use. In this paper, we first review the state of the art in genome compression, then evaluate three excellent algorithms (GReEn, GDC and DELIMINATE) on real data and compare their performance with popular general-purpose compression algorithms, i.e., gzip, bzip2, xz and their parallel versions. Instead of declaring the best method, we give advice on choosing appropriate methods for specific genome datasets.
UR - http://www.scopus.com/inward/record.url?scp=84943648342&partnerID=8YFLogxK
U2 - 10.1201/b18508-56
DO - 10.1201/b18508-56
M3 - Conference contribution
AN - SCOPUS:84943648342
SN - 9781138028111
T3 - Computer Science and Applications - Proceedings of the Asia-Pacific Conference on Computer Science and Applications, CSAC 2014
SP - 319
EP - 325
BT - Computer Science and Applications - Proceedings of the Asia-Pacific Conference on Computer Science and Applications, CSAC 2014
A2 - Hu, Ally
PB - CRC Press/Balkema
T2 - Proceedings of the Asia-Pacific Conference on Computer Science and Applications, CSAC 2014
Y2 - 27 December 2014 through 28 December 2014
ER -