FLAS: Fast and high-throughput algorithm for PacBio long-read self-correction

  • Ergude Bao*
  • Fei Xie
  • Changjin Song
  • Dandan Song

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: Third-generation PacBio long reads have greatly facilitated sequencing projects thanks to their very large read lengths, but they contain about 15% sequencing errors and need error correction. For projects with long reads only, it is challenging both to correct the reads quickly and to correct a sufficient number of read bases, i.e. to achieve high-throughput self-correction. MECAT is currently among the fastest self-correction algorithms, but its throughput is relatively small (Xiao et al., 2017).

Results: Here, we introduce FLAS, a wrapper algorithm of MECAT, to achieve high-throughput long-read self-correction while keeping MECAT's fast speed. FLAS finds additional alignments among the long reads prealigned by MECAT to improve the correction throughput, and removes misalignments to improve accuracy. FLAS also uses the corrected long-read regions to correct the uncorrected ones, further improving the throughput. In our performance tests on Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana and human long reads, FLAS achieves 22.0-50.6% larger throughput than MECAT. FLAS is 2-13× faster than the self-correction algorithms other than MECAT, and its throughput is 9.8-281.8% larger. Long reads corrected by FLAS can be assembled into contigs with 13.1-29.8% larger N50 sizes than those corrected by MECAT.

Availability and implementation: The FLAS software can be downloaded for free from https://github.com/baoe/flas.

Supplementary information: Supplementary data are available at Bioinformatics online.
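
The two quantities the abstract compares, correction throughput and contig N50, can be illustrated with a minimal sketch. This is not code from FLAS or MECAT. It assumes that throughput means the total number of corrected read bases (our reading of "a sufficient amount of read bases"), and it uses the standard definition of N50. The function names and all numbers are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def throughput(corrected_read_lengths):
    """Total corrected bases (assumed meaning of 'throughput' here)."""
    return sum(corrected_read_lengths)


def relative_gain(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Made-up toy data, not results from the paper.
    baseline_reads = [4000, 6000, 10000]
    improved_reads = [5000, 7000, 12000]
    gain = relative_gain(throughput(improved_reads), throughput(baseline_reads))
    print(f"throughput gain: {gain:.1f}%")

    baseline_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", n50(baseline_contigs))
```

A percentage such as "22.0-50.6% larger throughput" would correspond to `relative_gain` evaluated per dataset, under the assumed definition above.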

Original language: English
Pages (from-to): 3953-3960
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 20
Publication status: Published - 15 Oct 2019
