Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads
Open Access
- 8 June 2017
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 13 (6), e1005595
- https://doi.org/10.1371/journal.pcbi.1005595
Abstract
The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate “hybrid” assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
Funding Information
- National Health and Medical Research Council (1043822)
- National Health and Medical Research Council (1061409)
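Illustrative note on semi-global alignment
The abstract refers to a semi-global aligner for placing long reads on the assembly graph. As a rough illustration of the general idea only, the sketch below scores a textbook semi-global alignment between two plain strings, where gaps at the start and end of either sequence are free. It is not Unicycler's aligner, which works against a graph and is not described in detail here. The function name `semi_global_score` and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def semi_global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best alignment score when leading and trailing gaps are free.

    Textbook dynamic programming over two strings; the scoring
    parameters are illustrative assumptions, not Unicycler's values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # The first row and first column stay at 0, so an alignment may
    # start anywhere in either sequence without penalty.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trailing gaps are free too: the alignment may end anywhere in
    # the last row (a exhausted) or the last column (b exhausted).
    best_last_row = max(score[-1])
    best_last_col = max(row[-1] for row in score)
    return max(best_last_row, best_last_col)


# A short "read" contained inside a longer "contig".
contained = semi_global_score("ACGT", "TTACGTTT")
# The end of one sequence overlapping the start of another.
overlap = semi_global_score("AAACGT", "CGTTTT")
```

With these assumed scores, the contained case scores 4 (four matches, no end penalties) and the overlap case scores 3. A global alignment would penalise the unaligned flanks in both cases, which is why free end gaps suit reads that only partly cover a contig.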