Gap5—editing the billion fragment sequence assembly
Open Access
- 30 May 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (14), 1699-1703
- https://doi.org/10.1093/bioinformatics/btq268
Abstract
Motivation: Existing sequence assembly editors struggle with the volumes of data now readily available from the latest generation of DNA sequencing instruments.
Results: We describe the Gap5 software, together with the data structures and algorithms that make it scalable. We demonstrate this scalability with an assembly of 1.1 billion sequence fragments and compare Gap5's performance with that of several other programs. We also analyse Gap5's memory, CPU and I/O usage and its file sizes.
Availability and Implementation: Gap5 is part of the Staden Package and is available under an Open Source licence from http://staden.sourceforge.net. It is implemented in C and Tcl/Tk and currently runs on Unix systems only.
Contact: jkb@sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.