The Genomedata format for storing large-scale functional genomics data

Open Access

29 April 2010

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (11), 1458-1459
https://doi.org/10.1093/bioinformatics/btq164

Abstract

Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data while retaining a small disk-space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.

Availability and Implementation: A reference implementation, with Python and C components, is available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License.

Contact: william-noble@uw.edu
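The summary contrasts random access into a purpose-built format with a naive scan of wiggle files. The following minimal sketch makes that contrast concrete. It is not the Genomedata implementation or its API. The names `DenseTrackStore`, `iter_fixedstep` and `naive_lookup` are hypothetical. The sketch assumes a single chromosome, `fixedStep` wiggle input with span 1, and data small enough to hold in memory, whereas the real format is designed for on-disk storage of hundreds of gigabytes. The idea it illustrates is that once values sit in a dense position-by-track array, a query becomes direct offset arithmetic instead of a rescan of the text.

```python
import math
from array import array


class DenseTrackStore:
    """Hypothetical in-memory stand-in: one float32 cell per (position, track).

    Cells are laid out row-major, so the value for (pos, track) lives at
    pos * num_tracks + track. NaN marks positions with no data.
    """

    def __init__(self, chrom_length, track_names):
        self.track_names = list(track_names)
        self.num_tracks = len(self.track_names)
        self.chrom_length = chrom_length
        self.values = array("f", [math.nan]) * (chrom_length * self.num_tracks)

    def _offset(self, pos, track_index):
        return pos * self.num_tracks + track_index

    def set(self, pos, track_name, value):
        self.values[self._offset(pos, self.track_names.index(track_name))] = value

    def get(self, start, end, track_name):
        """Return values for the 0-based half-open range [start, end) by direct indexing."""
        t = self.track_names.index(track_name)
        return [self.values[self._offset(p, t)] for p in range(start, end)]


def iter_fixedstep(wig_lines):
    """Yield (0-based position, value) pairs from fixedStep wiggle lines (span=1 assumed)."""
    pos, step = None, 1
    for line in wig_lines:
        line = line.strip()
        if line.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            pos = int(fields["start"]) - 1  # wiggle start coordinates are 1-based
            step = int(fields.get("step", "1"))
        elif line and pos is not None:
            yield pos, float(line)
            pos += step


def load_fixedstep(store, track_name, wig_lines):
    """One-time load: parse the wiggle text once and write each value into the store."""
    for pos, value in iter_fixedstep(wig_lines):
        store.set(pos, track_name, value)


def naive_lookup(wig_lines, start, end):
    """Naive approach: rescan the entire wiggle text for every query."""
    hits = {pos: v for pos, v in iter_fixedstep(wig_lines) if start <= pos < end}
    return [hits.get(p, math.nan) for p in range(start, end)]
```

Usage: load a small wiggle track once with `load_fixedstep(store, "track", wig_lines)`, then call `store.get(start, end, "track")` repeatedly. Each `naive_lookup` call reparses the whole file, so its cost grows with file size rather than with the width of the query. This is the behaviour a precomputed, randomly accessible layout is meant to avoid.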
