The Genomedata format for storing large-scale functional genomics data
Open Access
- 29 April 2010
- journal article
- research article
- Published by Oxford University Press (OUP) in Bioinformatics
- Vol. 26 (11), 1458-1459
- https://doi.org/10.1093/bioinformatics/btq164
Abstract
Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.
Availability and Implementation: Reference implementation in Python and C components available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License.
Contact: william-noble@uw.edu
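The contrast the abstract draws between random access and a naive wiggle-file approach can be illustrated with a small sketch. This is not the Genomedata API or its on-disk layout: the function names (`iter_wiggle`, `naive_lookup`, `build_dense`), the toy data, and the chromosome sizes are all hypothetical, and the dense in-memory array only stands in for the general idea of an indexed store. It assumes fixedStep wiggle input with 1-based start coordinates.

```python
import math
from array import array

# Toy fixedStep wiggle text (hypothetical data, for illustration only).
WIGGLE_TEXT = """\
fixedStep chrom=chr1 start=1 step=1
0.5
1.5
2.5
fixedStep chrom=chr2 start=3 step=1
7.0
8.0
"""

def iter_wiggle(text):
    """Yield (chrom, zero_based_position, value) from fixedStep wiggle text."""
    chrom, pos, step = None, 0, 1
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("fixedStep"):
            # Parse the key=value fields of the declaration line.
            fields = dict(item.split("=") for item in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"]) - 1  # wiggle starts are 1-based
            step = int(fields.get("step", 1))
        else:
            yield chrom, pos, float(line)
            pos += step

def naive_lookup(text, chrom, position):
    """Naive approach: rescan the wiggle text on every query."""
    for c, p, v in iter_wiggle(text):
        if c == chrom and p == position:
            return v
    return math.nan

def build_dense(text, chrom_sizes):
    """Load once into one dense array per chromosome; NaN marks missing data."""
    tracks = {c: array("d", [math.nan]) * n for c, n in chrom_sizes.items()}
    for c, p, v in iter_wiggle(text):
        tracks[c][p] = v
    return tracks

# Assumed chromosome sizes for the toy example.
tracks = build_dense(WIGGLE_TEXT, {"chr1": 5, "chr2": 5})
```

After the one-time load, `tracks["chr2"][3]` is a constant-time index operation, whereas `naive_lookup(WIGGLE_TEXT, "chr2", 3)` has to parse the text from the beginning on every call. The sketch shows only why an indexed layout helps; it says nothing about how large the speedup is in practice.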