NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Open Access

17 December 2004

journal article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 33 (Database issue), D501-D504
https://doi.org/10.1093/nar/gki025

Abstract

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

This publication has 14 references indexed in Scilit.

Cited by 1458 articles