An estimate of the sequencing error frequency in the DNA sequence databases

Abstract
We have examined vector sequences fortuitously present in the EMBL sequence database as contaminating parts of submitted sequences, and found a sequencing error frequency of 3.55% in this subset of release 27 of the database. We discuss the possibility that this value may be representative of the corresponding error frequency in the database as a whole.
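As a rough illustration of the kind of calculation behind such an estimate, the sketch below counts discrepancies between a known vector sequence and the contaminating segment found in a database entry, then pools the counts over several segments. It is not the authors' procedure. It assumes the two sequences have already been aligned to equal length with `-` marking gaps, and that every differing column (substitution, insertion, or deletion) counts as one error; the abstract does not state how errors were counted or normalised. The function names and the example sequences are hypothetical.

```python
from typing import Iterable, Tuple


def count_errors(reference: str, observed: str) -> Tuple[int, int]:
    """Return (differing columns, total columns) for one pre-aligned pair.

    Assumption: both strings are already aligned, with '-' marking a gap,
    so they must have the same length.
    """
    if len(reference) != len(observed):
        raise ValueError("aligned sequences must have equal length")
    ref = reference.upper()
    obs = observed.upper()
    # A column counts as an error when the two symbols differ,
    # which covers substitutions as well as gaps.
    errors = sum(1 for r, o in zip(ref, obs) if r != o)
    return errors, len(ref)


def error_frequency(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool errors over all aligned pairs and divide by total columns."""
    total_errors = 0
    total_columns = 0
    for reference, observed in pairs:
        errors, columns = count_errors(reference, observed)
        total_errors += errors
        total_columns += columns
    if total_columns == 0:
        raise ValueError("no aligned columns to evaluate")
    return total_errors / total_columns


if __name__ == "__main__":
    # Made-up sequences for illustration only, not data from the study.
    example_pairs = [
        ("GATTACAGATTACA", "GATTACAGAT-ACG"),  # one gap, one substitution
        ("ACGTAC", "ACGTAC"),                  # identical segment
    ]
    print(f"{error_frequency(example_pairs):.2%}")
```

Pooling the counts before dividing weights each segment by its length, so long contaminating segments contribute more to the estimate than short ones; averaging per-segment frequencies instead would be a different design choice.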
