Clustering of highly homologous sequences to reduce the size of large protein databases

1 March 2001

journal article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 17 (3), 282-283
https://doi.org/10.1093/bioinformatics/17.3.282

Abstract

We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560,000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
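
The abstract does not describe the clustering procedure itself. As a rough illustration of how a database can be reduced to representative sequences at a chosen identity level, the sketch below assumes a greedy incremental scheme: sequences are visited longest first, and each one either joins the first existing representative it matches at or above the threshold or becomes a new representative. The names `identity` and `greedy_cluster` and the 0.9 threshold are hypothetical, and the `difflib`-based score is only a crude stand-in for an alignment-based sequence identity. It is not the program's actual implementation and would be far too slow for a database of the size mentioned above.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity (assumption, not a real alignment):
    characters in common matching blocks divided by the shorter length."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(sequences: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Return {representative_id: [member_ids]} for the given identity threshold."""
    clusters: dict[str, list[str]] = {}
    # Visit the longest sequences first, so each representative is the
    # longest member of its cluster.
    ordered = sorted(sequences.items(), key=lambda item: len(item[1]), reverse=True)
    for seq_id, seq in ordered:
        for rep_id in clusters:
            if identity(sequences[rep_id], seq) >= threshold:
                clusters[rep_id].append(seq_id)
                break
        else:
            # No representative is similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


# Toy input (made-up sequences, for illustration only).
example = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFSR",
    "c": "GGGGGPPPPPLLLLLWWWWW",
}
clusters = greedy_cluster(example, threshold=0.9)
representatives = {rep_id: example[rep_id] for rep_id in clusters}
```

Here `representatives` plays the role of the reduced output database: only one sequence per cluster is kept for later searches.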

Keywords

DATABASE SEARCH

Cited by 892 articles