Abstract
The Human Genome Variation Database (HGVbase; http://hgvbase.cgb.ki.se) has provided a curated summary of human DNA variation for more than 5 years, thus facilitating research into DNA sequence variation and human phenotypes. The database has undergone many changes and improvements to accommodate increasing volumes and new types of data. The focus of HGVbase has recently shifted towards information on haplotypes and phenotypes, relationships between phenotypes and DNA variation, and collaborative efforts to provide a global resource for genome-phenome data. Open sharing and precise phenotype definitions are necessary to advance the current understanding of common diseases, which are typified by complex aetiologies, small genetic effect sizes and multiple confounding factors that obscure positive study results. As part of this new direction, HGVbase will increasingly collect association data. This report describes the evolving features of HGVbase and covers in detail the technological choices we have made to enable efficient storage and data mining of increasingly large and complex data sets.