Haplotype-resolved diverse human genomes and integrated analysis of structural variation

Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation even across complex loci. We identify 107,590 structural variants (SVs), of which 68% are not discovered by short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterize 130 of the most active mobile element source elements and find that 63% of all SVs arise by homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Funding Information
  • National Institutes of Health (U24HG007497)
  • National Institutes of Health (R01HG002898)
  • National Institutes of Health (R01HD081256)
  • National Institutes of Health (1U01HG010973)
  • National Institutes of Health (R01HG002385)
  • National Institutes of Health (R15HG009565)
  • National Institutes of Health (1R35GM138212)
  • National Institutes of Health (1OT3HL147154)
  • National Institutes of Health (1R01HG007068-01A1)
  • National Institutes of Health (R01MH115957)
  • National Institutes of Health (R01GM130738)
  • National Human Genome Research Institute (3UM1HG008901-03S1)
  • National Human Genome Research Institute (3UM1HG008901-04S2)
  • National Science Foundation of China (32070663)
  • National Human Genome Research Institute (K99HG011041)
  • Wellcome (WT104947/Z/14/Z)
  • National Human Genome Research Institute (UM1HG008901)
  • Bundesministerium für Bildung und Forschung (031L0184)
  • Deutsche Forschungsgemeinschaft (391137747)
  • Deutsche Forschungsgemeinschaft (395192176)
  • European Research Council (773026)
  • Bundesministerium für Bildung und Forschung (031L0181A)
  • European Research Council (716290)