Selection of Target Sites for Mobile DNA Integration in the Human Genome

Abstract
DNA sequences from retroviruses, retrotransposons, DNA transposons, and parvoviruses can all become integrated into the human genome. Accumulation of such sequences accounts for at least 40% of our genome today. These integrating elements are also of interest as gene-delivery vectors for human gene therapy. Here we present a comprehensive bioinformatic analysis of integration targeting by HIV, MLV, ASLV, SFV, L1, SB, and AAV. We used a mathematical method which allowed annotation of each base pair in the human genome for its likelihood of hosting an integration event by each type of element, taking advantage of more than 200 types of genomic annotation. This bioinformatic resource documents a wealth of new associations between genomic features and integration targeting. The study also revealed that the length of genomic intervals analyzed strongly affected the conclusions drawn. Thus, answering the question “What genomic features affect integration?” requires carefully specifying the length scale of interest.

Many types of genomic parasites insert their DNA sequences into the human genome. Among these are retroviruses such as HIV, transposons, and adeno-associated virus. These integrating elements are important for the changes they cause in the human genome upon insertion of new DNA, and as gene-delivery vehicles for use in gene therapy. Previous studies have generated sequences of genomic targets of integration by these elements. Here Berry, Hannenhalli, Leipzig, and Bushman present a comprehensive bioinformatic analysis that allows them to annotate each base pair in the human genome for its likelihood of hosting an integration event by each type of element. This resource should prove useful in understanding genomic evolution and optimizing gene delivery vectors for use in human gene therapy.
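
The dependence on interval length noted in the abstract can be illustrated with a toy calculation. The Python sketch below is not the method used in this study. It counts a single hypothetical genomic feature in windows of different widths around a few made-up integration sites and made-up control sites. All coordinates, names (`count_in_window`, `mean_density`), and window widths are assumptions chosen for illustration only.

```python
from bisect import bisect_left, bisect_right

def count_in_window(sorted_features, site, width):
    """Count feature positions within +/- width/2 of a site (inclusive)."""
    half = width // 2
    lo = bisect_left(sorted_features, site - half)
    hi = bisect_right(sorted_features, site + half)
    return hi - lo

def mean_density(sorted_features, sites, width):
    """Average number of features per window across a set of sites."""
    total = sum(count_in_window(sorted_features, s, width) for s in sites)
    return total / len(sites)

# Hypothetical toy coordinates on one chromosome (not real data).
features = sorted([1000, 1050, 1100, 5000, 5020, 9000,
                   20000, 20500, 21000, 21500])
integration_sites = [1040, 5010, 9100]   # made-up integration events
control_sites = [3000, 12000, 20750]     # made-up matched controls

# Compare integration sites with controls at several window widths.
for width in (200, 2000, 20000):
    observed = mean_density(features, integration_sites, width)
    control = mean_density(features, control_sites, width)
    print(f"width={width:>6}  integration={observed:.2f}  control={control:.2f}")
```

In this toy data the integration sites look strongly enriched for the feature in 200 bp windows, less so than the controls in 2,000 bp windows, and nearly indistinguishable from the controls in 20,000 bp windows, so the apparent association changes with the window width chosen.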