HIV integration site selection: Analysis by massively parallel pyrosequencing reveals association with epigenetic modifications

Abstract
Integration of retroviral DNA into host cell DNA is a defining feature of retroviral replication. HIV integration is known to be favored in active transcription units, which promotes efficient transcription of the viral genes, but the molecular mechanisms responsible for targeting are not fully clarified. Here we used pyrosequencing to map 40,569 unique sites of HIV integration. Computational prediction of nucleosome positions in target DNA indicated that integration sites are periodically distributed on the nucleosome surface, consistent with favored integration into outward-facing DNA major grooves in chromatin. Analysis of integration site positions in the densely annotated ENCODE regions revealed a wealth of new associations between integration frequency and genomic features. Integration was particularly favored near transcription-associated histone modifications, including H3 acetylation, H4 acetylation, and H3 K4 methylation, but was disfavored in regions rich in transcription-inhibiting modifications, which include H3 K27 trimethylation and DNA CpG methylation. Statistical modeling indicated that effects of histone modification on HIV integration were partially independent of other genomic features influencing integration. The pyrosequencing and bioinformatic methods described here should be useful for investigating many aspects of retroviral DNA integration.
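
As a purely illustrative aside, the periodic distribution of integration sites on the nucleosome surface can be quantified with a simple circular statistic. The sketch below is not the method used in this study. It assumes a hypothetical input, `offsets`, giving each integration site's distance in base pairs from the predicted nucleosome center, and it assumes a trial period near the DNA helical repeat (roughly 10 bp). The function names `periodicity_score` and `best_period` are invented for this example. Each offset is wrapped onto a circle of the trial period, and the mean resultant length is reported: values near 1 indicate strong phasing, and values near 0 indicate none.

```python
import cmath
import math


def periodicity_score(offsets, period):
    """Mean resultant length of offsets wrapped onto a circle of length `period`.

    offsets: hypothetical distances (bp) of integration sites from a
             predicted nucleosome center (an assumption of this sketch).
    period:  trial period in bp, e.g. about 10 for the DNA helical repeat.
    Returns a value in [0, 1]; 1 means every offset shares the same phase.
    """
    if not offsets:
        raise ValueError("offsets must be non-empty")
    if period <= 0:
        raise ValueError("period must be positive")
    # Map each offset to a unit vector whose angle is its phase in the period.
    total = sum(cmath.exp(2j * math.pi * x / period) for x in offsets)
    return abs(total) / len(offsets)


def best_period(offsets, candidates):
    """Return the candidate period with the highest periodicity score."""
    return max(candidates, key=lambda p: periodicity_score(offsets, p))
```

For example, `periodicity_score([0, 10, 20, 30], 10)` is 1 up to floating-point rounding because all four offsets fall at the same phase, whereas offsets spread evenly across one period score close to 0. On real data, the observed score would need to be compared against matched random control sites before drawing any conclusion.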