Decomposition of DNA Sequence Complexity

Abstract
Profiles of sequence compositional complexity provide a view of the spatial heterogeneity of symbolic sequences at different levels of detail. Sequence compositional complexity profiles are here decomposed into partial profiles using the branching property of the Shannon entropy. This decomposition shows the complexity contributed by each individual symbol or group of symbols. In particular, we apply this method to the mapping rules (symbol groupings) commonly used in DNA sequence analysis. We find that strong-weak bindings are remarkably homogeneously distributed compared with purine-pyrimidine, and that A and T are the most heterogeneously distributed bases.
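
As an illustration of the branching property invoked above, the following minimal Python sketch splits the Shannon entropy of a sequence's base composition into a between-group term and weighted within-group terms for the strong-weak grouping (S = {C, G}, W = {A, T}). It is not the complexity-profile computation itself; the function names, the toy sequence, and the assumption that every symbol in the sequence belongs to exactly one group are illustrative choices.

```python
import math
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def branching_decomposition(seq, groups):
    """Split the entropy of seq into between-group and within-group parts.

    groups maps a group label to its symbols, e.g. {"S": "CG", "W": "AT"}.
    Assumes every symbol of seq belongs to exactly one group (hypothetical
    helper written for this illustration).
    """
    counts = Counter(seq)
    n = len(seq)
    # Entropy of the full symbol composition.
    total = shannon_entropy([c / n for c in counts.values()])
    # Probability of each group and the entropy of the grouped sequence.
    group_p = {g: sum(counts[s] for s in syms) / n for g, syms in groups.items()}
    between = shannon_entropy(group_p.values())
    # Entropy inside each group, weighted by the group's probability.
    within = {}
    for g, syms in groups.items():
        if group_p[g] > 0:
            cond = [counts[s] / (group_p[g] * n) for s in syms]
            within[g] = group_p[g] * shannon_entropy(cond)
        else:
            within[g] = 0.0
    return total, between, within


# Toy sequence, chosen only for illustration.
total, between, within = branching_decomposition(
    "ACGTTTAACG", {"S": "CG", "W": "AT"}
)
# Branching property: total == between + sum of the within-group terms.
print(total, between, within)
```

The same helper accepts any other grouping, for example purine-pyrimidine as {"R": "AG", "Y": "CT"}, so the partial terms contributed by different mapping rules can be compared on the same sequence.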