Base Information Content in Organic Formulas

Abstract
Three questions are addressed concerning organic formulas at their most primitive level: (1) What is the information per atomic symbol? (2) What is the level of system redundancy? (3) How are high-information formulas distinguished from low-information ones? The results are simple yet interesting. Carbon chemistry embodies a code that is low in base information and high in redundancy, irrespective of database size. Moreover, code units associated with halocarbons, proteins, and polynucleotides are especially high in information, whereas low-information units are more often associated with simple alkanes, aromatics, and common functional groups. Overall, this work quantifies the base information content in organic formulas, contributing to research on symbolic language, chemical information, and molecular diversity.
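To make questions (1) and (2) concrete, the following is a minimal sketch that assumes "information per atomic symbol" means the Shannon entropy of the symbol frequencies and that redundancy is one minus the ratio of that entropy to its maximum for a given alphabet. The abstract does not state the paper's exact procedure, so the function names (`atom_symbols`, `entropy_bits`, `redundancy`), the formula parser, and the choice of alphabet size are all illustrative assumptions rather than the authors' method.

```python
import math
import re
from collections import Counter


def atom_symbols(formula):
    """Expand a simple molecular formula such as 'C2H6O' into a list of
    atomic symbols. Assumption: no parentheses, charges, or isotopes."""
    symbols = []
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        symbols.extend([element] * (int(count) if count else 1))
    return symbols


def entropy_bits(symbols):
    """Shannon entropy in bits per symbol of the observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(symbols, alphabet_size):
    """Redundancy as 1 - H / log2(alphabet_size). The alphabet size is an
    assumed parameter (it must exceed 1), not a value taken from the paper."""
    return 1 - entropy_bits(symbols) / math.log2(alphabet_size)


# Hypothetical toy collection of formulas, pooled into one symbol stream.
formulas = ["CH4", "C2H6", "C2H6O", "CHCl3"]
pooled = [s for f in formulas for s in atom_symbols(f)]
print(f"{entropy_bits(pooled):.3f} bits per atomic symbol")
print(f"redundancy over a 4-symbol alphabet: {redundancy(pooled, 4):.3f}")
```

Under these assumptions, a collection dominated by C and H yields low entropy per symbol and high redundancy, which is the qualitative pattern the abstract reports; adding halogen-rich formulas raises the entropy.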
