Base Information Content in Organic Formulas

21 June 2000

Journal article (research article)
Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences

Vol. 40 (4), 942-946
https://doi.org/10.1021/ci990182k

Abstract

Three questions are addressed concerning organic formulas at their most primitive level: (1) What is the information per atomic symbol? (2) What is the level of system redundancy? (3) How are high-information formulas distinguished from low-information ones? The results are simple yet interesting. Carbon chemistry embodies a code that is low in base information and high in redundancy, irrespective of database size. Moreover, code units associated with halocarbons, proteins, and polynucleotides are especially high in information, while low-information units are more often associated with simple alkanes, aromatics, and common functional groups. Overall, this paper quantifies the base information content in organic formulas, contributing to research on symbolic language, chemical information, and molecular diversity.
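The abstract does not spell out the estimators behind its three questions. A common reading, in the spirit of Shannon's treatment of printed English (one of the cited references), is sketched below under explicit assumptions: each atom in a formula counts as one occurrence of its atomic symbol; information per symbol is the first-order Shannon entropy H = -Σ p log2 p of the pooled symbol frequencies; redundancy is R = 1 - H / log2(k) over the k distinct symbols observed; and a formula's information is the mean surprisal (-log2 p) of its atoms. The function names (`atom_counts`, `symbol_distribution`, `mean_surprisal`, etc.) and the four-formula corpus are hypothetical and illustrative only. This is not the paper's method or data, and the parser handles only simple formulas (no parentheses, charges, or isotopes).

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: an element symbol (capital + optional lowercase)
# followed by an optional integer count, e.g. "C6", "Cl4", "O".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Expand a simple molecular formula into per-symbol atom counts."""
    counts = Counter()
    for symbol, digits in TOKEN.findall(formula):
        counts[symbol] += int(digits) if digits else 1
    return counts


def symbol_distribution(formulas) -> dict:
    """Pool atom counts over all formulas and return symbol probabilities."""
    total = Counter()
    for f in formulas:
        total += atom_counts(f)
    n = sum(total.values())
    return {s: c / n for s, c in total.items()}


def entropy_bits(probs: dict) -> float:
    """First-order Shannon entropy, in bits per atomic symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def redundancy(probs: dict) -> float:
    """Assumed redundancy R = 1 - H / log2(k), k = distinct symbols seen."""
    k = len(probs)
    if k < 2:
        raise ValueError("redundancy needs at least two distinct symbols")
    return 1.0 - entropy_bits(probs) / math.log2(k)


def mean_surprisal(formula: str, probs: dict) -> float:
    """Average self-information (-log2 p) per atom of one formula.

    Assumes every symbol in `formula` also occurs in the corpus behind `probs`.
    """
    counts = atom_counts(formula)
    n = sum(counts.values())
    return sum(c * -math.log2(probs[s]) for s, c in counts.items()) / n


if __name__ == "__main__":
    corpus = ["CH4", "C6H6", "C2H6", "CCl4"]  # toy data, not from the paper
    p = symbol_distribution(corpus)
    print(f"H = {entropy_bits(p):.3f} bits/symbol, R = {redundancy(p):.3f}")
    for f in corpus:
        print(f, round(mean_surprisal(f, p), 3))
```

In this toy corpus carbon and hydrogen dominate, so chlorine atoms are rare and CCl4 scores a higher mean surprisal than CH4. That ordering is only a property of the made-up data; it merely illustrates how such a score could separate high-information from low-information formulas.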


References (7 indexed in Scilit):

Molecular Diversity and Representativity in Chemical Databases
Journal of Chemical Information and Computer Sciences, 1998
On the Properties of Bit String-Based Measures of Chemical Similarity
Journal of Chemical Information and Computer Sciences, 1998
Rapid Quantification of Molecular Diversity for Selective Database Acquisition
Journal of Chemical Information and Computer Sciences, 1997
Historic development of chemical notations
Journal of Chemical Information and Computer Sciences, 1985
Chemical inference. 1. Formalization of the language of organic chemistry: generic structural formulas
Journal of Chemical Information and Computer Sciences, 1983
The Advanced Theory of Language as Choice and Chance
Springer Nature, 1966
Prediction and Entropy of Printed English
Bell System Technical Journal, 1951

Cited by 19 articles