Base Information Content in Organic Formulas
- 21 June 2000
- journal article
- research article
- Published by American Chemical Society (ACS) in Journal of Chemical Information and Computer Sciences
- Vol. 40 (4), 942-946
- https://doi.org/10.1021/ci990182k
Abstract
Three questions are addressed concerning organic formulas at their most primitive level: (1) What is the information per atomic symbol? (2) What is the level of system redundancy? (3) How are high-information formulas distinguished from low-information ones? The results are simple yet interesting. Carbon chemistry embodies a code which is low in base information and high in redundancy, irrespective of database size. Moreover, code units associated with halocarbons, proteins, and polynucleotides are especially high in information. Low-information units are more often associated with simple alkanes, aromatics, and common functional groups. Overall, the work for this paper quantifies the base information content in organic formulas; this contributes to research on symbolic language, chemical information, and molecular diversity.
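The first two questions in the abstract can be illustrated with a minimal sketch. It assumes a first-order Shannon entropy over atomic symbols and the usual definition of redundancy as one minus the ratio of observed to maximum entropy. The abstract does not state the paper's actual procedure, so this is not the author's method. The formula list, the function names, and the choice of alphabet size are all hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample of molecular formulas (illustrative, not data from the paper).
FORMULAS = ["CH4", "C2H6", "C6H6", "CCl4"]

# One element symbol (capital letter plus optional lowercase letter),
# followed by an optional atom count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count the atoms of each element in a simple formula such as 'C2H6'."""
    counts = Counter()
    for symbol, number in TOKEN.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def entropy_bits(counts):
    """First-order Shannon entropy in bits per atomic symbol."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(counts, alphabet_size):
    """Redundancy 1 - H / H_max, with H_max = log2(alphabet_size).

    alphabet_size must be greater than 1; which alphabet to use is an
    assumption of this sketch.
    """
    return 1.0 - entropy_bits(counts) / math.log2(alphabet_size)


# Pool the symbol counts over the whole (hypothetical) collection.
pooled = Counter()
for formula in FORMULAS:
    pooled.update(atom_counts(formula))

if __name__ == "__main__":
    print("bits per symbol:", entropy_bits(pooled))
    print("redundancy:", redundancy(pooled, alphabet_size=len(pooled)))
```

Under these assumptions, a collection dominated by a few symbols such as C and H gives a low entropy per symbol and a correspondingly high redundancy. A formula that draws more evenly on a wider set of elements scores higher.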
This publication has 7 references indexed in Scilit:
- Molecular Diversity and Representativity in Chemical Databases. Journal of Chemical Information and Computer Sciences, 1998
- On the Properties of Bit String-Based Measures of Chemical Similarity. Journal of Chemical Information and Computer Sciences, 1998
- Rapid Quantification of Molecular Diversity for Selective Database Acquisition. Journal of Chemical Information and Computer Sciences, 1997
- Historic development of chemical notations. Journal of Chemical Information and Computer Sciences, 1985
- Chemical inference. 1. Formalization of the language of organic chemistry: generic structural formulas. Journal of Chemical Information and Computer Sciences, 1983
- The Advanced Theory of Language as Choice and Chance. Published by Springer Nature, 1966
- Prediction and Entropy of Printed English. Bell System Technical Journal, 1951