Structural Diversity of Organic Chemistry. A Scaffold Analysis of the CAS Registry
Open Access
- 28 May 2008
- journal article
- research article
- Published by American Chemical Society (ACS) in The Journal of Organic Chemistry
- Vol. 73 (12), 4443-4451
- https://doi.org/10.1021/jo8001276
Abstract
By analyzing the scaffold content of the CAS Registry, we attempt to characterize in a comprehensive way the structural diversity of organic chemistry. The scaffold of a molecule is taken to be its framework, defined as all its ring systems and all the linkers that connect them. Framework data from more than 24 million organic compounds is analyzed. The distribution of frameworks among compounds is found to be top-heavy, i.e., a small percentage of frameworks occur in a large percentage of compounds. When frameworks are analyzed at the graph level, an even more top-heavy distribution is found: half of the compounds can be described by only 143 framework shapes. The most significant finding is that the framework distribution conforms almost exactly to a power law. This suggests that the more often a framework has been used as the basis for a compound, the more likely it is to be used in another compound. This may be explained by the cost of synthesis: making a new derivative of a framework is probably less costly if many other derivatives are known. We believe this power law is evidence that the minimization of synthetic cost has been a key factor in shaping the known universe of organic chemistry.
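The two quantitative ideas in the abstract, a top-heavy distribution and a power law, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the rank-frequency view of the data, and the least-squares fit on log-log axes are all assumptions made for illustration, and the input is synthetic toy data rather than anything from the CAS Registry. A least-squares slope is also only a rough estimate of a power-law exponent.

```python
from collections import Counter
import math


def framework_counts(frameworks):
    """Return the number of compounds per framework, most common first."""
    return [n for _, n in Counter(frameworks).most_common()]


def frameworks_needed(counts, fraction=0.5):
    """Smallest number of top-ranked frameworks covering `fraction` of compounds."""
    total = sum(counts)
    covered = 0
    for k, n in enumerate(counts, start=1):
        covered += n
        if covered >= fraction * total:
            return k
    return len(counts)


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    For an ideal power law, count ~ rank**slope, the points fall on a
    straight line in log-log space and this slope is the exponent.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic toy data (hypothetical): one framework label per compound,
# where the framework at rank r appears in roughly 1000 / r compounds.
toy_compounds = [f"F{r}" for r in range(1, 51) for _ in range(1000 // r)]

counts = framework_counts(toy_compounds)
print("frameworks covering half of the compounds:", frameworks_needed(counts))
print("log-log slope:", round(loglog_slope(counts), 3))
```

On real data, the same two numbers would summarize how concentrated the compounds are in a few frameworks and how closely the rank-frequency curve follows a straight line on log-log axes.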