A detailed statistical model for relational query optimization

Abstract
This paper develops an approach for estimating the cardinality of the results of the relational operations select, project, and semijoin, using detailed database statistics computed from the instances of a relational database. A Detailed Database Statistics Model (DDSM) is presented which (1) represents statistics about a relational database in matrix format and (2) provides matrix operations for estimating the cardinality of query results. The model is applicable to centralized as well as distributed database systems that support the relational model of data. The typical assumptions of uniformly distributed attribute values and independence among attributes are relaxed. Because the statistics are computed from the database itself, the model is expected to enable accurate evaluation of query processing alternatives and thus better query processing strategies. The DDSM can be used in conjunction with existing query optimization algorithms and with existing local processing and/or data transfer cost models.
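As a rough illustration of the idea of matrix-form statistics, the following Python sketch builds a joint frequency matrix for two attributes directly from the tuples of a small relation and reads select, project, and semijoin cardinalities from it. It is a minimal sketch under stated assumptions, not the DDSM's actual matrices or operations: the two-attribute layout, the toy relation, and all function names (`joint_frequency_matrix`, `select_cardinality`, `project_cardinality`, `semijoin_cardinality`, `independence_estimate`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy relation R(A, B); the data is invented for illustration.
ROWS = [("x", 1), ("x", 1), ("x", 2), ("y", 2), ("z", 3)]
A_VALUES = ["x", "y", "z", "w"]   # assumed domain of attribute A
B_VALUES = [1, 2, 3]              # assumed domain of attribute B

def joint_frequency_matrix(rows, a_values, b_values):
    """Entry [i][j] counts the tuples with A = a_values[i] and B = b_values[j]."""
    counts = Counter(rows)
    return [[counts[(a, b)] for b in b_values] for a in a_values]

def select_cardinality(matrix, a_mask, b_mask):
    """Count tuples whose A value and B value are both kept by the 0/1 masks."""
    return sum(
        cell
        for row, keep_a in zip(matrix, a_mask) if keep_a
        for cell, keep_b in zip(row, b_mask) if keep_b
    )

def project_cardinality(matrix):
    """Number of distinct A values present, i.e. rows with a nonzero entry."""
    return sum(1 for row in matrix if any(row))

def semijoin_cardinality(matrix, a_present_in_other):
    """Count tuples of R whose A value also occurs in the other relation."""
    return sum(sum(row) for row, present in zip(matrix, a_present_in_other) if present)

def independence_estimate(matrix, i, j):
    """Estimate for A = a_i and B = b_j if the attributes were independent."""
    total = sum(sum(row) for row in matrix)
    row_sum = sum(matrix[i])
    col_sum = sum(row[j] for row in matrix)
    return row_sum * col_sum / total

M = joint_frequency_matrix(ROWS, A_VALUES, B_VALUES)
exact = select_cardinality(M, [1, 0, 0, 0], [1, 0, 0])   # A = "x" and B = 1
assumed = independence_estimate(M, 0, 0)                 # same predicate
distinct_a = project_cardinality(M)
semi = semijoin_cardinality(M, [True, False, True, True])
print(exact, assumed, distinct_a, semi)
```

On this toy relation the count read from the joint matrix for A = "x" and B = 1 is 2, whereas the independence assumption gives 1.2, which illustrates the kind of discrepancy that relaxing the independence assumption is meant to remove.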