Physical database design for relational databases
- 1 March 1988
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems
- Vol. 13 (1), 91-128
- https://doi.org/10.1145/42201.42205
Abstract
This paper describes the concepts used in the implementation of DBDSGN, an experimental physical design tool for relational databases developed at the IBM San Jose Research Laboratory. Given a workload for System R (consisting of a set of SQL statements and their execution frequencies), DBDSGN suggests physical configurations for efficient performance. Each configuration consists of a set of indices and an ordering for each table. Workload statements are evaluated only for atomic configurations of indices, which have only one index per table. Costs for any configuration can be obtained from those of the atomic configurations. DBDSGN uses information supplied by the System R optimizer both to determine which columns might be worth indexing and to obtain estimates of the cost of executing statements in different configurations. The tool finds efficient solutions to the index-selection problem; if we assume the cost estimates supplied by the optimizer are the actual execution costs, it finds the optimal solution. Optionally, heuristics can be used to reduce execution time. The approach taken by DBDSGN in solving the index-selection problem for multiple-table statements significantly reduces the complexity of the problem. DBDSGN's principles were used in the Relational Design Tool (RDT), an IBM product based on DBDSGN, which performs design for SQL/DS, a relational system based on System R. System R actually uses DBDSGN's suggested solutions as the tool expects because cost estimates and other necessary information can be obtained from System R using a new SQL statement, the EXPLAIN statement. This illustrates how a system can export a model of its internal assumptions and behavior so that other systems (such as tools) can share this model.Keywords
This publication has 18 references indexed in Scilit:
- Approximating the number of unique values of an attribute without sortingInformation Systems, 1987
- Implications of certain assumptions in database performance evauationACM Transactions on Database Systems, 1984
- A history and evaluation of System RCommunications of the ACM, 1981
- Support for repetitive transactions and ad hoc queries in System RACM Transactions on Database Systems, 1981
- System R: An architectural overviewIBM Systems Journal, 1981
- The difficulty of optimum index selectionACM Transactions on Database Systems, 1978
- Minimum cost selection of secondary indexes for formatted filesACM Transactions on Database Systems, 1977
- SEQUEL 2: A Unified Approach to Data Definition, Manipulation, and ControlIBM Journal of Research and Development, 1976
- System RACM Transactions on Database Systems, 1976
- Analysis and performance of inverted data base structuresCommunications of the ACM, 1975