Physical database design for relational databases

1 March 1988

journal article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Database Systems

Vol. 13 (1), 91-128
https://doi.org/10.1145/42201.42205

Abstract

This paper describes the concepts used in the implementation of DBDSGN, an experimental physical design tool for relational databases developed at the IBM San Jose Research Laboratory. Given a workload for System R (consisting of a set of SQL statements and their execution frequencies), DBDSGN suggests physical configurations for efficient performance. Each configuration consists of a set of indices and an ordering for each table. Workload statements are evaluated only for atomic configurations of indices, which have only one index per table. Costs for any configuration can be obtained from those of the atomic configurations. DBDSGN uses information supplied by the System R optimizer both to determine which columns might be worth indexing and to obtain estimates of the cost of executing statements in different configurations. The tool finds efficient solutions to the index-selection problem; if we assume the cost estimates supplied by the optimizer are the actual execution costs, it finds the optimal solution. Optionally, heuristics can be used to reduce execution time. The approach taken by DBDSGN in solving the index-selection problem for multiple-table statements significantly reduces the complexity of the problem. DBDSGN's principles were used in the Relational Design Tool (RDT), an IBM product based on DBDSGN, which performs design for SQL/DS, a relational system based on System R. System R actually uses DBDSGN's suggested solutions as the tool expects because cost estimates and other necessary information can be obtained from System R using a new SQL statement, the EXPLAIN statement. This illustrates how a system can export a model of its internal assumptions and behavior so that other systems (such as tools) can share this model.

Keywords

This publication has 18 references indexed in Scilit:

Approximating the number of unique values of an attribute without sorting
Information Systems, 1987
Implications of certain assumptions in database performance evauation
ACM Transactions on Database Systems, 1984
A history and evaluation of System R
Communications of the ACM, 1981
Support for repetitive transactions and ad hoc queries in System R
ACM Transactions on Database Systems, 1981
System R: An architectural overview
IBM Systems Journal, 1981
The difficulty of optimum index selection
ACM Transactions on Database Systems, 1978
Minimum cost selection of secondary indexes for formatted files
ACM Transactions on Database Systems, 1977
SEQUEL 2: A Unified Approach to Data Definition, Manipulation, and Control
IBM Journal of Research and Development, 1976
System R
ACM Transactions on Database Systems, 1976
Analysis and performance of inverted data base structures
Communications of the ACM, 1975

Cited by 144 articles