Multidimensional indexing and query coordination for tertiary storage management
- 20 January 2003
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 214-225
- https://doi.org/10.1109/ssdm.1999.787637
Abstract
In many scientific domains, experimental devices or simulation programs generate large volumes of data. These volumes may reach hundreds of terabytes, making it impractical to store them on disk systems. Instead, they are stored on robotic tape systems managed by a mass storage system (MSS). A major bottleneck in analyzing the simulated or collected data is the retrieval of subsets from the tertiary storage system. We describe the architecture and implementation of a Storage Access Coordination System (STACS) designed to optimize the use of a disk cache and thus minimize the number of files read from tape. We achieve this by using a specialized index to locate the relevant data on tapes and by coordinating file caching over multiple queries. We focus on a specific application area: a high energy physics data management and analysis environment. STACS was implemented and is being incorporated in an operational system scheduled to go online at the end of 1999. We also include the results of various tests that demonstrate the benefits and efficiency gained by using STACS.
This publication has 3 references indexed in Scilit:
- Coarse indices for a tape-based data warehouse. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Determining the optimal file size on tertiary storage systems based on the distribution of query sizes. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- The pyramid-technique. Published by Association for Computing Machinery (ACM), 1998