Multidimensional indexing and query coordination for tertiary storage management

Abstract
In many scientific domains, experimental devices or simulation programs generate large volumes of data. The volumes of data may reach hundreds of terabytes and therefore it is impractical to store them on disk systems. Rather they are stored on robotic tape systems that are managed by some mass storage system (MSS). A major bottleneck in analyzing the simulated/collected data is the retrieval of subsets from the tertiary storage system. We describe the architecture and implementation of a Storage Access Coordination System (STACS) designed to optimize the use of a disk cache, and thus minimize the number of files read from tape. We achieve this by using a specialized index to locate the relevant data on tapes, and by coordinating file caching over multiple queries. We focus on a specific application area, a high energy physics data management and analysis environment. STACS was implemented and is being incorporated in an operational system, scheduled to go online at the end of 1999. We also include the results of various tests that demonstrate the benefits and efficiency gained of using the STACS.

This publication has 3 references indexed in Scilit: