Determining the optimal file size on tertiary storage systems based on the distribution of query sizes

Abstract
In tertiary storage systems, the data is stored on multiple tape volumes where each tape is further divided into files. Since in many such systems the minimum unit of data transfer is a file, it is an important problem to match file sizes with the access patterns to the data. In general, if the file size is large relative to the query size it will lead to the transfer of large amounts of irrelevant data whereas small file sizes will incur an overhead penalty associated with reading each new file. In this work, we analyze the relationship between file sizes and query response times and provide a methodology to compute the optimal file size given information about the distribution of query sizes. Exact closed form solutions for the cost function are given for two common distributions.

This publication has 3 references indexed in Scilit: