Analysis of file I/O traces in commercial computing environments

Abstract
Improving file system performance is becoming increasingly important for alleviating the effect of I/O bottlenecks in computer systems. To design changes to an existing file system, or to architect a new one, it is important to understand current usage patterns. In this paper we analyze file I/O traces of several existing production computer systems to understand file access behavior. Our analysis suggests that a relatively small percentage of the files are active. The total amount of active data is also quite small for interactive environments. An average file encounters a relatively small number of opens while receiving an order of magnitude more reads. An average process opens quite a large number of files over a typical prime-time period. More significantly, outliers dominate many of the characteristics we studied: a relatively small number of processes dominate the activity, and a very small number of files receive most of these operations. In addition, we provide a comprehensive analysis of the dynamic sharing of files in each of these environments, addressing both the simultaneous and sequential sharing aspects, and the activity on these shared files. We observe that although only a third of the active files are sequentially shared, they receive a very large proportion of the total operations. We analyze the traces from a given environment across different lengths of time, such as one-hour, three-hour and whole work-day intervals, and do this for three different environments. This gives us an idea of the shortest trace length needed to have confidence in the estimation of the parameters.
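As a rough illustration of the kind of per-file statistics described above, the following minimal sketch computes opens and reads per file and the share of operations going to shared files. The trace record layout, the sample values, the `summarize` function and the working definitions of simultaneous and sequential sharing are all assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp, process_id, file_id, operation).
# The layout and the sample values are illustrative assumptions only.
trace = [
    (1, "p1", "a", "open"),
    (2, "p1", "a", "read"),
    (3, "p1", "a", "read"),
    (4, "p1", "a", "close"),
    (5, "p2", "a", "open"),
    (6, "p2", "a", "read"),
    (7, "p2", "a", "close"),
    (8, "p2", "b", "open"),
    (9, "p2", "b", "read"),
    (10, "p2", "b", "close"),
]


def summarize(records):
    """Compute simple per-file activity and sharing statistics."""
    opens = defaultdict(int)          # number of opens per file
    reads = defaultdict(int)          # number of reads per file
    users = defaultdict(set)          # processes that ever opened the file
    open_now = defaultdict(set)       # processes holding the file open now
    simultaneous = set()              # files open in two processes at once

    for _ts, pid, fid, op in sorted(records):
        if op == "open":
            opens[fid] += 1
            users[fid].add(pid)
            # Assumed definition: simultaneous sharing means another
            # process already has the file open at this moment.
            if open_now[fid] - {pid}:
                simultaneous.add(fid)
            open_now[fid].add(pid)
        elif op == "close":
            open_now[fid].discard(pid)
        elif op == "read":
            reads[fid] += 1

    shared = {fid for fid, pids in users.items() if len(pids) > 1}
    # Assumed definition: shared, but never open in two processes at once.
    sequential = shared - simultaneous
    active = set(opens) | set(reads)
    shared_ops = sum(1 for _ts, _pid, fid, _op in records if fid in shared)

    return {
        "active_files": len(active),
        "mean_opens_per_file": sum(opens.values()) / len(active),
        "mean_reads_per_file": sum(reads.values()) / len(active),
        "simultaneously_shared": simultaneous,
        "sequentially_shared": sequential,
        "shared_op_fraction": shared_ops / len(records),
    }


stats = summarize(trace)
for name, value in stats.items():
    print(name, value)
```

Running the same function over one-hour, three-hour and whole-day slices of a trace is one way to compare how stable such estimates are across trace lengths.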
