Discovering Statistical Models of Availability in Large Distributed Systems: An Empirical Study of SETI@home
- 28 January 2011
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Parallel and Distributed Systems
- Vol. 22 (11), 1896-1903
- https://doi.org/10.1109/tpds.2011.50
Abstract
In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example, stationary versus nonstationary behavior) and fits different models (for example, exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real Internet-distributed system, namely SETI@home. We find that about 21 percent of hosts exhibit availability that is a truly random process, and that these hosts can often be modeled accurately with a few distinct distributions from different families. We show that our models are useful and accurate in the context of a scheduling problem that deals with resource brokering. We believe that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
This publication has 16 references indexed in Scilit:
- Optimal routing in parallel, non-observable queues and the price of anarchy revisited. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- EDGeS: Bridging EGEE to BOINC and XtremWeb. Journal of Grid Computing, 2009
- Mining for statistical models of availability in large-scale distributed systems: An empirical study of SETI@home. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Modeling Job Lifespan Delays in Volunteer Computing Projects. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- On correlated availability in Internet-distributed systems. Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
- The Grid Workloads Archive. Future Generation Computer Systems, 2008
- The performance of bags-of-tasks in large-scale distributed systems. Published by Association for Computing Machinery (ACM), 2008
- Collecting unused processing capacity: an analysis of transient distributed systems. IEEE Transactions on Parallel and Distributed Systems, 1993
- Individual versus Social Optimization in the Allocation of Customers to Alternative Servers. Management Science, 1983