MapReduce for Data Intensive Scientific Analyses
- 1 December 2008
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 19, 277-284
- https://doi.org/10.1109/escience.2008.59
Abstract
Most scientific data analyses comprise analyzing voluminous data collected from various instruments. Efficient parallel/concurrent algorithms and frameworks are the key to meeting the scalability and performance requirements entailed in such scientific data analyses. The recently introduced MapReduce technique has gained a lot of attention from the scientific community for its applicability in large parallel data analyses. Although there are many evaluations of the MapReduce technique using large textual data collections, there have been only a few evaluations for scientific data analyses. The goals of this paper are twofold. First, we present our experience in applying the MapReduce technique for two scientific data analyses: (i) high energy physics data analyses; (ii) K-means clustering. Second, we present CGL-MapReduce, a streaming-based MapReduce implementation and compare its performance with Hadoop.
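To make the K-means use case named in the abstract concrete, the following is a minimal single-process sketch of how one K-means iteration can be phrased as a map step and a reduce step. It is an illustration under stated assumptions, not the paper's implementation: it does not use the CGL-MapReduce or Hadoop APIs, and the function names (map_assign, reduce_mean, kmeans_iteration, kmeans) as well as the max_iterations and tolerance parameters are hypothetical.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    nearest = distances.index(min(distances))
    return nearest, (point, 1)


def reduce_mean(values):
    """Reduce step: average all points that were assigned to one centroid."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for i in range(dim):
            total[i] += point[i]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle: group values by key
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # Assumption: a centroid that received no points keeps its old position.
    return [
        reduce_mean(groups[k]) if k in groups else tuple(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(points, centroids, max_iterations=100, tolerance=1e-9):
    """Repeat the map/reduce round until the centroids stop moving."""
    for _ in range(max_iterations):
        updated = kmeans_iteration(points, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, updated)
        )
        centroids = updated
        if shift <= tolerance:
            break
    return centroids
```

Because K-means is iterative, the same map/reduce pair runs once per iteration, with the centroids from one round fed into the next. In a distributed setting the map calls would run in parallel over partitions of the data; here they run in a plain loop for clarity.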