Fakultät für Informatik und Mathematik


Paper Description

BibTeX entry

author={J. Dörre, S. Apel, C. Lengauer},
title={{Modeling and Optimizing MapReduce Programs}},
institution={{Fakult{\"a}t f{\"u}r Informatik und Mathematik, Universit{\"a}t Passau}},


MapReduce frameworks allow programmers to write distributed, dataparallel programs that operate on multisets. These frameworks offer considerable flexibility to support various kinds of programs and data. To understand the essence of the programming model better and to provide a rigorous foundation for optimizations, we present an abstract, functional model of MapReduce along with a number of customization options. We demonstrate that the MapReduce programming model can also represent programs that operate on lists, which differ from multisets in that the order of elements matters. Along with the functional model, we offer a cost model that allows programmers to estimate and compare the performance of MapReduce programs. Based on the cost model, we introduce two transformation rules aiming at performance optimization of MapReduce programs, which also demonstrates the usefulness of our model. In an exploratory study, we assess the impact of applying these rules to two applications. The functional model and the cost model provide insights at a proper level of abstraction into why the optimization works.

Paper itself