H-WorD: Supporting job scheduling in Hadoop with workload-driven data redistribution

Jovanovic, Petar; Romero, Oscar; Calders, Toon; Abelló, Alberto

doi:10.1007/978-3-319-44039-2_21

Reference: Lecture Notes in Computer Science, 9809, pp. 306-320
Publication: Published, 2016

Peer-reviewed article

Abstract:

Today’s distributed data processing systems typically follow a query shipping approach and exploit data locality to reduce network traffic. In such systems, the distribution of data over the cluster resources plays a significant role, and a skewed distribution can harm the performance of executing applications. In this paper, we address the challenges of automatically adapting the distribution of data in a cluster to the workload imposed by the input applications. We propose a generic algorithm, named H-WorD, which, based on the estimated workload over resources, suggests alternative execution scenarios of tasks, and hence identifies the required transfers of input data a priori, so that data can be brought close to the execution in time. We exemplify our algorithm in the context of MapReduce jobs in a Hadoop ecosystem. Finally, we evaluate our approach and demonstrate the performance gains of automatic data redistribution.

