Doctoral thesis
Abstract: Computational Grids are large infrastructures composed of several components, such as clusters or massively parallel machines, generally spread across a country or the world, linked together through a network such as the Internet, and allowing transparent access to any resource. Grids have become indispensable for a large part of the scientific community that requires computational power, in fields such as high-energy physics, bioinformatics or earth observation. Large projects are emerging, often at an international level, but even if Grids are on their way to becoming efficient and user-friendly systems, computer scientists and engineers still have a huge amount of work to do in order to improve their efficiency. Among the many problems to solve or improve upon, scheduling the work and balancing the load is of primary importance.

This work concentrates on the way the work is dispatched on such systems, and mainly on how the first level of scheduling (generally named brokering, or meta-scheduling) is performed. We analyze in depth the behavior of popular strategies, compare their efficiency, and propose a new, highly efficient brokering policy whose performance is supported by the large number of simulations reported in the document.

The work is mainly split into two parts. After introducing the mathematical framework on which the rest of the manuscript is based, we study systems where the grid brokering is done without any feedback information, i.e. without knowing the current state of the clusters when the resource broker (the grid component receiving jobs from clients and performing the brokering) makes its decision. We show here how a computational grid behaves if the brokering is done in such a way that each cluster receives a quantity of work proportional to its computational capacity.
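
As an illustration only, brokering without feedback can be sketched as a random routing rule whose probabilities depend solely on the cluster capacities. This is a minimal sketch under stated assumptions, not the model studied in the manuscript: the function names `proportional_weights` and `dispatch`, the capacity values, and the choice of independent random routing per job are all hypothetical.

```python
import random


def proportional_weights(capacities):
    """Return routing probabilities proportional to each cluster's capacity."""
    total = sum(capacities)
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return [c / total for c in capacities]


def dispatch(num_jobs, capacities, seed=0):
    """Route jobs to clusters without looking at any cluster state.

    Each job is sent independently to cluster i with probability
    capacities[i] / sum(capacities). This is an assumed routing rule,
    chosen only to illustrate a feedback-free broker.
    """
    rng = random.Random(seed)
    weights = proportional_weights(capacities)
    counts = [0] * len(capacities)
    for _ in range(num_jobs):
        # The decision uses only the static weights, never the queue lengths.
        cluster = rng.choices(range(len(capacities)), weights=weights, k=1)[0]
        counts[cluster] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical grid of three clusters with relative capacities 1, 2 and 1.
    print(proportional_weights([1, 2, 1]))
    print(dispatch(10_000, [1, 2, 1]))
```

With many jobs, the share received by each cluster approaches its share of the total capacity, which is the proportionality property described above.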

The second part of this work is largely independent of the first one, and presents a brokering strategy, based on Whittle's indices, that aims to minimize the average sojourn time of jobs. We show how efficient the proposed strategy is for computational grids, compared to the ones popular in production systems. We also show its robustness to changes in several parameters, and provide several very efficient algorithms for performing the computations required by this index policy. We finally extend our model in several directions.