By Chakraborty, Debraj
Jury president: Filiot, Emmanuel
Supervisor: Raskin, Jean-François
Publication: Unpublished, 2022-12-20
Doctoral thesis
Abstract: We study how to efficiently combine techniques from formal methods and machine learning for the online computation of a strategy that aims at optimizing the expected long-term reward in large systems modelled as Markov decision processes (MDPs). This strategy is computed with a receding horizon using Monte Carlo tree search (MCTS). The MCTS algorithm is augmented with the notion of advice, which guides the search towards the relevant part of the tree using exact methods. We show that the classical theoretical guarantees of Monte Carlo tree search are maintained after this augmentation. To lower the latency of MCTS with advice, we propose replacing the advice computed by exact algorithms with an artificial neural network trained in an expert-imitation framework. To demonstrate the practical interest of these techniques, we implement them on several systems modelled as MDPs: the games of Pac-Man and Frozen Lake, and the safe and optimal scheduling of jobs in a task system.
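To make the idea of advice-guided MCTS concrete, the following is a minimal sketch, not the thesis's implementation: a UCT-style search on a toy line-shaped MDP in which a hypothetical `advice` function prunes the actions considered during expansion. All names (`Node`, `mcts`, `advice`, the toy dynamics) are illustrative assumptions.

```python
import math
import random

# Toy MDP (assumption, for illustration): states 0..N on a line.
# Action +1 moves right, -1 moves left; reward 1 on reaching the goal state N.
N = 5    # goal state
H = 10   # receding horizon

def actions(state):
    return [+1, -1]

def step(state, action):
    nxt = max(0, min(N, state + action))
    return nxt, (1.0 if nxt == N else 0.0), nxt == N

def advice(state):
    # Hypothetical advice: an exact method would compute which actions are
    # safe/relevant; here it simply keeps only the move towards the goal.
    return [+1]

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0

def rollout(state, depth, rng):
    """Random simulation from `state` for at most `depth` steps."""
    total = 0.0
    for _ in range(depth):
        state, r, done = step(state, rng.choice(actions(state)))
        total += r
        if done:
            break
    return total

def mcts(root_state, iterations=200, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node, path, depth = root, [root], 0
        # Selection: standard UCT descent over existing children.
        while node.children and depth < H:
            a = max(node.children, key=lambda a: (
                node.children[a].value / (node.children[a].visits + 1e-9)
                + c * math.sqrt(math.log(node.visits + 1)
                                / (node.children[a].visits + 1e-9))))
            node = node.children[a]
            path.append(node)
            depth += 1
        # Expansion: only actions permitted by the advice are added,
        # which is how the advice steers the search towards relevant subtrees.
        if depth < H:
            for a in advice(node.state):
                if a not in node.children:
                    node.children[a] = Node(step(node.state, a)[0])
            node = node.children[rng.choice(advice(node.state))]
            path.append(node)
            depth += 1
        # Simulation + backpropagation.
        value = rollout(node.state, H - depth, rng)
        for n in path:
            n.visits += 1
            n.value += value
    # Recommend the most visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)

best = mcts(0)
```

Because the advice prunes expansion to the goal-directed action, the search recommends `+1` from state 0; replacing `advice` with a learned classifier (the neural network mentioned in the abstract) keeps the same search skeleton while lowering latency.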