by Busatto-Gaston, Damien; Chakraborty, Debraj; Raskin, Jean-François
Reference 31st International Conference on Concurrency Theory (CONCUR 2020), Vol. 171, pages 40:1-40:24
Publication Published, 2020-08-31
Publication in conference proceedings
Abstract: In this paper, we consider the online computation of a strategy that aims at optimizing the expected average reward in a Markov decision process. The strategy is computed with a receding horizon and using Monte Carlo tree search (MCTS). We augment the MCTS algorithm with the notion of symbolic advice, and show that its classical theoretical guarantees are maintained. Symbolic advice is used to bias the selection and simulation strategies of MCTS. We describe how to use QBF and SAT solvers to implement symbolic advice in an efficient way. We illustrate our new algorithm using the popular game Pac-Man and show that the performance of our algorithm exceeds that of plain MCTS as well as that of human players.