by Busatto-Gaston, Damien; Chakraborty, Debraj; Guha, Shibashis; Perez, Guillermo A.; Raskin, Jean-François
Reference: International Conference on Quantitative Evaluation of Systems
Publication: Published, n.d.
                                                                                                       
Publication in conference proceedings
                                                  
        | Abstract: | In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 10^20 states and beyond, which cannot be handled by state-of-the-art probabilistic model checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented our algorithms and compared them empirically against shielded deep Q-learning on large task systems. | 



