by García-Díez, Silvia; Vandenbussche, Eric; Saerens, Marco
Reference: Proceedings of the IEEE Conference on Decision & Control, including the Symposium on Adaptive Processes, pages 6570-6577, 6160770
Publication: Published, 2011
Peer-reviewed article
Abstract: This work investigates the continuous-state counterpart of the discrete randomized shortest-path framework (RSP, [23]) on a graph. Given a weighted directed graph G, the RSP considers the policy that minimizes the expected cost of reaching a destination node from a source node (exploitation), while maintaining a constant relative entropy spread in the graph (exploration). This results in a Boltzmann probability distribution on the (usually infinite) set of paths connecting the source node and the destination node, which depends on an inverse temperature parameter θ. This framework defines a biased random walk on the graph that gradually favors low-cost paths as θ increases. It is shown that the continuous-state counterpart requires the solution of two partial differential equations - providing forward and backward variables - from which all the quantities of interest can be computed. For instance, the best local move is obtained by taking the gradient of the logarithm of one of these solutions, namely the backward variable. These partial differential equations are the so-called steady-state Bloch equations, to which the Feynman-Kac formula provides a path-integral solution. The RSP framework is therefore a discrete-state equivalent of the continuous Feynman-Kac diffusion process involving the Wiener measure. Finally, it is shown that the continuous-time continuous-state optimal randomized policy is obtained by solving a diffusion equation with an external drift provided by the gradient of the logarithm of the backward variable, which plays the role of a potential. © 2011 IEEE.
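The discrete RSP machinery the abstract refers to can be illustrated on a toy graph. The sketch below is a minimal, hedged reconstruction of the standard RSP recipe (not code from the paper): edge weights w_ij = p_ref(i,j)·exp(-θ c_ij) combine the reference random walk with the Boltzmann factor, the backward variable z solves a linear system with z = 1 at the destination, and the biased walk follows p(i→j) = w_ij z_j / z_i. The graph, costs, and names (`theta`, `C`, `P_ref`) are all illustrative assumptions.

```python
# Hedged sketch of the discrete RSP backward variable on a toy graph.
# The graph, costs, and variable names are illustrative, not from the paper.
import numpy as np

# Toy 4-node directed graph: 0 (source) -> {1, 2} -> 3 (destination).
# The path 0->1->3 (total cost 2) is cheaper than 0->2->3 (total cost 5).
n, dest = 4, 3
C = np.full((n, n), np.inf)          # inf = no edge
C[0, 1], C[0, 2], C[1, 3], C[2, 3] = 1.0, 4.0, 1.0, 1.0

# Reference (unbiased) random walk: uniform over outgoing edges.
P_ref = np.zeros((n, n))
for i in range(n):
    if i == dest:
        continue                      # destination is absorbing
    out = np.isfinite(C[i])
    P_ref[i, out] = 1.0 / out.sum()

def rsp_policy(theta):
    """Biased walk p(i->j) = w_ij z_j / z_i, where w = P_ref * exp(-theta*C)
    and the backward variable z solves z_i = sum_j w_ij z_j with z_dest = 1."""
    W = P_ref * np.exp(-theta * np.where(np.isfinite(C), C, 0.0))
    trans = [i for i in range(n) if i != dest]        # transient nodes
    A = np.eye(len(trans)) - W[np.ix_(trans, trans)]
    z = np.ones(n)
    z[trans] = np.linalg.solve(A, W[trans, dest])     # z_T = (I - W_TT)^-1 w_Td
    P = W * z[None, :] / np.where(z[:, None] > 0, z[:, None], 1.0)
    return P, z

P_low, _ = rsp_policy(0.01)   # near the reference walk (pure exploration)
P_high, _ = rsp_policy(5.0)   # concentrates on the cheapest path (exploitation)
```

At small θ the policy out of the source stays close to the uniform reference walk, while at large θ virtually all probability mass shifts onto the low-cost edge 0→1, matching the abstract's statement that the biased walk gradually favors low-cost paths as θ increases.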