By Caironi, Pierguido V. C.; Dorigo, Marco
Reference: International Journal of Intelligent Systems, 12(10), pp. 695-724
Publication: Published, 1997-10
Peer-reviewed article
Abstract: Q-learning can greatly improve its convergence speed if helped by immediate reinforcements provided by a trainer able to judge the usefulness of actions as stage-setting with respect to the agent's goal. This article experimentally investigates this hypothesis by studying the integration of immediate reinforcements (also called training reinforcements) with standard delayed reinforcements (namely, reinforcements assigned only when the agent–environment relationship reaches a particular state, such as when the agent reaches a target). The article proposes two new algorithms (TL and MTL) able to exploit even locally wrong and misleading training reinforcements. The proposed algorithms are tested against Q-learning and against other algorithms (AB–LEC and BB–LEC) described in the literature [S. D. Whitehead, TR-365, University of Rochester, NY, 1991] that also make use of training reinforcements. Experiments are run in a grid world where a Q-agent, a simple simulated robot, must learn to reach a target. © 1997 John Wiley & Sons, Inc.
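As a minimal illustration of the setting described in the abstract, the Python sketch below runs tabular Q-learning in a small grid world where the delayed reinforcement (given only at the target) is combined with an immediate training reinforcement from a simulated trainer. All names, parameter values, and the distance-based trainer heuristic are assumptions for illustration; this is plain Q-learning with the trainer signal added to the reward, not the TL or MTL algorithms proposed in the article.

import random

GRID = 5                                         # grid-world side length (assumption)
TARGET = (GRID - 1, GRID - 1)                    # goal cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1            # illustrative learning parameters

# Q-table over (state, action) pairs
Q = {((x, y), a): 0.0 for x in range(GRID) for y in range(GRID) for a in range(4)}

def step(state, a):
    """Deterministic grid transition; moves off the grid leave the state unchanged."""
    dx, dy = ACTIONS[a]
    nx = min(max(state[0] + dx, 0), GRID - 1)
    ny = min(max(state[1] + dy, 0), GRID - 1)
    return (nx, ny)

def trainer_reinforcement(state, next_state):
    """Immediate (training) reinforcement: +1 if the move reduces the Manhattan
    distance to the target, -1 otherwise. A possibly imperfect heuristic trainer."""
    d_before = abs(TARGET[0] - state[0]) + abs(TARGET[1] - state[1])
    d_after = abs(TARGET[0] - next_state[0]) + abs(TARGET[1] - next_state[1])
    return 1.0 if d_after < d_before else -1.0

for episode in range(500):
    s = (0, 0)
    while s != TARGET:
        # epsilon-greedy action selection
        a = (random.randrange(4) if random.random() < EPSILON
             else max(range(4), key=lambda i: Q[(s, i)]))
        s2 = step(s, a)
        delayed = 10.0 if s2 == TARGET else 0.0          # delayed reinforcement at the goal
        r = delayed + trainer_reinforcement(s, s2)        # add the immediate training signal
        best_next = max(Q[(s2, i)] for i in range(4))
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

In this simple scheme a locally wrong trainer signal directly distorts the learned values; the article's TL and MTL algorithms are designed precisely to remain robust to such locally wrong or misleading training reinforcements.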