Using the online cross-entropy method to learn relational policies for playing different games

Authors: Kurt Driessens, Bernhard Pfahringer, Samuel Sarjant, Tony C. Smith
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

ABOUT BOOK

By defining a video-game environment as a collection of objects, relations, actions and rewards, the relational reinforcement learning algorithm presented in this paper generates and optimises a set of concise, human-readable relational rules for achieving maximal reward. Rule learning combines incremental specialisation of rules with a modified online cross-entropy method, which dynamically adjusts the rate of learning as the agent progresses. The algorithm is tested on the Ms. Pac-Man and Mario environments, and the results indicate that the agent learns an effective policy for acting within each environment.
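
For readers unfamiliar with the cross-entropy method, the following is a minimal sketch of a generic, batch-style update over rule-selection probabilities. It is not the paper's modified online variant, and it omits rule specialisation and the dynamic learning rate. The function names, the parameters `elite_frac` and `step`, and the rule strings are all hypothetical and chosen only for illustration.

```python
import random


def sample_policy(probs, rng):
    """Sample a policy: include each rule independently with its probability."""
    return {rule for rule, p in probs.items() if rng.random() < p}


def cem_update(probs, samples, rewards, elite_frac=0.2, step=0.6):
    """One generic cross-entropy update (an assumption, not the paper's method).

    probs:   dict mapping rule -> probability of including it in a policy
    samples: list of sets of rules, one sampled policy per entry
    rewards: list of floats, the reward obtained by each sampled policy
    """
    n_elite = max(1, int(len(samples) * elite_frac))
    # Rank sampled policies by reward, best first, and keep the elite ones.
    ranked = sorted(zip(rewards, range(len(samples))), reverse=True)
    elite = [samples[i] for _, i in ranked[:n_elite]]
    new_probs = {}
    for rule, p in probs.items():
        # Fraction of elite policies that contained this rule.
        freq = sum(rule in s for s in elite) / n_elite
        # Move the old probability part of the way toward the elite frequency.
        new_probs[rule] = (1 - step) * p + step * freq
    return new_probs


# Hypothetical rule names, for illustration only.
probs = {"moveTo(dot)": 0.5, "moveFrom(ghost)": 0.5}
samples = [{"moveFrom(ghost)"}, {"moveTo(dot)"}, set(), {"moveFrom(ghost)"}, set(probs)]
rewards = [10.0, 1.0, 0.0, 9.0, 8.0]
updated = cem_update(probs, samples, rewards, elite_frac=0.4, step=0.5)
```

In this toy run the two highest-reward policies both contain only `moveFrom(ghost)`, so its probability rises while that of `moveTo(dot)` falls. An online variant would apply such updates incrementally as episodes finish rather than after a full batch.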
