Publication

Finite-memory near-optimal learning for Markov decision processes with long-run average reward

Book contribution - Book abstract, conference contribution

We consider learning policies online in Markov decision processes with the long-run average reward (a.k.a. mean payoff). To ensure implementability of the policies, we focus on policies with finite memory. Firstly, we show that near optimality can be achieved almost surely, using an unintuitive gadget we call forgetfulness. Secondly, we extend the approach to a setting with partial knowledge of the system topology, introducing two optimality measures and providing near-optimal algorithms for these cases as well.

Pages: 1149 - 1158

Year of publication: 2020

Keywords: P1 Proceeding

Handle: https://hdl.handle.net/10067/1715830151162165141

Accessibility: Open
