Talks and tutorials on RL
Introduction to Reinforcement Learning with Function Approximation: A tutorial given at NIPS 2015 by Richard Sutton.
Policy Search: Methods and Applications: A tutorial given at ICML 2015 by Jan Peters and Gerhard Neumann.
Representation and Learning Methods for Complex Outputs: A talk given at NIPS 2014 by Richard Sutton.
Value and Q-value recursion
The expected return (the cumulative, discounted reward) from a given state is encoded in two forms:
The v-function is the expected return given a state, whilst the q-function is the expected return given a state and an action. The recursive form of both functions can be derived from first principles, and it can be shown that the v-function is a function of the q-function.
See RVQ.pdf for the derivation of the recursion and the link between both functional forms.
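As a rough numerical illustration of the two recursions and the link between them, the sketch below evaluates a fixed policy on a tiny MDP. The MDP, the policy, the discount factor and all names (P, PI, GAMMA, q_from_v, v_from_q) are made up for this example and are not taken from RVQ.pdf.

```python
# Hypothetical 2-state, 2-action MDP; every number here is invented for illustration.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, -1.0)]},
}

# PI[s][a] is the probability of taking action a in state s.
PI = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.7, 1: 0.3}}


def q_from_v(v, s, a):
    """q(s, a) = sum over outcomes of p * (r + gamma * v(s'))."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def v_from_q(q, s):
    """v(s) = sum over actions of pi(a | s) * q(s, a)."""
    return sum(PI[s][a] * q[(s, a)] for a in ACTIONS)


def evaluate_policy(sweeps=1000):
    """Apply the two recursions alternately for a fixed number of sweeps."""
    v = {s: 0.0 for s in STATES}
    q = {}
    for _ in range(sweeps):
        q = {(s, a): q_from_v(v, s, a) for s in STATES for a in ACTIONS}
        v = {s: v_from_q(q, s) for s in STATES}
    return v, q


v, q = evaluate_policy()
for s in STATES:
    print(s, v[s], [q[(s, a)] for a in ACTIONS])
```

At the fixed point, each v(s) is the policy-weighted average of q(s, a), and each q(s, a) is the expected immediate reward plus the discounted v of the next state.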
See RL_Solutions_Chap3.pdf for the effect of sign and constants in the reward function.
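A small sketch of what sign and constants do to the values in a continuing (non-episodic) task with discount factor gamma < 1. The two-state reward process below, and the names T, GAMMA and state_values, are hypothetical and not taken from RL_Solutions_Chap3.pdf. Under these assumptions, adding a constant c to every reward shifts every state value by c / (1 - gamma), scaling the rewards scales the values, and negating the rewards negates the values.

```python
GAMMA = 0.9

# Hypothetical Markov reward process (the policy is already folded in):
# T[s] is a list of (probability, next_state, reward) triples.
T = {
    0: [(0.5, 0, 1.0), (0.5, 1, 0.0)],
    1: [(1.0, 0, -2.0)],
}


def state_values(transform, sweeps=2000):
    """Evaluate v when every reward r is replaced by transform(r)."""
    v = {s: 0.0 for s in T}
    for _ in range(sweeps):
        v = {
            s: sum(p * (transform(r) + GAMMA * v[s2]) for p, s2, r in T[s])
            for s in T
        }
    return v


base = state_values(lambda r: r)
shifted = state_values(lambda r: r + 3.0)  # add a constant c = 3
scaled = state_values(lambda r: 2.0 * r)   # multiply by a positive constant
negated = state_values(lambda r: -r)       # flip the sign

for s in T:
    # Differences that should be c / (1 - GAMMA), zero, and zero respectively.
    print(s, shifted[s] - base[s], scaled[s] - 2.0 * base[s], negated[s] + base[s])
```

Because a constant shift moves all state values by the same amount, it does not change which states are preferred in the continuing case; in episodic tasks the shift depends on episode length, so this argument does not carry over directly.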
Policy Gradient Theorem
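We want to find an expression for the policy gradient, i.e. the gradient of the expected reward with respect to the policy parameters, which uses an estimator of the expected reward such as the action-value or advantage function.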
Policy Gradient Methods for Reinforcement Learning with Function Approximation: Proves that the gradient of a policy can be derived when using a function approximator for either an action-value or advantage function.
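For reference, the result can be written as follows. The notation is an assumption of this note rather than a quotation from the paper: J(θ) is the performance of the policy π_θ, d^π(s) is the state weighting induced by the policy (the stationary distribution in the average-reward setting), and Q^π and V^π are the action-value and state-value functions.

```latex
\nabla_\theta J(\theta)
  = \sum_{s} d^{\pi}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s, a)
  = \sum_{s} d^{\pi}(s) \sum_{a} \pi_\theta(a \mid s)\,
      \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a)

% Subtracting a state-dependent baseline leaves the gradient unchanged,
% because \sum_a \nabla_\theta \pi_\theta(a \mid s) = \nabla_\theta 1 = 0:
\nabla_\theta J(\theta)
  = \sum_{s} d^{\pi}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\,
      \bigl( Q^{\pi}(s, a) - V^{\pi}(s) \bigr)
```

The second form is why the advantage function Q^π(s, a) - V^π(s) can stand in for the action-value function.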
The key is to be able to find an unbiased estimate of the gradient.
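To make "unbiased estimate" concrete, here is a minimal sketch on a single-state problem (a bandit) with a softmax policy, where the exact gradient is available in closed form and can be compared with a sampled score-function estimate. The rewards, the parameters and all function names are hypothetical, and rewards are deterministic per action to keep the example short.

```python
import math
import random

REWARDS = [1.0, 0.0, 2.0]  # hypothetical reward for each action
THETA = [0.1, -0.3, 0.2]   # hypothetical softmax preferences


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(theta):
    """For J = sum_a pi(a) r(a) with softmax pi, dJ/dtheta_k = pi(k) * (r(k) - J)."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, REWARDS))
    return [pi[k] * (REWARDS[k] - j) for k in range(len(theta))]


def sampled_gradient(theta, n_samples, rng, baseline=0.0):
    """Average of grad log pi(a) * (r(a) - baseline) over actions sampled from pi."""
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        a = rng.choices(range(len(theta)), weights=pi)[0]
        signal = REWARDS[a] - baseline
        for k in range(len(theta)):
            # Gradient of log softmax: indicator(k == a) - pi(k).
            grad[k] += ((1.0 if k == a else 0.0) - pi[k]) * signal
    return [g / n_samples for g in grad]


rng = random.Random(0)
exact = exact_gradient(THETA)
estimate = sampled_gradient(THETA, 50_000, rng)
with_baseline = sampled_gradient(THETA, 50_000, rng, baseline=1.0)
print(exact)
print(estimate)
print(with_baseline)
```

Both sampled estimates should land close to the exact gradient. The baseline changes the variance of the estimate but not its expectation, which is the same argument that allows the advantage function to replace the action-value function.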