Talks and tutorials on RL

Introduction to Reinforcement Learning with Function Approximation: A tutorial given at NIPS 2015 by Richard Sutton.

Policy Search: Methods and Applications: A tutorial given at ICML 2015 by Jan Peters and Gerhard Neumann.

Representation and Learning Methods for Complex Outputs: A talk given at NIPS 2014 by Richard Sutton.
Value and Q-value recursion
There are two standard forms in which the expected return from a given state is encoded:
 V-function: \(V^{\pi}(s) = \mathbb{E}_{\pi} \left\{ \sum\limits_{k=0}^{\infty} \gamma^k r_{t+k+1} \lvert s_t = s \right\}\)
 Q-function: \(Q^{\pi}(s,a) = \mathbb{E}_{\pi} \left\{ \sum\limits_{k=0}^{\infty} \gamma^k r_{t+k+1} \lvert s_t = s, a_t = a \right\}\)
The V-function is the expected return given a state, whilst the Q-function is the expected return given a state and an action. The recursive (Bellman) form of both functions can be derived from first principles, and it can be shown that the V-function can be expressed in terms of the Q-function.
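For reference, the recursion and the V–Q link take the standard Bellman form below (a sketch assuming a discrete MDP with transition probabilities \(P(s' \lvert s,a)\) and expected reward \(R(s,a)\), which are not defined above):

\[V^{\pi}(s) = \sum_a \pi(a \lvert s)\, Q^{\pi}(s,a)\]
\[Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \lvert s,a)\, V^{\pi}(s')\]

Substituting the first equation into the second recovers the recursion for \(Q^{\pi}\) alone, and vice versa for \(V^{\pi}\).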
See RVQ.pdf for the derivation of the recursion and the link between both functional forms.
See RL_Solutions_Chap3.pdf for the effect of sign and constants in the reward function.
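The V–Q relationship can be checked numerically by iterative policy evaluation. Below is a minimal sketch on a hypothetical 2-state, 2-action MDP (the transition, reward, and policy numbers are illustrative, not taken from the notes):

```python
import numpy as np

n_s, n_a = 2, 2
gamma = 0.9
# P[s, a, s'] : transition probabilities (illustrative values)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
# R[s, a] : expected immediate reward (illustrative values)
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
# pi[s, a] : a fixed stochastic policy (illustrative values)
pi = np.array([[0.6, 0.4],
               [0.3, 0.7]])

# Iterative policy evaluation on the Q-function:
# Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s'),
# with V(s) = sum_a pi(a|s) Q(s,a)
Q = np.zeros((n_s, n_a))
for _ in range(1000):
    V = (pi * Q).sum(axis=1)   # V from Q
    Q = R + gamma * P @ V      # Bellman expectation backup

V = (pi * Q).sum(axis=1)

# Cross-check: solve the linear system (I - gamma P_pi) V = r_pi directly
P_pi = np.einsum('sa,sat->st', pi, P)   # state-to-state kernel under pi
r_pi = (pi * R).sum(axis=1)             # expected reward under pi
V_exact = np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)
print(np.allclose(V, V_exact, atol=1e-6))
```

The iterative backup and the direct linear solve agree, confirming that V is recovered from Q by averaging over the policy.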
Policy Gradient Theorem
\[a = \pi(s;\theta)\]
We want to find an expression for \(\Delta\theta\) which uses an estimator of the expected return, such as the action-value or advantage function.
Policy Gradient Methods for Reinforcement Learning with Function Approximation proves that the gradient of a policy can be derived when using a function approximator for either the action-value or advantage function.
The key is to be able to find an unbiased estimate of the gradient \(\Delta\theta\).
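One standard unbiased estimator is the score-function (REINFORCE) form, \(\nabla_\theta J = \mathbb{E}\left[\nabla_\theta \log \pi(a \lvert s;\theta)\, Q^{\pi}(s,a)\right]\). The sketch below checks this on a hypothetical one-state bandit with a softmax policy (the logits and rewards are illustrative assumptions), comparing the Monte Carlo estimate against the analytic gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.2, -0.1, 0.0])   # one logit per action (illustrative)
r = np.array([1.0, 2.0, 0.5])        # deterministic reward per action (illustrative)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

p = softmax(theta)

# Analytic gradient of J(theta) = sum_a pi(a) r(a) for a softmax policy:
# dJ/dtheta_k = p_k * (r_k - sum_a p_a r_a)
grad_exact = p * (r - p @ r)

# Monte Carlo score-function estimate:
# grad ≈ mean over samples of grad_log_pi(a) * r(a)
n = 200_000
actions = rng.choice(3, size=n, p=p)
grad_log_pi = np.eye(3)[actions] - p        # score of softmax per sample
grad_mc = (grad_log_pi * r[actions, None]).mean(axis=0)

print(np.allclose(grad_mc, grad_exact, atol=0.02))
```

The estimator is unbiased, so the sample average converges to the analytic gradient; in practice a baseline is subtracted from the reward to reduce its variance without introducing bias.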