Deep RL Resources
If your environment is deterministic and you know what will happen after you take an action, you can most likely solve it using search.
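To make the first case concrete, here is a minimal sketch of solving a known, deterministic environment with plain search. The 4x4 grid, action set, and goal location are all illustrative assumptions, not from any particular library:

```python
from collections import deque

# A toy deterministic environment: an agent on a 4x4 grid starts at
# (0, 0) and must reach (3, 3). Because the transition function is
# known and deterministic, breadth-first search finds an optimal plan
# without any learning at all.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Known, deterministic transition function (clipped to the grid)."""
    r, c = state
    dr, dc = ACTIONS[action]
    return (max(0, min(3, r + dr)), max(0, min(3, c + dc)))

def bfs_plan(start, goal):
    """Return a shortest action sequence from start to goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

plan = bfs_plan((0, 0), (3, 3))
print(len(plan))  # shortest plan is 6 moves (3 down + 3 right)
```

The same idea scales up to A* or Monte Carlo tree search when the state space is larger, but the prerequisite is the same: you can query the transition function directly.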
If your environment is stochastic but you know the transition probabilities and the state space isn't too large, you can use a dynamic programming approach such as tabular value iteration.
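As a sketch of that second case, here is tabular value iteration on a tiny two-state MDP with hand-made (illustrative) transition probabilities; `P[s][a]` is a list of `(probability, next_state, reward)` triples:

```python
# Tabular value iteration on a tiny known MDP. All numbers here are
# made up for illustration: two states, two actions, and a discount
# factor of 0.9. State 1 can loop on itself for reward 1 per step.
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)],
        1: [(1.0, 1, 1.0)]},
}
gamma = 0.9

V = {s: 0.0 for s in P}
for _ in range(1000):  # apply the Bellman optimality backup to convergence
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s])
         for s in P}

print(round(V[1], 2))  # 10.0, i.e. 1 / (1 - gamma) for the self-loop
```

The key requirement is that `P` is known and small enough to enumerate; when it isn't, the later cases below apply.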
If you know the transition probabilities but the state space is too large, you can use approximate value iteration.
If you don't know the transition probabilities but the state space isn't too large, you can use tabular Q-learning.
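Here is a minimal tabular Q-learning sketch on a toy 5-state chain (move left or right, reward 1 for reaching the rightmost state, which ends the episode). The environment and all hyperparameters are illustrative assumptions; the important point is that the learner never looks at the transition function, only at sampled transitions:

```python
import random

random.seed(0)

# Toy chain MDP: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
N, GOAL = 5, 4
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N)]

def env_step(s, a):
    """Hidden dynamics: the agent only sees (next_state, reward, done)."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    done = (s2 == GOAL)
    return s2, (1.0 if done else 0.0), done

def greedy_action(s):
    if Q[s][0] == Q[s][1]:
        return random.randrange(2)  # break ties randomly
    return 0 if Q[s][0] > Q[s][1] else 1

for _ in range(500):                 # episodes
    s = 0
    for _ in range(100):             # step cap per episode
        a = random.randrange(2) if random.random() < eps else greedy_action(s)
        s2, r, done = env_step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])  # temporal-difference update
        if done:
            break
        s = s2

print([greedy_action(s) for s in range(GOAL)])  # should move right in every state
```

DQN is essentially this same update with the Q-table replaced by a neural network (plus a replay buffer and a target network to keep training stable).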
If you don't know the transition probabilities and the state space is large, you can use deep Q-learning (DQN).
If DQN doesn't work because the environment is very difficult, or your action space is continuous, you can use deep policy gradients (A3C, PPO).
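To show the core policy-gradient idea at the smallest possible scale, here is a REINFORCE sketch on a two-armed bandit. Real implementations like PPO and A3C use neural networks and autodiff; here the policy is a softmax over two logits and the gradient of the log-probability is written out by hand. The arm payoffs and learning rate are illustrative assumptions:

```python
import math, random

random.seed(0)

theta = [0.0, 0.0]       # one logit per arm (the "policy parameters")
lr = 0.1
arm_reward = [0.2, 0.8]  # probability each arm pays out reward 1

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

for _ in range(2000):
    probs = softmax(theta)
    a = 0 if random.random() < probs[0] else 1          # sample an action
    r = 1.0 if random.random() < arm_reward[a] else 0.0  # sample a reward
    # REINFORCE update: theta += lr * r * grad log pi(a | theta),
    # where d(log pi(a))/d(theta_k) = 1[k == a] - probs[k] for a softmax.
    for k in range(2):
        theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])

print(softmax(theta))  # probability mass should concentrate on the better arm
```

This bare version is high-variance; PPO and A3C add value-function baselines, batching, and (for PPO) a clipped objective to make the same gradient signal usable on hard problems.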
In most of your projects, you don't know the transition probabilities and the state space is very large, so you will be using deep reinforcement learning (DQN/A3C/PPO). The resources listed below are all for deep reinforcement learning.
Here is a great repository of worked examples, and it should probably be the first place you start:
https://github.com/dennybritz/reinforcement-learning
Here is an easy-to-follow 8-part blog series on deep RL.
This class at Berkeley is how I learned deep RL, and I highly recommend it:
http://rail.eecs.berkeley.edu/deeprlcourse/
If you want to try out policy gradients such as PPO or A3C, this is the first place to start:
https://spinningup.openai.com/en/latest/
As many of you might already know, deep RL is hard to debug, so start small and try to get the most basic thing working first.