Blog


Today, I wrote two programs that find the optimal policy for a grid world problem, one using a Q-learning approach and one using a double Q-learning approach.
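A minimal sketch of the double Q-learning update behind that second program, assuming a tabular setup where the two Q tables are Python dicts keyed by (state, action); the function name, alpha, and gamma here are illustrative, not the exact code I wrote:

```python
import random

def double_q_learning_update(Q1, Q2, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.9):
    """One tabular double Q-learning step (illustrative sketch)."""
    # Randomly pick which table to update; the other table evaluates the
    # greedy action, which reduces the maximisation bias of plain Q-learning.
    if random.random() < 0.5:
        Q1, Q2 = Q2, Q1
    q_sa = Q1.get((s, a), 0.0)
    a_star = max(actions, key=lambda b: Q1.get((s_next, b), 0.0))
    target = r + gamma * Q2.get((s_next, a_star), 0.0)
    Q1[(s, a)] = q_sa + alpha * (target - q_sa)
```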

January 25th, 2019

Day 35: Sarsa Learning Code

Today, I wrote a program that finds the optimal policy for a grid world problem using the Sarsa learning method.
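A minimal sketch of the on-policy Sarsa update that program is built around, assuming a tabular Q stored in a dict keyed by (state, action); alpha and gamma are illustrative choices rather than the values from my code:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One tabular Sarsa step: the TD target uses the action actually taken next."""
    q_sa = Q.get((s, a), 0.0)
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = q_sa + alpha * (td_target - q_sa)
```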

Today, I wrote a program that evaluates a policy using a temporal-difference (TD) approach.
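The core of a TD(0) evaluation loop looks roughly like the sketch below; the env.reset()/env.step() interface and the hyperparameters are assumptions for illustration, not the exact program from this post:

```python
def td0_evaluate(env, policy, episodes=1000, alpha=0.1, gamma=0.9):
    """Estimate the state-value function V of a fixed policy with TD(0)."""
    V = {}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)  # assumed (state, reward, done) return
            v_s = V.get(s, 0.0)
            v_next = 0.0 if done else V.get(s_next, 0.0)
            # TD(0) update: bootstrap from the current estimate of V(s').
            V[s] = v_s + alpha * (r + gamma * v_next - v_s)
            s = s_next
    return V
```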

Instead of choosing a greedy action (as in Q-learning) or sampling an epsilon-greedy action (as in Sarsa), a better approach is to update the Q function using an expectation over the actions available at the next state. This approach is called the Expected Sarsa method.
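Concretely, under an epsilon-greedy policy the Expected Sarsa target weights each next-state action value by its selection probability instead of sampling a single action. The sketch below assumes a dict-based tabular Q and illustrative epsilon, alpha, and gamma values:

```python
def expected_sarsa_update(Q, s, a, r, s_next, actions,
                          alpha=0.1, gamma=0.9, epsilon=0.1):
    """One tabular Expected Sarsa step under an epsilon-greedy policy."""
    q_next = [Q.get((s_next, b), 0.0) for b in actions]
    # Expected next-state value: each action gets epsilon/|A| probability,
    # and the greedy action receives the remaining (1 - epsilon) mass.
    expected = (epsilon / len(actions)) * sum(q_next) + (1 - epsilon) * max(q_next)
    q_sa = Q.get((s, a), 0.0)
    Q[(s, a)] = q_sa + alpha * (r + gamma * expected - q_sa)
```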

Q-learning, one of the most famous reinforcement learning algorithms, is an off-policy approach that updates the Q function by maximizing over the actions available at the next state.
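A minimal sketch of that update in tabular form; the dict-based Q table and the alpha/gamma values are illustrative assumptions rather than an exact implementation:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: the target maximises over next-state actions."""
    q_sa = Q.get((s, a), 0.0)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
```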