Today I coded a Monte Carlo method for policy evaluation.
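The basic idea of MC policy evaluation is to play many episodes under the policy, record the states visited and the rewards received, compute the return (the discounted sum of rewards) that follows each visited state, and estimate each state's value as the mean of those returns over all episodes. In addition, we should remember to start every episode from a random state, so that every state gets visited and its estimate can converge to its true value under the policy.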
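To make the idea concrete, here is a minimal sketch of first-visit MC policy evaluation. It is not the code from the course or from my repo: the toy corridor environment (states 0 to 4, reward 1 on reaching state 4), the "always move right" policy, the discount GAMMA = 0.9, and all function names are assumptions made up for illustration.

```python
import random
from collections import defaultdict

GAMMA = 0.9    # assumed discount factor
TERMINAL = 4   # hypothetical toy corridor: states 0..4, episode ends at state 4


def step(state, action):
    """Move along the corridor; reward is 1 only when the terminal state is reached."""
    next_state = state + action
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def policy(state):
    """The fixed policy being evaluated (assumed): always move right."""
    return 1


def play_episode(rng):
    """Start from a random non-terminal state and follow the policy until the end."""
    state = rng.randrange(TERMINAL)
    trajectory = []  # pairs of (state, reward received after acting in that state)
    while state != TERMINAL:
        next_state, reward = step(state, policy(state))
        trajectory.append((state, reward))
        state = next_state
    return trajectory


def mc_policy_evaluation(num_episodes=200, seed=0):
    """Estimate V(s) as the mean first-visit return over all episodes."""
    rng = random.Random(seed)
    returns = defaultdict(list)
    for _ in range(num_episodes):
        trajectory = play_episode(rng)
        G = 0.0
        first_visit_return = {}
        # Walk backwards so G accumulates the discounted return from each step.
        # Earlier visits are processed last, so they overwrite later visits
        # and the dict ends up holding the first-visit return per state.
        for state, reward in reversed(trajectory):
            G = reward + GAMMA * G
            first_visit_return[state] = G
        for state, G in first_visit_return.items():
            returns[state].append(G)
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}


V = mc_policy_evaluation()
for state in sorted(V):
    print(state, round(V[state], 3))
```

Because this toy environment and policy are deterministic, every sampled return for a state is the same, so the averages land on GAMMA raised to the number of steps remaining before the final move. In a stochastic environment the same loop applies, but the estimates only approach the true values as the number of episodes grows.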
I coded along with Lazy Programmer's course on RL: https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python
Find my code here: https://github.com/AidinFerdowsi/Monte-Carlo