Today, I wrote code for Monte Carlo control that takes into account noisy environment behavior.

The main difference between this code and a noiseless Monte Carlo implementation is that, in a noisy environment, the chosen policy can be perturbed by noise. Thus, at each step, we assign a probability to the action dictated by the policy and distribute the remaining probability over the other actions.

Check the following block of code for more details:
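The original code block did not survive in this copy of the post, so here is a minimal sketch of the idea, assuming a gridworld-style discrete action set (the action names and the `random_action` helper are illustrative, not the author's exact code): with probability 0.5 the agent takes the policy's action, and with the remaining 0.5 it takes one of the other actions uniformly at random.

```python
import numpy as np

ALL_POSSIBLE_ACTIONS = ('U', 'D', 'L', 'R')  # assumed gridworld actions

def random_action(a, eps=0.5):
    """Simulate a noisy environment: return the policy's action `a`
    with probability 1 - eps, otherwise return one of the other
    actions chosen uniformly at random."""
    if np.random.random() < (1 - eps):
        return a
    # noise kicked in: pick uniformly among the remaining actions
    others = [x for x in ALL_POSSIBLE_ACTIONS if x != a]
    return np.random.choice(others)
```

During an episode, every step would call `random_action(policy[s])` instead of using `policy[s]` directly, so the returns averaged by Monte Carlo reflect the noisy dynamics.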

We can see that the agent plays the policy's action a with probability 0.5, and one of the other actions with the remaining probability of 0.5.

I coded along with the Lazy Programmer course: https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python

Find my code here: https://github.com/AidinFerdowsi/Monte-Carlo