Today, I wrote code that takes noisy environment behavior into account.
The main difference between this code and a noiseless Monte Carlo implementation is that in a noisy environment the action actually taken can deviate from the chosen policy because of noise. Thus, at each step, we select the policy's action with some fixed probability and distribute the remaining probability over the other actions.
Check the following block of code for more details:
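The original snippet is not shown here, so below is a minimal sketch of such a noisy action selector, assuming a gridworld-style action set of up/down/left/right and a noise level of 0.5; the names `random_action` and `ALL_POSSIBLE_ACTIONS` are my own placeholders, not necessarily those used in the repo.

```python
import numpy as np

# Hypothetical action set for a gridworld (assumption, not from the source)
ALL_POSSIBLE_ACTIONS = ('U', 'D', 'L', 'R')

def random_action(a, eps=0.5):
    """Return the policy's action `a` with probability 1 - eps;
    otherwise return one of the *other* actions, chosen uniformly."""
    if np.random.random() < (1 - eps):
        return a
    # Split the remaining probability eps evenly among the other actions
    return np.random.choice([x for x in ALL_POSSIBLE_ACTIONS if x != a])
```

With `eps=0.5`, the agent follows the policy's action half the time and is pushed to a random different action the other half, which is the behavior described below.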
We can see that the agent plays action a with a probability of 0.5 and one of the other actions with the remaining probability of 0.5.
I coded along with Lazy Programmer's course: https://www.udemy.com/artificial-intelligence-reinforcement-learning-in-python
Find my code here: https://github.com/AidinFerdowsi/Monte-Carlo