
Today, I wrote code that takes the behavior of a noisy environment into account.

Today I coded a Monte Carlo method for policy evaluation.
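A first-visit version of Monte Carlo policy evaluation can be sketched roughly as follows. This is a minimal sketch, not the exact code from the post; `generate_episode` is a hypothetical callable assumed to return a list of `(state, reward)` pairs collected by following the policy:

```python
from collections import defaultdict

def first_visit_mc_evaluation(generate_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo policy evaluation.

    generate_episode() is assumed (hypothetical interface) to return a
    list of (state, reward) pairs produced by following the policy.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = {}
    for _ in range(num_episodes):
        episode = generate_episode()
        states = [s for s, _ in episode]
        G = 0.0
        # Walk the episode backwards, accumulating the discounted return.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            if state not in states[:t]:  # only the first visit counts
                returns_sum[state] += G
                returns_count[state] += 1
                V[state] = returns_sum[state] / returns_count[state]
    return V
```

Averaging only the returns that follow the first visit to each state keeps the sample returns for a state independent across episodes, which is what justifies the simple sample-mean estimate.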

Discounting-aware importance sampling and per-decision importance sampling are two methods that help reduce the variance of off-policy importance-sampling estimates.
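The per-decision idea can be illustrated with a small sketch (my own illustration, not code from the post). Ordinary importance sampling weights the whole return by the product of the per-step ratios over the full episode, while per-decision importance sampling weights each reward only by the ratios of the actions taken up to that step. Here `rhos[k]` stands for the per-step ratio π(a_k|s_k)/b(a_k|s_k):

```python
def ordinary_is_return(rewards, rhos, gamma=1.0):
    """Ordinary importance sampling: the whole return is scaled by
    the product of the ratios over the entire episode."""
    rho_full = 1.0
    for rho_k in rhos:
        rho_full *= rho_k
    G = 0.0
    for k in reversed(range(len(rewards))):
        G = gamma * G + rewards[k]
    return rho_full * G

def per_decision_is_return(rewards, rhos, gamma=1.0):
    """Per-decision importance sampling: each reward is scaled only
    by the ratios of the actions taken before it was received, so
    later ratios cannot inflate the variance of earlier rewards."""
    G = 0.0
    rho = 1.0
    for k, (r, rho_k) in enumerate(zip(rewards, rhos)):
        rho *= rho_k
        G += (gamma ** k) * rho * r
    return G
```

When the two policies agree (all ratios equal 1), both estimators reduce to the plain discounted return; they differ only in how the ratios are distributed over the rewards.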

Unlike on-policy Monte Carlo methods, in which we estimate a policy's value while also using it for control, off-policy methods separate these two roles: a behavior policy generates the episodes, while a different target policy is evaluated and improved.

In this blog post, we will discuss how off-policy Monte Carlo algorithms can be implemented incrementally, that is, updated episode by episode without keeping track of the returns from all past episodes.
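The standard incremental form (as in Sutton and Barto's off-policy MC prediction) maintains, for each state, a cumulative sum of importance weights C(s) and updates V(s) toward each new weighted return, so nothing but V and C needs to be stored. A minimal sketch, assuming a hypothetical interface in which `generate_episode()` returns `(state, action, reward)` triples from the behavior policy and `target_prob` / `behavior_prob` give the two policies' action probabilities:

```python
from collections import defaultdict

def off_policy_mc_evaluation(generate_episode, target_prob, behavior_prob,
                             num_episodes, gamma=1.0):
    """Incremental off-policy MC prediction with weighted importance sampling."""
    V = defaultdict(float)
    C = defaultdict(float)   # cumulative sum of importance weights per state
    for _ in range(num_episodes):
        episode = generate_episode()
        G, W = 0.0, 1.0
        # Process the episode backwards so the ratio W only covers
        # the actions taken after the state being updated.
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            C[state] += W
            # Incremental weighted-IS update: no stored list of returns.
            V[state] += (W / C[state]) * (G - V[state])
            W *= target_prob(state, action) / behavior_prob(state, action)
            if W == 0.0:
                break   # the target policy would never reach earlier steps
    return V
```

The update `V(s) += (W / C(s)) * (G - V(s))` is exactly the running weighted average of returns, so the result matches what batch weighted importance sampling would compute from all episodes at once.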