January 10th, 2019

Day 24: Discounting-aware and Per-decision Importance Sampling Methods

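Discounting-aware importance sampling and per-decision importance sampling are two methods for reducing the variance of off-policy Monte Carlo prediction with importance sampling. Both exploit the internal structure of the return as a sum of (discounted) rewards, which ordinary and weighted importance sampling ignore.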


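Discounting-aware importance sampling: In this method, we treat the discount factor $\gamma$ as a probability of termination or, equivalently, a degree of partial termination. The return $G_t$ is viewed as ending after one step to degree $1-\gamma$ (giving the partial return $R_{t+1}$), after two steps to degree $(1-\gamma)\gamma$ (giving $R_{t+1} + R_{t+2}$), and so on, with the remaining degree $\gamma^{T-t-1}$ going to the full episode. Each of these partial returns only needs to be corrected by the importance ratio up to the step at which it ends. Using this idea, one can modify the original importance sampling methods as follows: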

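*Ordinary importance sampling method without taking the discount factor into consideration:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{|\mathcal{T}(s)|}$$

where $\mathcal{T}(s)$ is the set of time steps at which $s$ is visited, $T(t)$ is the termination time of the episode containing $t$, and $\rho_{t:h} = \prod_{k=t}^{h} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}$ is the importance-sampling ratio between the target policy $\pi$ and the behavior policy $b$.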

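**Discounting-aware ordinary importance sampling:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \Big( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1} \rho_{t:h-1} \bar{G}_{t:h} + \gamma^{T(t)-t-1} \rho_{t:T(t)-1} \bar{G}_{t:T(t)} \Big)}{|\mathcal{T}(s)|}$$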

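*Weighted importance sampling method without taking the discount factor into consideration (the estimate is zero if the denominator is zero):

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}}$$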

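**Discounting-aware weighted importance sampling:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \Big( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1} \rho_{t:h-1} \bar{G}_{t:h} + \gamma^{T(t)-t-1} \rho_{t:T(t)-1} \bar{G}_{t:T(t)} \Big)}{\sum_{t \in \mathcal{T}(s)} \Big( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1} \rho_{t:h-1} + \gamma^{T(t)-t-1} \rho_{t:T(t)-1} \Big)}$$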

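Note that we define flat (undiscounted) partial returns as follows:

$$\bar{G}_{t:h} \doteq R_{t+1} + R_{t+2} + \cdots + R_h, \qquad 0 \le t < h \le T,$$

and the conventional discounted return is a weighted sum of them:

$$G_t = (1-\gamma) \sum_{h=t+1}^{T-1} \gamma^{h-t-1} \bar{G}_{t:h} + \gamma^{T-t-1} \bar{G}_{t:T}.$$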
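With flat partial returns defined, the four estimators can be sketched in Python. This is a minimal illustration, not code from the book: it assumes each visit to $s$ is given as a hypothetical pair `(rewards, ratios)`, where `rewards[j]` is $R_{t+j+1}$ and `ratios[j]` is $\pi(A_{t+j} \mid S_{t+j}) / b(A_{t+j} \mid S_{t+j})$, both running to the end of the episode. The function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical visit format: (rewards R_{t+1..T}, per-step ratios pi/b for steps t..T-1).
Visit = Tuple[Sequence[float], Sequence[float]]


def prefix_ratios(ratios: Sequence[float]) -> List[float]:
    """Return [rho_{t:t}, rho_{t:t+1}, ..., rho_{t:T-1}]."""
    out, prod = [], 1.0
    for r in ratios:
        prod *= r
        out.append(prod)
    return out


def discounting_aware_terms(rewards: Sequence[float], ratios: Sequence[float],
                            gamma: float) -> Tuple[float, float]:
    """One visit's contribution to the numerator and denominator of the
    discounting-aware estimators."""
    n = len(rewards)                 # n = T(t) - t
    rho = prefix_ratios(ratios)      # rho[j] = rho_{t:t+j}
    num = den = flat = 0.0
    for j in range(n):               # horizon h = t + j + 1
        flat += rewards[j]           # flat = Gbar_{t:h}
        # (1 - gamma) * gamma^(h-t-1) for h < T, gamma^(T-t-1) for h = T
        w = (1 - gamma) * gamma ** j if j < n - 1 else gamma ** j
        num += w * rho[j] * flat     # rho[j] = rho_{t:h-1}
        den += w * rho[j]
    return num, den


def estimate_value(visits: Sequence[Visit], gamma: float) -> Dict[str, float]:
    """Ordinary, weighted, and discounting-aware estimates of V(s)."""
    is_num = is_den = da_num = da_den = 0.0
    for rewards, ratios in visits:
        rho_full = prefix_ratios(ratios)[-1]                     # rho_{t:T(t)-1}
        g = sum(gamma ** k * r for k, r in enumerate(rewards))  # G_t
        is_num += rho_full * g
        is_den += rho_full
        num, den = discounting_aware_terms(rewards, ratios, gamma)
        da_num += num
        da_den += den
    k = len(visits)                                              # |T(s)|
    return {
        "ordinary": is_num / k,
        "weighted": is_num / is_den if is_den else 0.0,
        "da_ordinary": da_num / k,
        "da_weighted": da_num / da_den if da_den else 0.0,
    }
```

With $\gamma = 1$ every weight except the last is zero, so `da_ordinary` and `da_weighted` coincide with `ordinary` and `weighted`. With all ratios equal to 1, all four estimates reduce to the plain average of $G_t$, which is a quick check of the flat-return decomposition.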

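This method helps to reduce the variance because each flat partial return $\bar{G}_{t:h}$ is scaled only by $\rho_{t:h-1}$, not by the ratios of the later (tail) actions, which cannot affect those rewards but can inflate the variance enormously. The effect is largest when $\gamma$ is well below 1; with $\gamma = 1$ the discounting-aware estimators reduce to the ordinary and weighted ones.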
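The variance claim can be checked exactly on a toy problem, without sampling. The setup is hypothetical: an episode of 10 steps with two actions per step, a uniform behavior policy, a target policy that picks action 0 with probability 0.9, and a reward of 1 on the first step only, so $v_\pi = 1$ for any $\gamma$. Enumerating all $2^{10}$ action sequences gives the exact mean and variance of one visit's contribution to each ordinary estimator.

```python
from itertools import product

PI = (0.9, 0.1)   # hypothetical target policy: P(action 0), P(action 1)
B = (0.5, 0.5)    # hypothetical uniform behaviour policy
N_STEPS = 10
GAMMA = 0.5
REWARDS = [1.0] + [0.0] * (N_STEPS - 1)  # only R_1 is nonzero, whatever the actions


def ordinary_term(ratios):
    """rho_{0:T-1} * G_0: the reward is scaled by all ten ratios."""
    rho = 1.0
    for r in ratios:
        rho *= r
    return rho * sum(GAMMA ** k * r for k, r in enumerate(REWARDS))


def discounting_aware_term(ratios):
    """Numerator term of discounting-aware ordinary IS for the visit at t = 0."""
    num, rho, flat = 0.0, 1.0, 0.0
    n = len(REWARDS)
    for j in range(n):
        rho *= ratios[j]          # rho_{0:j}
        flat += REWARDS[j]        # Gbar_{0:j+1}
        w = (1 - GAMMA) * GAMMA ** j if j < n - 1 else GAMMA ** j
        num += w * rho * flat
    return num


def exact_moments(term):
    """Exact mean and variance of term(ratios) under the behaviour policy."""
    mean = second = 0.0
    for actions in product((0, 1), repeat=N_STEPS):
        prob = 1.0
        for a in actions:
            prob *= B[a]
        x = term([PI[a] / B[a] for a in actions])
        mean += prob * x
        second += prob * x * x
    return mean, second - mean ** 2


ord_mean, ord_var = exact_moments(ordinary_term)
da_mean, da_var = exact_moments(discounting_aware_term)
print(f"ordinary IS:       mean={ord_mean:.4f}  var={ord_var:.4f}")
print(f"discounting-aware: mean={da_mean:.4f}  var={da_var:.4f}")
```

Both means equal 1. Since $\mathbb{E}_b[\rho_k^2] = 0.5 \cdot 1.8^2 + 0.5 \cdot 0.2^2 = 1.64$, the ordinary term has variance $1.64^{10} - 1 \approx 140$, while the discounting-aware term puts weight only $\gamma^{9}$ on the full ten-step product and most of its weight on short prefixes.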

Per-decision Importance Sampling:

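This method also reduces the variance, and unlike the discounting-aware method it helps even when $\gamma = 1$. It exploits the structure of the return as a sum of rewards. In $\rho_{t:T-1} G_t$, each reward $R_{t+k}$ is multiplied by the full ratio $\rho_{t:T-1}$, but the factors for actions taken after that reward have expected value 1 given everything up to it, so

$$\mathbb{E}\big[\rho_{t:T-1} R_{t+k}\big] = \mathbb{E}\big[\rho_{t:t+k-1} R_{t+k}\big].$$

Dropping those factors inside the return calculation, we define a per-decision return as follows:

$$\tilde{G}_t \doteq \rho_{t:t} R_{t+1} + \gamma \rho_{t:t+1} R_{t+2} + \gamma^2 \rho_{t:t+2} R_{t+3} + \cdots + \gamma^{T-t-1} \rho_{t:T-1} R_T.$$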
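The identity $\mathbb{E}[\rho_{t:T-1} R_{t+k}] = \mathbb{E}[\rho_{t:t+k-1} R_{t+k}]$ can be checked by exact enumeration. A minimal sketch under hypothetical assumptions: a 3-step episode, two actions per step, a uniform behavior policy, a target policy choosing action 0 with probability 0.9, and a reward of 1 at each step where action 0 is taken, so $v_\pi = 0.9(1 + \gamma + \gamma^2)$. Both $\tilde{G}_t$ and $\rho_{t:T-1} G_t$ should average to that value under $b$.

```python
from itertools import product

PI = (0.9, 0.1)   # hypothetical target policy: P(action 0), P(action 1)
B = (0.5, 0.5)    # hypothetical uniform behaviour policy
GAMMA = 0.9
N_STEPS = 3


def per_decision_return(rewards, ratios, gamma):
    """G~_t = sum_k gamma^k * rho_{t:t+k} * R_{t+k+1}."""
    total, rho = 0.0, 1.0
    for k, (reward, ratio) in enumerate(zip(rewards, ratios)):
        rho *= ratio                  # only ratios up to this reward's own step
        total += gamma ** k * rho * reward
    return total


def full_ratio_return(rewards, ratios, gamma):
    """rho_{t:T-1} * G_t: every reward is scaled by the whole-episode ratio."""
    rho = 1.0
    for ratio in ratios:
        rho *= ratio
    return rho * sum(gamma ** k * r for k, r in enumerate(rewards))


def exact_moments(estimator):
    """Exact mean and variance of the estimator under the behaviour policy."""
    mean = second = 0.0
    for actions in product((0, 1), repeat=N_STEPS):
        prob = 1.0
        for a in actions:
            prob *= B[a]
        rewards = [1.0 if a == 0 else 0.0 for a in actions]  # hypothetical reward rule
        ratios = [PI[a] / B[a] for a in actions]
        x = estimator(rewards, ratios, GAMMA)
        mean += prob * x
        second += prob * x * x
    return mean, second - mean ** 2


true_value = sum(GAMMA ** k * PI[0] for k in range(N_STEPS))
pd_mean, pd_var = exact_moments(per_decision_return)
is_mean, is_var = exact_moments(full_ratio_return)
print(true_value, pd_mean, is_mean)
print(pd_var, is_var)
```

Both estimators have the same mean, but the per-decision one has lower variance here because each reward escapes the extra squared ratios of the steps after it.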

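Thus, the (ordinary) per-decision importance-sampling estimate of the value is:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \tilde{G}_t}{|\mathcal{T}(s)|}$$

It has the same expected value as ordinary importance sampling, but usually a lower variance.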
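The estimator is just an average of per-decision returns over the visits to $s$. A minimal sketch, reusing the hypothetical `(rewards, ratios)` visit format where `rewards[k]` is $R_{t+k+1}$ and `ratios[k]` is the ratio at step $t+k$; the helper names are illustrative.

```python
from typing import Sequence, Tuple


def per_decision_return(rewards: Sequence[float], ratios: Sequence[float],
                        gamma: float) -> float:
    """G~_t: reward R_{t+k+1} is weighted by gamma^k * rho_{t:t+k}."""
    total, rho = 0.0, 1.0
    for k, (reward, ratio) in enumerate(zip(rewards, ratios)):
        rho *= ratio
        total += gamma ** k * rho * reward
    return total


def per_decision_value(visits: Sequence[Tuple[Sequence[float], Sequence[float]]],
                       gamma: float) -> float:
    """Ordinary per-decision IS estimate: average of G~_t over all visits to s."""
    if not visits:
        raise ValueError("need at least one visit to s")
    return sum(per_decision_return(r, x, gamma) for r, x in visits) / len(visits)


# Two hypothetical visits to s.
visits = [([1.0, 0.0, 1.0], [1.8, 0.2, 1.8]), ([0.0, 1.0], [0.2, 1.8])]
print(per_decision_value(visits, gamma=0.9))
```

For the first visit the last reward is weighted by $\gamma^2 \rho_{t:t+2} = 0.81 \cdot 0.648$, while the first reward only carries $\rho_{t:t} = 1.8$.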


* See Sutton and Barto's RL book, Reinforcement Learning: An Introduction (2nd ed., Sections 5.8 and 5.9), for more details.