Discounting-aware importance sampling and per-decision importance sampling are two more recently proposed methods that reduce the variance of off-policy estimators based on importance sampling ratios.
Discounting-aware importance sampling: In this method, discounting is interpreted as determining a probability of termination or, equivalently, a degree of partial termination: with discount factor $\gamma$, the episode is regarded as partially terminating with degree $1-\gamma$ at each step. Using this idea, the original importance sampling estimators can be modified as follows:
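*Ordinary importance sampling without taking the discount factor into account: $V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{|\mathcal{T}(s)|}$
**Discounting-aware ordinary importance sampling: $V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \left( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1} \rho_{t:h-1} \bar{G}_{t:h} + \gamma^{T(t)-t-1} \rho_{t:T(t)-1} \bar{G}_{t:T(t)} \right)}{|\mathcal{T}(s)|}$
*Weighted importance sampling without taking the discount factor into account: $V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1} G_t}{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}}$
**Discounting-aware weighted importance sampling: $V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \left( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1} \rho_{t:h-1} \bar{G}_{t:h} + \gamma^{T(t)-t-1} \rho_{t:T(t)-1} \bar{G}_{t:T(t)} \right)}{\sum_{t \in \mathcal{T}(s)} \left( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1} \rho_{t:h-1} + \gamma^{T(t)-t-1} \rho_{t:T(t)-1} \right)}$
Note that the flat partial returns are defined as $\bar{G}_{t:h} = R_{t+1} + R_{t+2} + \cdots + R_h$ for $0 \le t < h \le T$, where "flat" means undiscounted and "partial" means the sum stops at the horizon $h$ rather than at termination. Following the notation of Sutton and Barto, $\mathcal{T}(s)$ is the set of time steps at which state $s$ is visited, $T(t)$ is the first termination time after $t$, $G_t$ is the discounted return from $t$, and $\rho_{t:h} = \prod_{k=t}^{h} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}$ is the importance sampling ratio between the target policy $\pi$ and the behavior policy $b$.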
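This method reduces the variance because each flat partial return is scaled only by the importance sampling ratios up to its own horizon. The ratios of later (tail) actions, which cannot affect the rewards in that partial return, no longer multiply it.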
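The discounting-aware estimators can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names and the episode representation (one list of rewards and one list of per-step ratios, both starting at the visit to the state of interest) are assumptions made for this example.

```python
from typing import Sequence, Tuple

Episode = Tuple[Sequence[float], Sequence[float]]


def discounting_aware_terms(rewards: Sequence[float],
                            ratios: Sequence[float],
                            gamma: float) -> Tuple[float, float]:
    """Return (scaled return, weight) for one visit at time t.

    Assumed layout (hypothetical, for illustration only):
      rewards[k] = R_{t+k+1}
      ratios[k]  = pi(A_{t+k} | S_{t+k}) / b(A_{t+k} | S_{t+k})
    with k = 0, ..., T - t - 1.
    """
    if len(rewards) != len(ratios):
        raise ValueError("rewards and ratios must have the same length")
    n = len(rewards)
    scaled_return = 0.0
    weight = 0.0
    rho = 1.0    # running product rho_{t:h-1}
    flat = 0.0   # running flat partial return up to horizon h
    for k in range(n):
        rho *= ratios[k]
        flat += rewards[k]
        if k < n - 1:
            # Partial termination at horizon h = t + k + 1, degree (1 - gamma).
            coeff = (1.0 - gamma) * gamma ** k
        else:
            # Final horizon h = T carries the remaining mass gamma^(T-t-1).
            coeff = gamma ** k
        scaled_return += coeff * rho * flat
        weight += coeff * rho
    return scaled_return, weight


def discounting_aware_value(episodes: Sequence[Episode],
                            gamma: float,
                            weighted: bool = False) -> float:
    """Ordinary (default) or weighted discounting-aware estimate of V(s)."""
    total_return = 0.0
    total_weight = 0.0
    for rewards, ratios in episodes:
        g, w = discounting_aware_terms(rewards, ratios, gamma)
        total_return += g
        total_weight += w
    denom = total_weight if weighted else float(len(episodes))
    return total_return / denom if denom != 0.0 else 0.0
```

With all ratios equal to 1 the scaled return reduces to the usual discounted return. With `gamma = 0` only the first ratio is used, whereas the plain estimator would multiply the first reward by the product of all ratios in the episode.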
Per-decision Importance Sampling:
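This method also reduces the variance, in this case by exploiting the structure of the return as a sum of rewards. Instead of scaling the whole return by the importance sampling ratio of the full trajectory, each reward is scaled only by the ratio of the actions taken up to the time of that reward, since later actions cannot influence it. We define the per-decision return as follows: $\tilde{G}_t = \rho_{t:t} R_{t+1} + \gamma \rho_{t:t+1} R_{t+2} + \gamma^2 \rho_{t:t+2} R_{t+3} + \cdots + \gamma^{T-t-1} \rho_{t:T-1} R_T$, where $\rho_{t:k} = \prod_{i=t}^{k} \frac{\pi(A_i \mid S_i)}{b(A_i \mid S_i)}$ for target policy $\pi$ and behavior policy $b$.
Thus, the (ordinary importance sampling) value estimate is: $V(s) = \frac{\sum_{t \in \mathcal{T}(s)} \tilde{G}_t}{|\mathcal{T}(s)|}$, where $\mathcal{T}(s)$ is the set of time steps at which state $s$ is visited.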
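The per-decision return can be computed in a single forward pass over an episode. The sketch below is illustrative only: the function names and the input layout (rewards and per-step ratios as parallel lists starting at time t) are assumptions made for this example.

```python
from typing import Sequence, Tuple


def per_decision_return(rewards: Sequence[float],
                        ratios: Sequence[float],
                        gamma: float) -> float:
    """Per-decision importance-sampled return from time t.

    Assumed layout (hypothetical, for illustration only):
      rewards[k] = R_{t+k+1}
      ratios[k]  = pi(A_{t+k} | S_{t+k}) / b(A_{t+k} | S_{t+k})
    """
    if len(rewards) != len(ratios):
        raise ValueError("rewards and ratios must have the same length")
    total = 0.0
    rho = 1.0       # running product rho_{t:t+k}
    discount = 1.0  # gamma^k
    for reward, ratio in zip(rewards, ratios):
        rho *= ratio
        # Each reward is scaled only by the ratios of actions taken so far.
        total += discount * rho * reward
        discount *= gamma
    return total


def per_decision_value(episodes: Sequence[Tuple[Sequence[float], Sequence[float]]],
                       gamma: float) -> float:
    """Ordinary importance sampling average of per-decision returns."""
    if not episodes:
        return 0.0
    returns = [per_decision_return(r, p, gamma) for r, p in episodes]
    return sum(returns) / len(returns)
```

When every ratio equals 1 (behavior and target policies agree), the per-decision return coincides with the ordinary discounted return.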
* See Sutton and Barto, Reinforcement Learning: An Introduction (2nd edition), Sections 5.8 and 5.9, for more details.