Discounting-aware importance sampling and per-decision importance sampling are two more recently proposed methods that reduce the variance of off-policy estimators built on importance-sampling ratios.

Discounting-aware importance sampling: In this method, we interpret the discount factor $\gamma$ as a probability of termination or, equivalently, a degree of partial termination. With this interpretation, the original importance sampling estimators can be modified as follows:

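* **Ordinary** importance sampling, without taking the discount factor into account:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}\, G_t}{|\mathcal{T}(s)|}$$

* **Discounting-aware ordinary** importance sampling:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \Big( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1}\, \rho_{t:h-1}\, \bar{G}_{t:h} + \gamma^{T(t)-t-1}\, \rho_{t:T(t)-1}\, \bar{G}_{t:T(t)} \Big)}{|\mathcal{T}(s)|}$$

* **Weighted** importance sampling, without taking the discount factor into account:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}\, G_t}{\sum_{t \in \mathcal{T}(s)} \rho_{t:T(t)-1}}$$

* **Discounting-aware weighted** importance sampling:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \Big( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1}\, \rho_{t:h-1}\, \bar{G}_{t:h} + \gamma^{T(t)-t-1}\, \rho_{t:T(t)-1}\, \bar{G}_{t:T(t)} \Big)}{\sum_{t \in \mathcal{T}(s)} \Big( (1-\gamma) \sum_{h=t+1}^{T(t)-1} \gamma^{h-t-1}\, \rho_{t:h-1} + \gamma^{T(t)-t-1}\, \rho_{t:T(t)-1} \Big)}$$

Here $\mathcal{T}(s)$ is the set of time steps at which $s$ is visited (only first visits, for a first-visit method), $T(t)$ is the first termination time after $t$, $G_t$ is the discounted return from $t$ up to $T(t)$, and $\rho_{t:h} \doteq \prod_{k=t}^{h} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}$ is the importance ratio of the target policy $\pi$ to the behavior policy $b$ over steps $t$ through $h$.

Note that we define the flat partial returns as follows:

$$\bar{G}_{t:h} \doteq R_{t+1} + R_{t+2} + \cdots + R_h, \qquad 0 \le t < h \le T$$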

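This method reduces variance because each flat partial return is scaled only by the importance ratios of the actions taken up to its horizon, so the ratios of later (tail) actions, which cannot affect that partial return, no longer add noise to it.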
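The discounting-aware estimators described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the episode layout (a list of rewards `R_1..R_T` paired with per-step ratios `pi(A_k|S_k) / b(A_k|S_k)` for `k = 0..T-1`) are assumptions made here, and every episode is assumed to start in the state being evaluated (first-visit, `t = 0`).

```python
def discounting_aware_terms(rewards, ratios, gamma):
    """Return (numerator, denominator) contributions of one episode.

    rewards[k] is R_{k+1}; ratios[k] is pi(A_k|S_k) / b(A_k|S_k).
    The episode is assumed to start at t = 0 in the state of interest.
    """
    T = len(rewards)
    num = 0.0
    den = 0.0
    cum_rho = 1.0   # running product rho_{0:h-1}
    flat = 0.0      # running flat partial return R_1 + ... + R_h
    for h in range(1, T + 1):
        cum_rho *= ratios[h - 1]
        flat += rewards[h - 1]
        if h < T:
            # Partial termination at horizon h, with degree (1 - gamma).
            weight = (1.0 - gamma) * gamma ** (h - 1)
        else:
            # Actual termination at T carries the remaining weight.
            weight = gamma ** (T - 1)
        num += weight * cum_rho * flat
        den += weight * cum_rho
    return num, den


def ordinary_estimate(episodes, gamma):
    """Discounting-aware ordinary estimate: average of the numerators."""
    terms = [discounting_aware_terms(r, rho, gamma) for r, rho in episodes]
    return sum(n for n, _ in terms) / len(terms)


def weighted_estimate(episodes, gamma):
    """Discounting-aware weighted estimate: numerators over denominators."""
    terms = [discounting_aware_terms(r, rho, gamma) for r, rho in episodes]
    total_den = sum(d for _, d in terms)
    if total_den == 0.0:
        return 0.0  # no episode had nonzero weight
    return sum(n for n, _ in terms) / total_den
```

When every ratio equals 1, the numerator reduces to the usual discounted return and the denominator to 1, which is a quick sanity check. With `gamma = 0`, only the first ratio and the first reward contribute, so large ratios later in the episode have no effect on the estimate.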

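Per-decision importance sampling:

This method also reduces variance, this time by exploiting the structure of the return as a sum of rewards. Instead of scaling the whole return by the full importance ratio, each reward is scaled only by the ratios of the actions that precede it. The later ratios have expected value one under the behavior policy, so dropping them leaves the expectation unchanged. We define the per-decision return as follows:

$$\tilde{G}_t \doteq \rho_{t:t} R_{t+1} + \gamma\, \rho_{t:t+1} R_{t+2} + \gamma^2 \rho_{t:t+2} R_{t+3} + \cdots + \gamma^{T-t-1} \rho_{t:T-1} R_T$$

where $\rho_{t:k}$ is the product of the per-step ratios $\pi(A_i \mid S_i) / b(A_i \mid S_i)$ for $i = t, \dots, k$.

Thus, the ordinary importance sampling estimate of the value becomes:

$$V(s) \doteq \frac{\sum_{t \in \mathcal{T}(s)} \tilde{G}_t}{|\mathcal{T}(s)|}$$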
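As a rough sketch of the per-decision idea, the Python below computes a per-decision return for one episode and, for comparison, the return scaled by the full-episode ratio. The function names and the input layout (rewards `R_1..R_T` and per-step ratios `pi(A_k|S_k) / b(A_k|S_k)`, with the episode starting at `t = 0` in the state of interest) are assumptions chosen for illustration.

```python
def per_decision_return(rewards, ratios, gamma):
    """Each reward is scaled only by the ratios of the actions up to it."""
    g = 0.0
    cum_rho = 1.0    # running product rho_{0:k}
    discount = 1.0   # gamma ** k
    for reward, rho in zip(rewards, ratios):
        cum_rho *= rho
        g += discount * cum_rho * reward
        discount *= gamma
    return g


def full_ratio_return(rewards, ratios, gamma):
    """Baseline: the whole discounted return times the full-episode ratio."""
    full_rho = 1.0
    for rho in ratios:
        full_rho *= rho
    g = 0.0
    discount = 1.0
    for reward in rewards:
        g += discount * reward
        discount *= gamma
    return full_rho * g


def per_decision_estimate(episodes, gamma):
    """Ordinary estimate: average per-decision return over the episodes."""
    returns = [per_decision_return(r, rho, gamma) for r, rho in episodes]
    return sum(returns) / len(returns)
```

For a single episode the two functions generally give different numbers. They agree only in expectation over episodes drawn from the behavior policy, and the per-decision version tends to vary less because early rewards are not multiplied by the ratios of later actions.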

* See Sutton and Barto's *Reinforcement Learning: An Introduction* (2nd edition), Sections 5.8 and 5.9, for more details.