Thanks for the great lecture! One question: why is the weighted importance sampling estimate asymptotically correct? If an action is less likely to be taken under pi than under b, then the importance sampling ratio rho will always be less than one, so it should always underestimate, right? But everyone says that it asymptotically approaches the correct value v under pi.
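One way to probe this is a tiny simulation. The sketch below assumes a hypothetical one-step problem with two actions; the policies `PI` and `B`, the rewards, and all names are made up for illustration and are not from the lecture. Because both policies' probabilities sum to one, an action with ratio below one (here `left`, 0.2 / 0.5) has to be balanced by another with ratio above one (here `right`, 0.8 / 0.5), and the weighted estimate divides by the sum of the ratios.

```python
import random

# Hypothetical one-step problem (an assumption for illustration only).
PI = {"left": 0.2, "right": 0.8}      # target policy pi
B = {"left": 0.5, "right": 0.5}       # behaviour policy b
REWARD = {"left": 1.0, "right": 0.0}  # return of each one-step episode

# True value under pi, computed directly from the definitions above.
TRUE_VALUE = sum(PI[a] * REWARD[a] for a in PI)


def weighted_is_estimate(num_episodes, seed=0):
    """Weighted importance sampling estimate of v_pi from episodes drawn under b."""
    rng = random.Random(seed)
    weighted_return_sum = 0.0
    ratio_sum = 0.0
    for _ in range(num_episodes):
        # Sample the action from the behaviour policy b.
        action = "left" if rng.random() < B["left"] else "right"
        rho = PI[action] / B[action]  # 0.4 for left, 1.6 for right
        weighted_return_sum += rho * REWARD[action]
        ratio_sum += rho
    # Normalise by the sum of ratios rather than by the episode count.
    return weighted_return_sum / ratio_sum if ratio_sum > 0 else 0.0


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, weighted_is_estimate(n), "target:", TRUE_VALUE)
```

With more episodes the printed estimate should settle near `TRUE_VALUE`, even though the ratio for `left` is below one on every episode where it is chosen.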
@arminneashrafi2846
2 years ago
Why do we calculate the probability of the trajectory, whereas the expected value is taken over the return? Is the probability of a return value the same as the probability of its trajectory? Can't different trajectories have the same return?
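For reference, a sketch of how the two views line up. The notation is assumed here rather than taken from the lecture: $\tau$ is a trajectory, $G(\tau)$ its return, and $P_\pi$, $P_b$ the trajectory distributions under pi and b. Trajectories that share a return are grouped together, so summing over returns and summing over trajectories give the same expectation:

```latex
\mathbb{E}_\pi[G]
  = \sum_{g} g \, P_\pi(G = g)
  = \sum_{g} g \sum_{\tau \,:\, G(\tau) = g} P_\pi(\tau)
  = \sum_{\tau} P_\pi(\tau) \, G(\tau)

% Importance sampling reweights each trajectory, assuming
% P_b(\tau) > 0 wherever P_\pi(\tau) > 0:
\mathbb{E}_b\!\left[ \frac{P_\pi(\tau)}{P_b(\tau)} \, G(\tau) \right]
  = \sum_{\tau} P_b(\tau) \, \frac{P_\pi(\tau)}{P_b(\tau)} \, G(\tau)
  = \sum_{\tau} P_\pi(\tau) \, G(\tau)
```

So the probability of a return value is in general the sum of the probabilities of all trajectories that produce it, not the probability of a single trajectory.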
Comments: 7