A major disadvantage of dynamic programming, Monte Carlo, and temporal-difference methods is that they all require estimating the value function. In tabular form, the value function stores one entry per state (or per state-action pair), so its size grows with the number of states and actions and becomes impractical for large or continuous spaces. One solution to this problem is to replace the table with a function approximator for the value function.
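To make the contrast concrete, the following is a minimal sketch, not a definitive implementation. It compares the number of entries in a tabular action-value function with the number of parameters in a linear approximator, and applies one semi-gradient TD(0) update to that approximator. The feature map `features`, the helper names, and the values of `alpha` and `gamma` are assumptions chosen for illustration only.

```python
def tabular_size(n_states, n_actions):
    """Number of entries a tabular action-value function Q(s, a) must store."""
    return n_states * n_actions


def features(state):
    """Hypothetical feature map: a bias term plus the raw state value."""
    return [1.0, float(state)]


def v_hat(state, weights):
    """Linear approximation of the state value: v(s) ~ w . x(s)."""
    return sum(w * x for w, x in zip(weights, features(state)))


def td0_update(weights, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """One semi-gradient TD(0) step for the linear approximator.

    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    # Bootstrapped target; a terminal transition has no next-state value.
    target = reward if terminal else reward + gamma * v_hat(next_state, weights)
    td_error = target - v_hat(state, weights)
    # For a linear approximator, the gradient w.r.t. the weights is x(s).
    return [w + alpha * td_error * x
            for w, x in zip(weights, features(state))]


# A table needs one entry per state-action pair ...
table_entries = tabular_size(n_states=10**6, n_actions=10)

# ... while the linear approximator keeps a fixed number of weights.
weights = [0.0, 0.0]
weights = td0_update(weights, state=2, reward=1.0, next_state=3)
```

In this sketch the table grows with the product of states and actions, while the approximator keeps two weights regardless of how many states exist. The trade-off is that the learned values are only as expressive as the chosen features allow.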