In the RL problems I have posted so far, the action space was discrete and finite. However, discretizing actions can be impractical and can cost accuracy. Gym has an environment with a continuous action space for the mountain car climb problem.
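As a quick sketch of what the continuous version looks like (assuming a standard Gym install, where this environment is registered as `MountainCarContinuous-v0`):

```python
import gym

# MountainCarContinuous-v0: the action is a single real number,
# the force applied to the car, instead of a discrete choice.
env = gym.make("MountainCarContinuous-v0")
print(env.action_space)       # a Box with shape (1,)
print(env.observation_space)  # a Box with shape (2,): position, velocity

action = env.action_space.sample()  # a random continuous action
```

Because the action is a `Box` rather than a `Discrete` space, a policy for this environment has to output a real-valued force rather than pick from a finite menu of actions.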
In policy gradient methods we have not only a neural-network estimator for the value function but also an approximator for the policy itself. Today I wrote a class for the value function.
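A minimal sketch of what such a value-function class might look like. This is a hypothetical stand-in, not the class from this post: a single-hidden-layer network in plain NumPy, trained by gradient descent on the squared error between its prediction `V(s)` and a target return. The class name, layer sizes, and learning rate are all illustrative choices.

```python
import numpy as np

class ValueFunction:
    """Hypothetical one-hidden-layer estimator of the state value V(s)."""

    def __init__(self, n_features, n_hidden=32, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def predict(self, s):
        # Forward pass: tanh hidden layer, linear output.
        h = np.tanh(s @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).item()

    def update(self, s, target):
        # One gradient step on 0.5 * (target - V(s))**2,
        # moving V(s) toward the target return.
        h = np.tanh(s @ self.W1 + self.b1)
        v = (h @ self.W2 + self.b2).item()
        err = target - v
        # Backprop: compute the hidden-layer gradient before
        # touching W2, since it depends on the old weights.
        dh = err * self.W2[:, 0] * (1.0 - h ** 2)
        self.W2[:, 0] += self.lr * err * h
        self.b2 += self.lr * err
        self.W1 += self.lr * np.outer(s, dh)
        self.b1 += self.lr * dh
        return err
```

In an actor-critic setup the `target` passed to `update` would be the observed return (or a bootstrapped TD target), and the returned error doubles as the advantage signal for the policy update.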