In the RL problems I have posted here so far, the action space was discrete and finite. However, a discrete action space can sometimes be impractical and may give lower accuracy. Gym has an environment with a continuous action space for the mountain car climbing problem.
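To make the accuracy point concrete, here is a minimal sketch in plain Python of what is lost when a continuous force is squeezed into a few discrete actions. The range [-1, 1] is my assumption about the continuous mountain car's force limits, and the helper names are hypothetical, not part of Gym.

```python
def discretize(low, high, n_actions):
    """Return n_actions evenly spaced actions covering [low, high]."""
    step = (high - low) / (n_actions - 1)
    return [low + i * step for i in range(n_actions)]


def nearest_action(actions, desired):
    """Pick the available discrete action closest to the desired force."""
    return min(actions, key=lambda a: abs(a - desired))


def worst_case_error(low, high, n_actions):
    """Largest possible gap between a desired force and its nearest action."""
    return (high - low) / (2 * (n_actions - 1))


if __name__ == "__main__":
    # Assumed force range for the continuous mountain car: [-1, 1].
    actions = discretize(-1.0, 1.0, 3)
    print(actions)
    print(nearest_action(actions, 0.3))
    print(worst_case_error(-1.0, 1.0, 3))
```

With only three actions the agent can be off by up to half a unit of force. Adding more bins shrinks that error, but it also enlarges the action set the agent has to explore, which is the trade-off a continuous policy avoids.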

Today I finished the code for policy gradient learning using TensorFlow.

In policy gradient learning, we have not only a neural network estimator for the value function but also an approximator for the policy. Today I wrote a class for the value function.
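As a rough illustration of how the two estimators fit together, here is a minimal sketch in plain Python instead of TensorFlow. It is not the code from this project: the class names, the linear models, the fixed standard deviation, and the one-step toy reward (highest at action 2.0) are all assumptions made to keep the example small. The value estimator serves as a baseline, and the policy is a Gaussian over a continuous action.

```python
import random


class ValueEstimator:
    """Linear state-value estimate V(s) = w . s (hypothetical stand-in for a network)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, state):
        return sum(w * s for w, s in zip(self.w, state))

    def update(self, state, target):
        # Gradient step on the squared error between target and prediction.
        error = target - self.predict(state)
        self.w = [w + self.lr * error * s for w, s in zip(self.w, state)]


class PolicyEstimator:
    """Gaussian policy with linear mean mu(s) = theta . s and a fixed sigma."""

    def __init__(self, n_features, sigma=0.5, lr=0.005):
        self.theta = [0.0] * n_features
        self.sigma = sigma
        self.lr = lr

    def mean(self, state):
        return sum(t * s for t, s in zip(self.theta, state))

    def sample(self, state, rng):
        return rng.gauss(self.mean(state), self.sigma)

    def update(self, state, action, advantage):
        # d/d(mu) of log N(action | mu, sigma) is (action - mu) / sigma^2.
        score = (action - self.mean(state)) / self.sigma ** 2
        self.theta = [t + self.lr * advantage * score * s
                      for t, s in zip(self.theta, state)]


def train(steps=10000, seed=0):
    """Run the toy problem: a single state, reward = -(action - 2)^2."""
    rng = random.Random(seed)
    state = [1.0]
    policy = PolicyEstimator(n_features=1)
    value = ValueEstimator(n_features=1)
    for _ in range(steps):
        action = policy.sample(state, rng)
        reward = -(action - 2.0) ** 2
        # The advantage is the reward relative to the value baseline.
        advantage = reward - value.predict(state)
        policy.update(state, action, advantage)
        value.update(state, reward)
    return policy, value


if __name__ == "__main__":
    policy, value = train()
    print("policy mean:", policy.mean([1.0]))
    print("value estimate:", value.predict([1.0]))
```

Subtracting the value estimate from the reward does not change the expected gradient, but it reduces its variance, which is the usual reason for keeping a value function next to the policy.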

Today I wrote the block of code for the TensorFlow training session.

Today I started writing the code for policy gradient learning using TensorFlow.