Today I studied the theory behind policy gradient methods.

Today I finished the TD-lambda learning code that solves the Mountain Car problem in the Gym environment.

Today I continued the code for the TD-lambda algorithm by writing a class that calculates the eligibility traces of the state-action pairs.
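
A minimal sketch of what such an eligibility-trace class might look like, not the actual code from this log. It assumes accumulating traces over a discretized table of state-action pairs; the class name `EligibilityTraces`, its parameters, and the default values of `gamma` and `lam` are all hypothetical.

```python
import numpy as np


class EligibilityTraces:
    """Accumulating eligibility traces for a table of (state, action) pairs.

    Hypothetical illustration: assumes states and actions are integer indices.
    """

    def __init__(self, n_states, n_actions, gamma=0.99, lam=0.9):
        self.gamma = gamma  # discount factor (assumed value)
        self.lam = lam      # trace decay rate lambda (assumed value)
        self.e = np.zeros((n_states, n_actions))

    def update(self, state, action):
        # Decay every trace by gamma * lambda, then bump the visited pair.
        self.e *= self.gamma * self.lam
        self.e[state, action] += 1.0
        return self.e

    def reset(self):
        # Clear all traces at the start of a new episode.
        self.e.fill(0.0)
```

In a TD-lambda update, the TD error would then be multiplied by the whole trace table, so recently visited pairs receive most of the credit.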

Today I started writing a program that uses the TD-lambda method to solve the Mountain Car problem in the Gym environment.

Today I finished the code for n-step Q-learning with RBF features for the Mountain Car problem.
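
As a rough illustration of the n-step part only, the target used in n-step Q-learning can be sketched as below. This is an assumption about the general technique, not the code from this log; the function name `n_step_return` and its arguments are hypothetical, and `bootstrap_value` stands in for the maximum Q-value estimate (for example from an RBF-based approximator) at the state reached after n steps.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of n rewards plus a discounted bootstrap estimate.

    rewards: the n rewards observed after the state being updated.
    bootstrap_value: estimated value of the state reached after n steps
        (use 0.0 if the episode terminated).
    """
    g = bootstrap_value
    # Fold the rewards in from last to first: g = r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `n_step_return([-1, -1, -1], 10.0, gamma=0.5)` combines three step penalties of -1 with a bootstrap estimate of 10.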