Today I started implementing an n-step Q-learning algorithm to find an optimal policy for the mountain-climb problem. I also used a one-layer neural network as a function approximator to estimate the Q-function.
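A minimal sketch of the core update, assuming the "one-layer" approximator is linear in a state feature vector `phi` (i.e. `Q(s, a) = w[a] . phi(s)`); the function names, `gamma`, and `alpha` defaults here are illustrative, not the actual implementation:

```python
import numpy as np

def n_step_target(rewards, q_next, gamma=0.99):
    """n-step Q-learning target: discounted sum of the n observed rewards
    plus the discounted greedy bootstrap value at step t+n."""
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    return g + gamma ** len(rewards) * np.max(q_next)

def update(w, phi, action, target, alpha=0.1):
    """One gradient step on a linear approximator Q(s, a) = w[a] . phi(s)."""
    td_error = target - w[action] @ phi
    # gradient of 0.5 * td_error**2 with respect to w[action] is -td_error * phi
    w[action] += alpha * td_error * phi
    return td_error
```

With `n` rewards in hand, the agent computes the target once and then takes a single gradient step on the weights of the action taken at the start of the window.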
Today I wrote two classes as part of the solution for the mountain-climb problem. One class evaluates a state, updates its value, and selects actions with the epsilon-greedy method. The other simulates a random episode and applies Q-learning to it.
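The two-class split might look like the sketch below. This is a simplified tabular, one-step version just to show the structure; the actual solution uses the n-step update and the one-layer network, and all class and method names here are my own illustrative choices:

```python
import random

class QEstimator:
    """Evaluates states, updates Q-values, and picks epsilon-greedy actions."""
    def __init__(self, n_actions, epsilon=0.1, alpha=0.5, gamma=0.99):
        self.q = {}  # (state, action) -> estimated value
        self.n_actions = n_actions
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def best_action(self, state):
        return max(range(self.n_actions), key=lambda a: self.value(state, a))

    def act(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise exploit
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state, done):
        # one-step Q-learning backup toward reward + gamma * max_a Q(s', a)
        bootstrap = 0.0 if done else max(
            self.value(next_state, a) for a in range(self.n_actions))
        target = reward + self.gamma * bootstrap
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)

class EpisodeRunner:
    """Simulates one episode and applies Q-learning to each transition."""
    def __init__(self, env, agent):
        self.env, self.agent = env, agent

    def run(self, max_steps=200):
        state = self.env.reset()
        total = 0.0
        for _ in range(max_steps):
            action = self.agent.act(state)
            next_state, reward, done = self.env.step(action)
            self.agent.update(state, action, reward, next_state, done)
            total += reward
            state = next_state
            if done:
                break
        return total
```

Keeping the value estimate and exploration policy in one class and the episode loop in another makes it easy to swap the tabular estimator for the network-based one without touching the simulation code.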