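Instead of bootstrapping from the greedy action at the next state (as in Q-learning), or from a single next action sampled by the epsilon-greedy policy (as in Sarsa), we can update the Q function using the expected value of Q over all actions at the next state, weighting each action by the probability the policy assigns to it. This approach is called expected Sarsa. Because it averages over the next action instead of sampling it, expected Sarsa removes the variance that the random choice of the next action adds to the Sarsa update, at the cost of summing over all actions at every step.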
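The following is a minimal sketch of one tabular expected Sarsa update. It assumes a NumPy Q-table indexed as `Q[state, action]` and an epsilon-greedy policy. The function names and default hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions and are not taken from the original text. The TD target is `r + gamma * sum_a' pi(a'|s') * Q[s', a']`. Q-learning would use `max_a' Q[s', a']` in that position, and Sarsa would use `Q[s', a']` for one sampled `a'`.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state's Q-values.

    Every action gets epsilon / n_actions; the greedy action (first one on ties,
    as np.argmax returns) gets the remaining 1 - epsilon on top of that.
    """
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_row)] += 1.0 - epsilon
    return probs


def expected_sarsa_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Apply one expected Sarsa update to Q[s, a] in place and return Q."""
    if done:
        # No bootstrapping from a terminal state.
        expected_next = 0.0
    else:
        # Expected value of the next state's Q-values under the epsilon-greedy policy.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected_next = float(np.dot(probs, Q[s_next]))
    td_target = r + gamma * expected_next
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q


# Usage: two states, two actions; state 1 already has Q-values [1.0, 3.0].
Q = np.zeros((2, 2))
Q[1] = [1.0, 3.0]
expected_sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, done=False,
                      alpha=0.5, gamma=1.0, epsilon=0.2)
# Policy at state 1 is [0.1, 0.9], so the expectation is 0.1*1 + 0.9*3 = 2.8,
# the target is 3.8, and Q[0, 0] moves halfway from 0 to 3.8.
print(Q[0, 0])
```

In the usage above, Q-learning would instead bootstrap from 3.0, the maximum. Sarsa would bootstrap from either 1.0 or 3.0, depending on which action was sampled. Expected Sarsa always uses their policy-weighted average.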
Aidin Ferdowsi received his BS in Electrical Engineering from the University of Tehran, Iran, and his MS in Electrical Engineering from Virginia Tech. He is currently a PhD student in the Bradley Department of Electrical and Computer Engineering at Virginia Tech. He is also a Hume@VT Fellow. His research interests include machine learning, cyber-physical systems, smart cities, security, and game theory.