Reinforcement learning is a class of machine learning methods that use the environment's reward signal to infer optimal actions. Environments with dense reward signals are therefore easy for reinforcement learning agents, and most algorithms included in TF-Agents already achieve good results on them. In sparse-reward environments such as Montezuma's Revenge from Atari 2600, however, these algorithms fail to discover optimal behavior because the agent rarely observes any nonzero reward. To mitigate this problem, multiple methods have been proposed that add intrinsic rewards, or "curiosity," to existing algorithms to extend their capability.
This project will add three curiosity modules to TF-Agents: pseudocount-based exploration via Context Tree Switching (CTS), the Intrinsic Curiosity Module (ICM), and Random Network Distillation (RND). Each will be implemented as a self-contained module that can be enabled or disabled in a single line for the existing reinforcement learning algorithms in TF-Agents. With these modules, users will be able to obtain meaningful results on sparse-reward environments without careful reward shaping.
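To illustrate the idea behind one of these modules, here is a minimal NumPy sketch of Random Network Distillation, not the actual TF-Agents implementation: a fixed, randomly initialized target network is paired with a trainable predictor, and the intrinsic reward for a state is the predictor's error against the target. States the agent has visited often yield low error (low curiosity), while novel states yield high error. The tiny linear networks, the state vectors, and the training loop below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, randomly initialized target network (never trained).
W_target = rng.normal(size=(8, 4))

# Predictor network, trained online to match the target's outputs.
W_pred = rng.normal(size=(8, 4))

def intrinsic_reward(state):
    """RND intrinsic reward: squared prediction error vs. the fixed target."""
    err = W_pred @ state - W_target @ state
    return float(err @ err)

def train_predictor(state, lr=0.05, steps=200):
    """SGD on 0.5 * ||W_pred s - W_target s||^2 for a visited state."""
    global W_pred
    for _ in range(steps):
        diff = W_pred @ state - W_target @ state
        W_pred -= lr * np.outer(diff, state)  # gradient w.r.t. W_pred

visited = np.array([1.0, 0.0, 0.0, 0.0])  # a frequently seen state
novel = np.array([0.0, 1.0, 0.0, 0.0])    # an unseen state

before = intrinsic_reward(visited)
train_predictor(visited)
after = intrinsic_reward(visited)
# After training, the visited state's curiosity collapses while the
# novel state's curiosity stays high, driving exploration toward it.
```

In practice the combined training signal is the extrinsic reward plus a scaled intrinsic reward, which is why the module can be bolted onto an existing agent without touching its learning algorithm.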