Technical • January 15, 2023 • 2 Min. Read

New Approaches in Reinforcement Learning

Enriching Reinforcement Learning through curiosity modules.

Reinforcement learning is currently one of the most active research areas in artificial intelligence. Its core idea is learning independently by trial and error: an agent takes actions with the aim of maximizing the return those actions earn. This is similar to the way humans pick up new skills, such as learning a language or a sport. We falter in the early stages but gradually improve by analyzing what worked and what did not.

The most interesting aspect of reinforcement learning is its “reward and punishment” approach: the agent receives a positive reward for a correct action and a negative reward (a penalty) for an incorrect one. This aspect of learning resembles human behavioral traits.

Reinforcement learning achieves results by training models to maximize the cumulative reward collected by the learning agent. Unsupervised learning, by contrast, detects similarities and differences between data points to establish patterns and make predictions.
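
To make "cumulative reward" concrete, here is a minimal sketch of how the reward collected over one episode is often summed. The discount factor `gamma`, the function name, and the example rewards are illustrative assumptions and do not come from the article.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum the rewards of one episode, discounting each by gamma per step of delay.

    gamma is an assumed hyperparameter: values below 1 make later rewards count less.
    """
    total = 0.0
    # Walk backwards so each step folds in the discounted future return.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: two small penalties, then a reward for reaching the goal.
episode_rewards = [-1.0, -1.0, 10.0]
print(discounted_return(episode_rewards, gamma=0.9))
```

An agent that maximizes this quantity accepts small penalties early on if they lead to a larger reward later.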

Despite the striking similarities with human learning, reinforcement learning does have limitations, especially in environments where feedback about an action is sparse and infrequent. Such scenarios are common in the real world. Imagine trying to find your favorite shampoo brand in a bustling supermarket: despite an extensive search, the brand stays out of reach, and there is no real “feedback” telling you whether you are heading in the right direction.

Curiosity as an input for Reinforcement Learning

A 2018 research paper, “Episodic Curiosity through Reachability”, a collaboration between Google, DeepMind, and ETH Zurich, explored a new approach that uses short-term, memory-based inputs to form reinforcement learning rewards. The method broadens the rewards available to the RL agent by rewarding exploration alongside the original task. Observations are added to memory, and each new observation is judged as known or unknown relative to what is stored there. Discovering unknown observations earns more reward than revisiting known ones.

Curiosity approaches

Reinforcement learning algorithms typically depend on hand-designed rewards supplied by the external environment. However, designing such rewards does not scale to many real-life scenarios. This gave rise to curiosity approaches, in which the motivation behind the reward function arises internally, within the agent.

Intrinsic Curiosity Module

This module gives the agent a positive reward when it fails to predict the outcome of its action accurately, so curiosity is defined through a predictive model. Discovering something unknown acts as an element of surprise, and the agent strives to maximize the reward that surprise brings.
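
The idea of rewarding poor predictions can be sketched in a few lines. This is a simplified illustration, not the module's actual implementation: the function name, the `scale` parameter, and the plain lists standing in for learned feature vectors are all assumptions.

```python
def intrinsic_reward(predicted_next, actual_next, scale=1.0):
    """Curiosity bonus: scaled half squared error of a forward-model prediction.

    predicted_next and actual_next are equal-length lists of numbers that stand in
    for the predicted and observed features of the next state (an assumption).
    """
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return scale * 0.5 * squared_error


# A well-predicted transition earns almost nothing; a surprising one earns more.
familiar = intrinsic_reward([1.0, 0.0], [1.0, 0.1])
surprising = intrinsic_reward([1.0, 0.0], [-1.0, 2.0])
print(familiar, surprising)
```

As the predictive model improves on familiar transitions, their bonus shrinks, which pushes the agent toward parts of the environment it cannot yet predict.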

Dynamics-based Curiosity-driven Learning

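In this approach, the agent is drawn toward transitions with high prediction error, which tend to occur where it has spent less time or where the environment's dynamics are complex. Dynamics-based curiosity gives better results than earlier curiosity-based approaches. However, it can produce behavior similar to procrastination: the agent may keep returning to unpredictable but irrelevant parts of the environment instead of pursuing the task.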
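
A toy experiment hints at why a prediction-error reward can invite procrastination. The one-number predictor, the learning rate, and the two synthetic observation streams below are assumptions made for illustration only.

```python
import random


def mean_late_error(observations, learning_rate=0.1):
    """Train a one-number predictor online and return its mean squared error
    over the second half of the observation stream."""
    prediction = 0.0
    errors = []
    for obs in observations:
        errors.append((obs - prediction) ** 2)
        # Nudge the prediction toward what was actually observed.
        prediction += learning_rate * (obs - prediction)
    late = errors[len(errors) // 2:]
    return sum(late) / len(late)


rng = random.Random(0)  # fixed seed so the run is repeatable
predictable = [0.5] * 200                    # a source the predictor can learn
noisy = [rng.random() for _ in range(200)]   # pure noise, never learnable

print(mean_late_error(predictable), mean_late_error(noisy))
```

The error on the predictable stream collapses toward zero, while the error on the noisy stream stays high no matter how long the predictor trains. An agent rewarded for prediction error would therefore keep being paid for watching the noise.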

Episodic Curiosity

This approach lets agents create rewards for themselves, which makes rewards denser and easier to obtain. It uses episodic memory to decide when to reward the agent. To define the reward, the current observation is compared with the observations already in memory. Essentially, the comparison is based on how many environment steps it takes to reach the current observation from those in memory: an observation that takes many steps to reach counts as novel and earns a reward. This discourages the agent from settling for outcomes that are quick to reach and predictable.
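
The memory comparison described above can be sketched as follows. One important assumption: plain Euclidean distance stands in for the "number of environment steps" comparison, which this sketch does not attempt to learn. The class name, threshold, and bonus value are hypothetical.

```python
import math


class EpisodicMemory:
    """Stores observations seen in the current episode and rewards novel ones."""

    def __init__(self, novelty_threshold=1.0, bonus=1.0):
        self.novelty_threshold = novelty_threshold  # assumed cutoff for "far enough"
        self.bonus = bonus                          # assumed reward for a novel observation
        self.entries = []

    def reward(self, observation):
        """Return the bonus if the observation is far from everything in memory."""
        if self.entries:
            # Euclidean distance is a stand-in for a reachability comparison.
            nearest = min(math.dist(observation, entry) for entry in self.entries)
            if nearest < self.novelty_threshold:
                return 0.0  # too close to something already seen
        self.entries.append(tuple(observation))
        return self.bonus

    def reset(self):
        """Clear the memory at the start of a new episode."""
        self.entries.clear()


memory = EpisodicMemory(novelty_threshold=1.0)
rewards = [memory.reward(obs) for obs in [(0, 0), (0.1, 0), (3, 4), (3, 4.2)]]
print(rewards)  # [1.0, 0.0, 1.0, 0.0]
```

Only observations judged novel are stored, so the memory grows with the territory the agent has actually covered rather than with the number of steps taken.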

Current interest in curiosity models, and the results they have produced, can lead to smarter approaches for learning behaviors with reinforcement learning.

References

https://arxiv.org/abs/1810.02274
https://pathak22.github.io/noreward-rl/
https://pathak22.github.io/large-scale-curiosity/resources/largeScaleCuriosity2018.pdf
https://ai.googleblog.com/2018/10/curiosity-and-procrastination-in.html