Learning to navigate without a GPS
This project is designed to connect experimental data on the cellular and behavioral activity of animals performing spatial tasks to theoretical functional models of these cells, and further to whole-brain models installed in robotic agents that perform similar behavioral tasks in the real world. Animals are extremely efficient at navigating their environments: they find valuable resources, learn their locations, and return to them when appropriate. Recent neurophysiological recordings in behaving rats (Hafting et al., 2005; Sargolini et al., 2006) and in rat brain slices (Giocomo et al., 2007) have provided an array of information about the properties of grid cells in the entorhinal cortex, complementing previous studies of place cells (see Best et al., 2001 for review) and head direction cells (see Taube, 2007 for review). Analysis of these data has suggested several models (Burgess et al., 2007; Hasselmo and Brandon, 2008; Gorchetchnikov and Grossberg, 2007; Mhatre et al., 2011; Fortenberry et al., 2011) that aim to explain how grid cells, together with head direction cells and place cells, can take part in path-integration mechanisms. One of the critical functions of the path-integration system is to help the animal localize itself with respect to its current internal representation of the environment while simultaneously updating this internal map. In robotic engineering, the corresponding problem is known as the simultaneous localization and mapping (SLAM) problem. The goal of this project is to create a complete model of a biological path-integration system that solves the SLAM problem for noisy and inconsistent sensory input, and to embed the resulting model in the Modular Neural Exploring and Traveling Agent (MoNETA) initiative.
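Path integration itself can be illustrated with a minimal dead-reckoning sketch in Python. It is not the project's grid-cell model: it simply accumulates velocity samples over time, with the heading standing in for the head-direction signal. The function names, the (speed, heading, dt) sample format, and the Gaussian noise parameters are hypothetical choices made for illustration. Adding noise to the samples shows why integration alone drifts, which is what makes the SLAM problem hard for noisy input.

```python
import math
import random


def integrate_path(steps, start=(0.0, 0.0)):
    """Dead-reckoning position estimate from (speed, heading, dt) samples.

    heading is in radians and stands in for a head-direction signal;
    speed is in arbitrary distance units per second.
    Returns the list of estimated positions, starting with `start`.
    """
    x, y = start
    trajectory = [(x, y)]
    for speed, heading, dt in steps:
        # Accumulate the displacement implied by the current velocity.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        trajectory.append((x, y))
    return trajectory


def add_noise(steps, speed_sd, heading_sd, rng):
    """Corrupt each sample with Gaussian noise (a hypothetical sensor model)."""
    return [
        (speed + rng.gauss(0.0, speed_sd), heading + rng.gauss(0.0, heading_sd), dt)
        for speed, heading, dt in steps
    ]


if __name__ == "__main__":
    # A closed square loop: east, north, west, south, 2 s each at 1 unit/s.
    square = [(1.0, k * math.pi / 2, 2.0) for k in range(4)]
    clean_end = integrate_path(square)[-1]
    noisy_end = integrate_path(add_noise(square, 0.05, 0.1, random.Random(0)))[-1]
    print("clean end:", clean_end)  # approximately (0, 0): the loop closes
    print("noisy end:", noisy_end)  # generally offset from (0, 0): drift
```

Because each sensory error is integrated along with the signal, the noisy estimate generally ends away from the starting point, and the offset tends to grow with path length. A SLAM system has to correct this drift by relating the position estimate to the map it is building.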
On the right, the brain areas with the corresponding review or experimental papers. On the left, a block diagram of the path-integration system: green arrows show existing models with the corresponding citations; purple arrows show previously unmodeled connections that have been completed within the project so far; red arrows show connections that have not yet been modeled.
Animals and humans (and robots) must constantly make decisions about their pending actions based on the current sensory milieu. How, then, do these creatures choose the best course of action from a number of different possibilities? One of the earliest and most enduring solutions to this problem in nature is the use of reinforcement learning, whereby actions leading to rewarded outcomes are repeated under similar conditions.
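The core idea of reinforcement learning described above (repeat actions that led to reward under similar conditions) can be sketched as tabular action-value learning with epsilon-greedy exploration. This is a generic textbook technique, not the project's basal-ganglia model; the state and action names, the reward function `traffic_reward`, and the parameters `alpha` and `epsilon` are illustrative assumptions.

```python
import random


def choose(q, state, actions, epsilon, rng):
    """Epsilon-greedy choice: usually exploit the best-valued action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def learn(reward_fn, states, actions, episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn action values Q(state, action) from single-step rewarded trials."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)  # the current situation (cue)
        a = choose(q, s, actions, epsilon, rng)
        r = reward_fn(s, a)
        # Nudge the estimate toward the observed reward (prediction-error update),
        # so rewarded actions become more likely to be chosen again in this state.
        q[(s, a)] += alpha * (r - q[(s, a)])
    return q


def traffic_reward(state, action):
    """Hypothetical task: braking at red and going at green are rewarded."""
    return 1.0 if (state, action) in {("red", "brake"), ("green", "go")} else 0.0


if __name__ == "__main__":
    actions = ["brake", "go"]
    q = learn(traffic_reward, ["red", "green"], actions)
    for s in ["red", "green"]:
        best = max(actions, key=lambda a: q[(s, a)])
        print(s, "->", best, {a: round(q[(s, a)], 2) for a in actions})
```

The occasional random choice matters: without exploration, an agent that happened to try an unrewarded action first might never discover the rewarded one.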
The goal of the current project is to create a biologically inspired decision-making system that uses the basal ganglia, the putative neural substrate of reinforcement learning, as a template. In addition to basic reinforcement learning and decision-making functions, the system is designed to address a more specific problem: how much time should be devoted to processing the environment before a decision is made? When should an action become a habit rather than a deliberate decision, and when should a habit be reconsidered? This conflict is perhaps best illustrated by an example.
Consider the situation of an automobile driver. An inexperienced driver is constantly on the lookout for signals that tell them they must slow down: stop signs, red lights, tail lights, and so on. Over time, the driver learns that a red light always signals that they should slow down. The response to a red signal thus becomes habitual and automatic, and the driver soon applies the brakes without conscious thought.
What about green lights? With experience, these too may come to provoke an automatic response: pressing the accelerator. But if the driver accelerates immediately and incautiously when the light changes, he or she may be hit by another driver running the red light. After a collision, the driver must learn to suppress the conditioned response at least long enough to look both ways. The previously unconscious, habitual response must therefore become deliberate once again, at least until the new, delayed response has itself become habitual.
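One way to make this dilemma concrete is a toy arbitration rule: a cue-response association is executed habitually only after a streak of reliable outcomes (small reward prediction errors), and a large surprise such as the collision resets it to deliberate control until repeated successes rebuild the habit. This is a hypothetical illustration, not the project's basal-ganglia model; the class `CueResponse`, the surprise threshold, and the streak length are all assumptions.

```python
class CueResponse:
    """Toy cue-response association with habitual vs deliberate control (hypothetical).

    The association is executed habitually only after `habit_after` consecutive
    trials whose reward prediction error stays within `surprise`; any larger
    error resets the streak and returns control to deliberation.
    """

    def __init__(self, alpha=0.3, surprise=0.5, habit_after=5):
        self.value = 0.0  # learned expected outcome of the response
        self.alpha = alpha  # learning rate
        self.surprise = surprise  # prediction-error magnitude that counts as a surprise
        self.habit_after = habit_after
        self.streak = 0  # consecutive unsurprising trials

    @property
    def habitual(self):
        return self.streak >= self.habit_after

    def action(self):
        # Habitual: respond at once; deliberate: check the scene first.
        return "go immediately" if self.habitual else "look both ways, then go"

    def update(self, outcome):
        """Learn from one trial's outcome and return the prediction error."""
        error = outcome - self.value
        self.value += self.alpha * error
        if abs(error) > self.surprise:
            self.streak = 0  # surprise: fall back to deliberate control
        else:
            self.streak += 1  # reliable outcome: strengthen the habit
        return error


if __name__ == "__main__":
    green = CueResponse()
    # Ten uneventful green lights, one collision (trial 11), then ten more.
    outcomes = [1.0] * 10 + [-1.0] + [1.0] * 10
    for trial, outcome in enumerate(outcomes, start=1):
        print(trial, green.action())
        green.update(outcome)
```

The streak rule is only one possible criterion; it is used here because it makes both transitions in the driving example explicit: habit formation after repeated success, and reconsideration after a surprising failure.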