MoNETA brain submodules: SORT
The MoNETA visual system (Sensory Object Recognition and Tracking, SORT, in pink) was evolved by the animat freely interacting with a virtual environment. The what and where streams of this visual system have been shown to learn (and re-learn, in the case of changing image statistics) both ocular dominance and orientation selectivity in a single network. This multi-layered thalamocortical network consists of three main components that mimic structures in the mammalian visual system: the retina (sensory organ), the LGN (lateral geniculate nucleus of the thalamus; relay station), and V1 (first stage of cortical visual processing). The clustering of orientation preference (colored plot, with each color corresponding to an orientation) and of ocular dominance (grayscale plot, with black and white corresponding to the left and right eye, respectively) are well-studied topographic features of V1 and other primary visual areas. Before learning, the model V1 layer did not contain clusters of orientation preference. These clusters emerge after learning and are consistent with the types of maps found in biology.
The purpose of SORT 1.0 is to provide the rest of MoNETA with the identity and location, in retinal coordinates, of the set of landmarks currently in the animat's field of view. There are only eight different landmarks in the Morris water maze task. With such a small number, it would be possible to handcraft a specialized detector for each landmark and use it to filter the incoming visual input. Such a solution would not be scalable, however: it could only deal with the restricted set of landmarks present in the water maze task, and it would require considerable reworking to extend to more general situations. For this reason, SORT instead learns features and object categories from experience.
In the first version of the water maze task, landmarks could only be distinguished based on color features (figure below).
Figure Example snapshot from the virtual environment with three visible landmarks.
The simplest approach to learning to distinguish landmarks in this case is thus to learn a color vector (say, in RGB format) for each object class. This was done in SORT 1.0 by simply exposing the animat’s visual system to snapshots of the landmarks taken from the virtual environment and adapting synaptic weights with a Hebbian learning rule, under the constraint that different output neurons must necessarily encode different colors. We assume here for simplicity that color feature learning is completed before landmark category learning begins. Landmark categories are then learned in SORT 1.0 with a supervised learning procedure.
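As an illustration, this kind of color learning can be sketched as a small competitive Hebbian network: output neurons compete for each input, and the winner moves its weight vector toward the input color. Everything below (sample counts, noise level, learning rate, the data-based initialization used to avoid dead units) is illustrative and not taken from the actual SORT implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: noisy RGB samples clustered around three
# landmark colors (pure red, green, blue).
prototypes = np.eye(3)
clusters = [p + 0.05 * rng.standard_normal((200, 3)) for p in prototypes]

# One output neuron per color. Initializing each weight vector from a
# sample of a different cluster is a standard trick in competitive
# learning to keep different neurons encoding different colors.
W = np.array([c[0] / np.linalg.norm(c[0]) for c in clusters])

samples = np.vstack(clusters)
rng.shuffle(samples)

lr = 0.1
for x in samples:
    winner = np.argmax(W @ x)            # winner-take-all competition
    W[winner] += lr * (x - W[winner])    # Hebbian move toward the input
    W[winner] /= np.linalg.norm(W[winner])

# After training, each row of W approximates one landmark color.
```

The winner-take-all step enforces the constraint mentioned above: because only the best-matching neuron adapts on each snapshot, the three weight vectors specialize to different color clusters rather than collapsing onto the same one.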
Figure Class confusion matrix. Perfect classification is denoted by a purely diagonal matrix.
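A class confusion matrix of this kind is computed by tallying true labels against predicted labels. The labels below are simulated (with an assumed ~10% error rate over the eight landmark classes) purely to illustrate the bookkeeping; they are not measured SORT results:

```python
import numpy as np

# Simulated evaluation labels for the eight landmark classes.
n_classes, n_samples = 8, 500
rng = np.random.default_rng(1)
true = rng.integers(0, n_classes, n_samples)
pred = true.copy()
flip = rng.random(n_samples) < 0.1           # corrupt ~10% of predictions
pred[flip] = rng.integers(0, n_classes, flip.sum())

# Confusion matrix: entry [i, j] counts samples of true class i that
# were predicted as class j; perfect classification is purely diagonal.
conf = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(true, pred):
    conf[t, p] += 1

accuracy = np.trace(conf) / conf.sum()       # fraction on the diagonal
```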
Although very simplistic, the color learning scheme is sufficient to handle landmark recognition and localization in the virtual environment considered above. The figure below shows an example where two landmarks were recognized and localized in a given field of view.
Figure Localization and recognition of two landmarks. Each row at the top of the figure shows activity in a different neural layer, where each layer codes for a particular landmark category. The position of a landmark in the field of view is indicated by the azimuth of the peak of activity along its layer.
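The readout described in the caption above can be sketched as follows, assuming one activity layer per landmark category spanning the field of view in azimuth. The layer width, azimuth range, landmark positions, and detection threshold are all hypothetical:

```python
import numpy as np

# One activity layer per landmark category, spanning the field of view.
n_categories, width = 3, 120
azimuth = np.linspace(-60, 60, width)        # degrees across the field of view

layers = np.zeros((n_categories, width))
# Assume category 0 is visible at -30 deg and category 2 at +20 deg,
# each producing a bump of activity in its layer:
layers[0] = np.exp(-0.5 * ((azimuth + 30) / 5) ** 2)
layers[2] = np.exp(-0.5 * ((azimuth - 20) / 5) ** 2)

# A landmark is reported when its layer's peak activity crosses a
# threshold; its position is the azimuth of that peak.
threshold = 0.5
detections = {cat: azimuth[np.argmax(layer)]
              for cat, layer in enumerate(layers)
              if layer.max() > threshold}
```

Here `detections` maps each recognized category to its estimated azimuth; category 1, whose layer stays silent, is correctly absent.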
Needless to say, color-based object recognition will only get you so far in a complex natural world. A more useful source of information lies in the contours that make up objects. This can already be seen in our simple virtual environment by replacing color-based landmarks with simple hand-drawn shapes as in the figure below. Clearly object shapes can be distinguished here just from their contours.
Figure Simple hand-drawn landmark shapes.
To encode object contours we first filter the input image with oriented edge detectors at multiple orientations and scales (resembling the operation of simple cells in cortical area V1), as shown in the figure below.
Figure Oriented edge detectors at multiple orientations and scales.
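Oriented edge detectors of this kind are commonly modeled as Gabor filters: a sinusoidal carrier at the preferred orientation under a Gaussian envelope. A minimal filter bank, with illustrative parameter values rather than those used in SORT, might look like:

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Even-phase Gabor filter, a standard model of a V1 simple cell."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates into
    yr = -x * np.sin(theta) + y * np.cos(theta)   # the filter's frame
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

# Bank of 4 orientations x 2 scales (parameter values are illustrative).
bank = [gabor_kernel(size=2 * s + 5, wavelength=2.0 * s, theta=t, sigma=s)
        for s in (2, 4)
        for t in np.deg2rad([0, 45, 90, 135])]
```

Convolving the input image with each kernel in the bank yields one response map per orientation and scale; a vertical contour, for instance, drives the theta = 0 filter much more strongly than the orthogonal one.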
Single-cell activity is pooled at the next processing stage by specialized “pooling” cells that are loosely analogous to V1 complex cells. The purpose of pooling across a number of cells is to provide some level of invariance to translations of a landmark’s location, as shown in many models of object recognition (e.g., Fukushima, 1980; Serre et al., 2007). These models also show that greater invariance can be built progressively by interleaving multiple stages of simple cells and complex cells. SORT 1.0 uses only two levels of interleaved layers (i.e. simple cells -> complex cells -> simple cells -> complex cells). The features for the second layer of simple cells are themselves learned by exposure to snapshots of the virtual environment, in the same way that color features were learned in the previous version. The figure below shows a sample of learned features.
Figure Example features learned at the second level of the simple-cell hierarchy.
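A minimal sketch of one interleaved simple-to-complex stage, assuming plain linear filtering, rectification, and local max pooling (the operations actually used in SORT may differ in detail):

```python
import numpy as np

def correlate_valid(image, kernel):
    """Valid-mode 2-D cross-correlation (simple-cell filtering)."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(resp, size=2):
    """Complex-cell pooling: a local max is tolerant to small shifts."""
    h, w = resp.shape[0] // size, resp.shape[1] // size
    return resp[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def simple_complex_stage(image, kernels, pool=2):
    """One simple -> complex stage: filter, rectify, then pool."""
    return [max_pool(np.maximum(correlate_valid(image, k), 0.0), pool)
            for k in kernels]
```

Because the pooled value is a local maximum, shifting a feature by a pixel or two within a pooling window leaves the output unchanged, which is exactly the translation tolerance that motivates the complex-cell stage; stacking two such stages (as in SORT 1.0) compounds that tolerance.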
Although not obvious here, the feature space covered by these learned features provides a better representation of the objects in the virtual environment than the features at the earlier level did (Fig.5). Learning object categories from this new feature space therefore improves recognition performance and yields a model that generalizes to more realistic situations than the one developed in the color-only version of the virtual environment.