MoNETA's goal is to process a variety of biological and non-biological senses, ranging from vision to audition, touch, and smell, as well as other input sources such as radar range finders and infrared sensors.
MoNETA brains now include rudimentary auditory processing, comprising:
- a neural layer responsible for frequency analysis of sound intensities, analogous to the cochlea's response to sound;
- a neural layer implementing a filter bank, where each filter bin responds to a certain frequency range;
- a cortical layer responsible for the online learning of sounds; and
- a customized virtual environment in which MoNETA learns in an unsupervised fashion to create multimodal representations of auditory and visual objects.
Within a few milliseconds of their generation, sounds are processed by the outer, middle, and inner ears: sound waves create vibrations along the basilar membrane of the cochlea and are converted to nerve impulses by the cochlea's hair cells. These impulses reach the auditory processing areas of the central nervous system, which recognize the sound and integrate this knowledge with information from other sensory modalities (e.g., vision). The cochlea is widely believed to separate incoming acoustic frequencies by responding to different frequencies at different spatial locations along its length. In this sense, each region of the basilar membrane acts as one filter in a filter bank, whose output can be modeled by the Mel-Frequency Cepstral Coefficient (MFCC) filter bank.
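The filter-bank view of the basilar membrane can be sketched as a bank of triangular filters spaced on the mel scale. This is a minimal numpy sketch: the parameter values (13 filters, a 512-point FFT, 16 kHz sampling) are illustrative assumptions, not MoNETA's actual settings.

```python
import numpy as np

def mel_filter_bank(n_filters=13, n_fft=512, sample_rate=16000):
    """Build triangular filters evenly spaced on the mel scale.

    Each row is one filter; applied to a power spectrum, each filter
    sums energy over its frequency range, loosely analogous to one
    region along the basilar membrane.
    """
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Filter edges: evenly spaced in mel, warped back to Hz, then
    # mapped onto FFT bin indices.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2),
                            n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges)
                    / sample_rate).astype(int)

    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for b in range(left, center):        # rising slope
            bank[i, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):       # falling slope
            bank[i, b] = (right - b) / max(right - center, 1)
    return bank
```

Multiplying this matrix against a frame's power spectrum yields one energy value per filter, i.e., per frequency range.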
Mel-Frequency Cepstral Coefficients (MFCCs) are a well-known feature extraction method for biologically plausible speech recognition systems. In this implementation, MoNETA combines an auditory filter bank with a discrete cosine transform, wherein each of 13 filter-bank outputs represents a range of frequencies, with center frequencies spaced linearly at the low end of the spectrum and logarithmically at the high end.
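The filter-bank-plus-DCT step can be sketched as follows. The type-II DCT is written out directly in numpy, and the small constant added before the logarithm is an assumption to avoid log(0); none of this is MoNETA's actual code.

```python
import numpy as np

def mfcc_from_energies(energies, n_coeffs=13):
    """Log-compress filter-bank energies, then decorrelate them with a
    type-II discrete cosine transform to obtain cepstral coefficients."""
    log_e = np.log(np.asarray(energies, dtype=float) + 1e-10)
    n = len(log_e)
    k = np.arange(n_coeffs)[:, None]            # coefficient index
    m = np.arange(n)[None, :]                   # filter index
    basis = np.cos(np.pi * k * (m + 0.5) / n)   # DCT-II basis
    return basis @ log_e
```

With 13 filter-bank energies in and 13 coefficients out, the DCT concentrates the spectral envelope into the low-order coefficients, which is why MFCC front ends typically keep only the first handful.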
It is this filter-bank output from the basilar membrane that is translated into nerve impulses by the hair cells of the cochlea, then relayed to the cortex for auditory processing and classification. In the simulation, a simple virtual environment plays a distinct tone when the animat character gets near a colored box in the virtual room. The animat hears this tone and classifies it using a biologically plausible learning law (post-synaptically gated learning, or instar). All of this is accomplished in real time thanks to the parallel processing capabilities of Cog. The classifier output is denoted by the black-and-white boxes in the lower half of the video. See below:
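The instar rule moves each active category unit's weight vector toward the current input pattern, gated by that unit's own (post-synaptic) activity. A minimal numpy sketch, where the learning rate and array shapes are illustrative assumptions:

```python
import numpy as np

def instar_update(w, x, y, lr=0.1):
    """One post-synaptically gated (instar) learning step.

    w : (n_categories, n_inputs) weight matrix
    x : (n_inputs,) input pattern (e.g., filter-bank features of a tone)
    y : (n_categories,) post-synaptic activities; units with y = 0
        do not learn at all.

    Implements dw_ij = lr * y_i * (x_j - w_ij).
    """
    return w + lr * y[:, None] * (x[None, :] - w)
```

With winner-take-all activity in y, only the winning category's weights drift toward the tone's spectral pattern, so repeated presentations make each unit a template for one sound class.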
MoNETA's next goal in this domain is to integrate visual and auditory categories to create a multimodal representation of its environment.