SFN: Normalization Model of Multisensory Integration

Normalization model of multisensory integration: Tomokazu Ohshiro et al. Many physiological studies have investigated how multisensory neurons respond to inputs from different modalities. These neurons display particular properties that speak to how they integrate information across modalities. However, a simple computational framework that accounts for these features has not been established.

The basic model proposed by the speaker is a divisive normalization model. The setup is as follows: individual, modality-specific layers receive information from specific sensory structures. Multiple single-modality layers, with matching (aligned) receptive fields, then target the same multisensory layer, and the inputs from each modality are summed in a weighted fashion. The pooled output of this computation is then used to divisively normalize the responses of the multisensory layer.
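A minimal sketch of this computation in Python follows; the power-law nonlinearity with exponent n, the semi-saturation constant alpha, and the dominance weights d_vis/d_aud are illustrative parameter choices, not values given in the talk:

    import numpy as np

    def multisensory_layer(I_vis, I_aud, d_vis=1.0, d_aud=0.8, n=2.0, alpha=1.0):
        """Sketch of a divisively normalized multisensory layer.

        Each neuron sums its modality-specific drives with weights,
        passes the sum through an expansive nonlinearity, and is then
        divided by the pooled activity of the whole layer.
        I_vis, I_aud: arrays of single-modality input to each neuron.
        """
        drive = d_vis * np.asarray(I_vis) + d_aud * np.asarray(I_aud)  # weighted sum
        e = np.maximum(drive, 0.0) ** n        # expansive nonlinearity
        return e / (alpha ** n + e.mean())     # divide by pooled layer output

Because every neuron's response is divided by the summed activity of the whole layer, input from one modality can suppress the response to another; that single ingredient underlies the properties described below.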

Ohshiro next describes the principle of inverse effectiveness, one of the particular properties of multisensory neurons that had not yet been successfully modeled, but which their divisive model captures. He shows an example from cat superior colliculus: when multisensory neurons are presented with visual and auditory stimuli together at threshold (optimal) intensities, the activation is greater than the sum of the responses to each stimulus alone. However, additional multisensory drive (i.e., increasing the intensity of the multisensory stimuli to super-threshold, non-optimal levels) results in suppression of the responses, such that they are less than the sum of the responses to each stimulus alone. Put another way: bimodal responses are larger than the sum of the unimodal responses at weaker stimulus intensities, but smaller than that sum at stronger intensities. Ohshiro notes that, according to their model, this response suppression becomes more robust as more sensory modalities are added.
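A toy single-neuron reduction shows how normalization yields inverse effectiveness; here the normalization pool is approximated by the neuron's own net drive, a deliberate simplification of the population model made for illustration only:

    def r(drive, n=2.0, alpha=1.0):
        # Normalization pool approximated by the neuron's own net drive
        # (a simplification for illustration, not the full population model).
        d = max(drive, 0.0) ** n
        return d / (alpha ** n + d)

    for inten in [0.2, 0.5, 1.0, 2.0, 4.0]:
        uni_sum = r(inten) + r(inten)   # visual alone + auditory alone
        bimodal = r(2 * inten)          # aligned bimodal stimulus doubles the net drive
        print(f"intensity={inten:3.1f}  bimodal/sum = {bimodal / uni_sum:.2f}")

At weak intensities the printed ratio exceeds 1 (superadditive enhancement); at strong intensities it falls below 1 (subadditive suppression), which is the inverse-effectiveness pattern described above.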

Ohshiro moves on to the spatial principle of multisensory integration, another property he accounts for in the computational model. Again he presents an example from the superior colliculus, showing that the computation is spatially dependent: if the inputs from the different modalities are offset spatially, the bimodal response gets weaker. This suppression of the bimodal response when multisensory inputs are not spatially aligned is a critical prediction of the normalization model, and confirms that the multisensory integration mechanism is based on aligned spatial maps of sensory inputs.*
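The same sketch extends to the spatial principle once the layer is given spatially tuned receptive fields (the Gaussian widths, spacing, and other parameters below are arbitrary illustrative choices): an offset auditory stimulus adds almost no excitatory drive to the recorded neuron but still feeds the normalization pool, so the bimodal response falls below the visual-alone response.

    import numpy as np

    # Neurons with aligned visual/auditory Gaussian receptive fields tiling space.
    centers = np.linspace(-40.0, 40.0, 81)      # RF centers, in degrees

    def gauss(x, mu, sigma=10.0):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

    def layer_response(x_vis=None, x_aud=None, n=2.0, alpha=0.5):
        drive = np.zeros_like(centers)
        if x_vis is not None:
            drive += gauss(centers, x_vis)      # visual drive to each neuron
        if x_aud is not None:
            drive += gauss(centers, x_aud)      # auditory drive to each neuron
        e = drive ** n
        return e / (alpha ** n + e.mean())      # divisive normalization

    rec = 40  # index of the neuron whose receptive field is centered at 0 deg
    print("visual alone:   ", layer_response(x_vis=0)[rec])
    print("aligned bimodal:", layer_response(x_vis=0, x_aud=0)[rec])
    print("offset bimodal: ", layer_response(x_vis=0, x_aud=25)[rec])

In this toy run the spatially offset bimodal response comes out below the visual-alone response, because the misaligned auditory input engages the normalization pool without exciting the recorded neuron.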

Shown next are recordings from macaque MSTd neurons that again demonstrate that a bimodal stimulus produces response suppression when the stimuli are non-optimal (highly super-threshold), whereas weak bimodal inputs are enhanced relative to the sum of the responses to the individual modalities.

In summary, Ohshiro et al. propose that a multisensory version of divisive normalization can account for the basic empirical principles of multisensory integration. Furthermore, they present both a computational model and physiological data demonstrating that divisive normalization can underlie the observation that a non-optimal input, which is excitatory on its own, can produce cross-modal suppression.

*This should not surprise aficionados of the superior colliculus, which contains aligned visual and auditory maps, as well as a multisensory integration circuit (see work by Eric Knudsen).