Complementary Learning Systems within the Hippocampus: Reconciling Episodic Memory with Statistical Learning

As humans, we have a natural ability to remember the specifics of individual experiences (e.g. where I parked my car today) and to rapidly learn rules across those experiences (e.g. where in the parking lot spaces tend to be open). The Complementary Learning Systems (CLS) theory [1] offers a computational framework for how we accomplish these two seemingly disparate tasks: it posits that the brain solves them with different circuits, specifically the hippocampus and the cortex. The hippocampus uses sparse connectivity, which enables it to store distinct activity patterns even for very similar stimuli and so prevents interference; it then slowly teaches these experiences to cortex. For instance, say two events happened with the same people in the same room of your house, but individual details, like what they were wearing and what they said, differed. Sparse representations are one way to treat these similar memories as distinct. The cortex, on the other hand, uses highly overlapping patterns of brain activity for similar stimuli, which lets it learn the similarities between them, but only slowly (over days to months).


Figure 1, from Schapiro et al. [3]

However, this distinction between the complementary roles of cortex and hippocampus does not appear to be so clear-cut. There is experimental evidence that humans can learn regularities rapidly (over minutes or hours), and there have been several empirical demonstrations that the hippocampus is involved in rapid statistical learning [2]. Schapiro et al. [3] ask how the heterogeneous connectivity of different hippocampal pathways could allow the hippocampus both to maintain non-overlapping representations and to support rapid statistical learning. To approach this problem, they build a model consisting of two main pathways (Figure 1): the trisynaptic pathway (TSP), ECin --> DG --> CA3 --> CA1, and the monosynaptic pathway (MSP), EC <--> CA1. Representations in the TSP are sparse: only a small subset of neurons is connected between layers, so few neurons are active at any time. Because so few neurons are active at once, the network can represent different patterns with different sets of active neurons, which lets the sub-regions in this pathway avoid interference by forming well-separated representations. The projections within the MSP, on the other hand, are not as sparse, so more neurons may be active at any time in CA1. We can view the MSP as translating between the sparse representations in the TSP and the overlapping representations in EC, which helps the hippocampus communicate with cortex.

In other words, there is a lot of brain activity in EC, with many neurons active at once. This information is then passed to the next region in the pathway, DG, the entry point to the hippocampus from the cortex, where activity becomes very sparse (few neurons are active). This results in some information loss, namely of the information about the rules and regularities that bind similar experiences together. Yet at the final step in the hippocampus, CA1, the information is again represented with a lot of activity. How does information entering the hippocampus end up as rich, dense activity in CA1? In addition to the 3-step pathway, the TSP, there is a more direct route, the 1-step MSP, in which the dense activity in the EC sub-region that feeds into the hippocampus is transferred directly to CA1, the final hippocampal way-station, with much less information loss.
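To make the trade-off between sparse and dense activity concrete, here is a minimal sketch in plain Python. It is not the authors' model: the random Gaussian projection, the k-winners-take-all rule, the layer sizes, and the function names (`k_winners`, `compare_overlap`) are all assumptions chosen for illustration. Two similar input patterns (sharing 16 of their 20 active units) are passed through the same random projection, and we compare how many active output units they share when only a few units are allowed to be active (DG-like) versus when many are (closer to the MSP).

```python
import random


def k_winners(pattern, weights, k):
    """Return the indices of the k output units with the strongest input drive."""
    drive = [sum(w * x for w, x in zip(row, pattern)) for row in weights]
    ranked = sorted(range(len(drive)), key=lambda i: drive[i], reverse=True)
    return set(ranked[:k])


def overlap_fraction(active_a, active_b):
    """Fraction of active units that two equally sized codes have in common."""
    return len(active_a & active_b) / len(active_a)


def compare_overlap(n_in=100, n_out=500, k_sparse=10, k_dense=250,
                    trials=5, seed=0):
    """Average output overlap of two similar inputs under sparse vs. dense coding.

    All sizes are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Two similar "events": 20 active input units each, 16 of them shared.
    event_a = [1 if i < 20 else 0 for i in range(n_in)]
    event_b = [1 if (i < 16 or 20 <= i < 24) else 0 for i in range(n_in)]

    sparse_total = 0.0
    dense_total = 0.0
    for _ in range(trials):
        # Fresh random projection from the input layer to the output layer.
        weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        sparse_total += overlap_fraction(k_winners(event_a, weights, k_sparse),
                                         k_winners(event_b, weights, k_sparse))
        dense_total += overlap_fraction(k_winners(event_a, weights, k_dense),
                                        k_winners(event_b, weights, k_dense))
    return sparse_total / trials, dense_total / trials


if __name__ == "__main__":
    sparse, dense = compare_overlap()
    print(f"shared active units, sparse code: {sparse:.2f}")
    print(f"shared active units, dense code:  {dense:.2f}")
```

Under these assumptions, the sparse code shares a smaller fraction of its active units between the two events than the dense code does. That is good for keeping the two memories apart, but it also discards some of the similarity information a learner would need in order to pick up on what the events have in common.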

Using this model, Schapiro et al. [3] want to understand how different pathways in the hippocampus might support rapid statistical learning. They choose three learning paradigms for the simulation that require extracting regularities on the timescale of minutes to hours, but for the sake of demonstration we will focus on one of them: pair structure. In the pair structure task, there is a sequence of letters (A-H) grouped into four pairs (AB, CD, EF, GH). Thus, B tends to follow A, D tends to follow C, etc. There are also transitions between pairs (for example, BC is a transition from the pair AB to the pair CD), but they occur less often. The model has to extract that B most often occurs after A, D most often occurs after C, etc. In other words, it must learn to group each letter with its pair partner rather than with the letters that only occasionally follow it across a pair boundary. In all of their tasks, the authors observed that the model could learn the appropriate regularities. This was assessed by examining the probability that the model would predict the second item of a pair when presented with the first item, as well as by examining the internal representations and measuring their correlations across items to visually identify these groupings. When certain pathways were ablated, the model could no longer learn these regularities, which gave the authors insight into the components of the model that are essential for statistical learning. In particular, they found that DG and CA3, forming the TSP to area CA1, represented distinct experiences but failed to learn regularities across experiences. By contrast, the direct MSP to CA1 learned regularities across experiences. The MSP can function even when the TSP is fully lesioned, which might explain why infants, whose TSP is not yet fully developed, are statistical learners.
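The statistics of the pair structure task can be sketched in a few lines of Python. This is only an illustration of the input statistics, not the authors' simulation code: the function names are hypothetical, and we assume that each pair is followed by one of the other three pairs chosen uniformly at random. We generate a long letter sequence and count how often each letter follows each other letter.

```python
import random
from collections import Counter, defaultdict

PAIRS = ["AB", "CD", "EF", "GH"]


def generate_sequence(n_pairs, seed=0):
    """Build a letter sequence from randomly ordered pairs.

    Assumption for this sketch: a pair is never immediately repeated, and
    the next pair is drawn uniformly from the other three.
    """
    rng = random.Random(seed)
    sequence = []
    previous = None
    for _ in range(n_pairs):
        candidates = [pair for pair in PAIRS if pair != previous]
        pair = rng.choice(candidates)
        sequence.extend(pair)  # adds the two letters of the pair in order
        previous = pair
    return sequence


def transition_probabilities(sequence):
    """Estimate P(next letter | current letter) from adjacent letters."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values())
                  for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


if __name__ == "__main__":
    probs = transition_probabilities(generate_sequence(3000))
    print("P(B | A) =", probs["A"]["B"])            # within-pair transition
    print("P(C | B) =", round(probs["B"]["C"], 2))  # between-pair transition
```

Under these assumptions, the within-pair transition (A to B) always occurs, while each between-pair transition (such as B to C) occurs only about a third of the time. This difference in transition frequency is the regularity the model has to pick up.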

The Complementary Learning Systems (CLS) theory [1] posits that the hippocampus is responsible for fast storage of memory traces, which are then slowly consolidated into cortex, and that the cortex, in turn, is largely responsible for extracting regularities across these traces. Schapiro et al. [3] set out to understand how the hippocampus is able to extract regularities rapidly, over short timescales, in response to experimental evidence [2] that the hippocampus is necessary for these tasks. They found that the sparse connectivity in the TSP enabled the storage of non-overlapping traces, while the dense connectivity and recurrence in the MSP enabled rapid statistical learning of regularities in structured data. It would be very interesting to see whether their model can scale to tasks that involve raw visual input (rather than simple binary encodings of letter pairs), and whether it can be connected to experimental data involving a CA1 ablation, to see how large a deficit in statistical learning is observed. If it could, that would provide a normative explanation of the role of complementary systems within the hippocampus for the storage of experiences and statistical learning across those experiences.


1.     McClelland, J. L., McNaughton, B. L., & O'Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419-457.

2.     Schapiro, A. C., Gregory, E., Landau, B., McCloskey, M., & Turk-Browne, N. B. (2014). The necessity of the medial temporal lobe for statistical learning. Journal of Cognitive Neuroscience, 26, 1736-1747.

3.     Schapiro, A. C., Turk-Browne, N. B., Botvinick, M. M., & Norman, K. A. (2017). Complementary learning systems within the hippocampus: A neural network modelling approach to reconciling episodic memory with statistical learning. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1711).