Brain-inspired memristive neural networks for unsupervised learning

Memristive devices, such as resistive switching memory (RRAM) and phase change memory (PCM), show a variable resistance which can mimic synaptic plasticity in the human brain. This fascinating analogy has inspired many recent research advances involving memristive devices and their use as artificial electronic synapses in neuromorphic circuits with learning capability. In particular, RRAM-based artificial synapses are extremely promising in terms of area efficiency, low power consumption, and flexibility of design, which pave the way for spiking neural networks that perform and behave like the human brain. This chapter will review the state of the art in the design and development of memristive neural networks for unsupervised learning. First, the optimization of RRAM devices for synaptic applications will be discussed, and a novel RRAM device with improved resistance window and controllability of resistance will be introduced. Then, a hybrid CMOS/memristive synaptic circuit will be shown to carry out learning tasks via spike-timing dependent plasticity (STDP), which is one of the learning rules of biological synapses. Finally, neural networks based on RRAM synapses will be reviewed, covering both feed-forward and recurrent networks. In both cases, the network displays unsupervised learning of input patterns, which can be stored, recognized, or even reconstructed by the network, thus highlighting the wealth of promising applications for memristive networks with synaptic plasticity.


Introduction
Resistive switching memory (RRAM) is a 2-terminal element which can change its resistance R, or conductance G = 1/R, upon application of a voltage signal [1,2,3,4,5]. Resistive switching effects in metal oxides were originally discovered in the 1960s [6,7,8], and were later studied for potential application in nonvolatile memory devices [9,10,11,12]. Today, RRAM research for electronic storage has mostly moved to the industrial development of storage-class memory [13] and embedded memory for the Internet of Things (IoT) [14]. On the other hand, RRAM devices have stimulated increasing interest for the development of artificial synapses in neural networks. In fact, a RRAM device shows controllable conductance change in both binary (digital) and multilevel (analog) mode, and can thus be used as a plastic neuromorphic synapse similar to the biological synapses in the human brain. In this framework, RRAM can be viewed as a memristive device, i.e., a variable resistive element which changes its conductance in response to the applied voltage [15,16]. Engineering RRAM synapses with tunable weight has become a grand challenge toward the development of neuromorphic circuits capable of learning via synaptic plasticity.
The current focus of research on memristive synapses includes 2 types of neural networks. First, deep neural networks (DNNs) with multi-layer perceptron (MLP) structure show promising inference properties after supervised training [17]. Typically, DNNs rely on non-spiking neurons and on weight updates based on supervised learning algorithms such as backpropagation [18]. In backpropagation, the MLP output is compared with the ideal solution, which is carried by labels, and the resulting difference, i.e., the error, is back-propagated to proportionally update all synapses in the network, until the recognition accuracy rises above a certain threshold. DNNs have been demonstrated for supervised learning with various types of memristive synapses, including PCM [19] and RRAM [20,21]. In a DNN, memristive devices make it possible to store a multilevel weight in a nanoscale element, thus reducing the circuit area of the synaptic array. Most importantly, the matrix-vector multiplication (MVM) is carried out physically within the memristor array thanks to Ohm's law and Kirchhoff's current law, instead of relying on extensive multiply-accumulate (MAC) operations [22]. The acceleration of MVM is, however, limited by inevitable approximations related to the imperfect programming of memristive devices and to the time-dependent fluctuations arising after programming, e.g., drift in PCM [23] and noise in RRAM [24]. Finally, the non-linearity of the resistance change process in RRAM and PCM is an additional challenge that severely degrades the learning accuracy of DNNs [25,26,27].
On the other hand, the spiking neural network (SNN) is viewed as a suitable solution for event-driven processing, similar to the human brain, thus potentially resulting in higher energy efficiency, larger density of information, and higher computing functionality [28,29]. SNNs are also more suitable for unsupervised learning, where patterns are received and stored in the network without labels, which is one of the most general learning modes of the human brain. To achieve unsupervised learning in the absence of labels, the delay between spikes is used as feedback information across the SNN layers through the spike-timing dependent plasticity (STDP) rule [30,31,32,33,34]. Designing memristive synapses capable of STDP, and SNN architectures able to replicate the computation primitives of the brain, is among the greatest challenges of memristive neuromorphic engineering.
This chapter provides an overview of memristive SNNs capable of unsupervised learning, focusing on RRAM-based synapses at the level of device, synaptic circuit, and network. First, the device optimization strategy for synaptic applications will be discussed, with reference to a novel RRAM technology based on SiOx with improved on/off ratio. Then, a circuit proposal for STDP synapses using RRAM and PCM devices will be described, together with the experimental demonstration of individual synaptic circuit blocks and their characteristics. Finally, SNN architectures will be analyzed, covering both feed-forward and recurrent networks for unsupervised learning of patterns. Pattern learning, storage, recognition, reconstruction and association will be demonstrated in SNNs by circuit simulations and by experiments with physical RRAM synapses. The results pave the way for brain-inspired SNNs capable of unsupervised learning, inference and planning.
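The in-memory MVM mentioned above can be illustrated with a minimal sketch of an ideal crossbar. It assumes voltages applied to the rows, columns held at virtual ground, and no wire resistance, device nonlinearity, drift or noise; the function name `crossbar_mvm` is hypothetical. Each device contributes a current G·V by Ohm's law, and Kirchhoff's current law sums these contributions along each column, so the column currents form the product of the input voltage vector and the conductance matrix.

```python
from typing import List, Sequence


def crossbar_mvm(conductances: Sequence[Sequence[float]],
                 voltages: Sequence[float]) -> List[float]:
    """Column currents of an ideal memristive crossbar (illustrative sketch).

    conductances[i][j]: conductance (S) of the device joining row i to column j.
    voltages[i]: read voltage (V) applied to row i.
    Columns are assumed to sit at virtual ground, so each device sees its row voltage.
    """
    if len(voltages) != len(conductances):
        raise ValueError("one read voltage per row is required")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for g_row, v in zip(conductances, voltages):
        if len(g_row) != n_cols:
            raise ValueError("all rows must have the same number of columns")
        for j, g in enumerate(g_row):
            # Ohm's law gives g * v; accumulating per column is Kirchhoff's current law.
            currents[j] += g * v
    return currents


# Usage: a 2x2 array with one high-conductance and one low-conductance device per column.
G = [[1e-4, 1e-6],
     [1e-5, 1e-4]]  # siemens
V = [0.1, 0.2]      # volts
print(crossbar_mvm(G, V))  # column currents in amperes
```

The multiply-accumulate work that a digital processor would perform in the nested loop is, in the physical array, carried out in a single read step; the programming errors and fluctuations discussed above would appear as perturbations of `G`.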

RRAM devices
To develop SNNs capable of learning with RRAM synapses, the RRAM device must be optimized to meet the specifications of both nonvolatile storage and in-memory computing. This challenging task requires a detailed understanding, modeling, and engineering of RRAM materials, with emphasis on programming performance (energy consumption, programming voltage, set/reset speed), reliability (retention time, endurance, variability and noise) and scaling. To enable such a broad landscape of device properties, materials must be carefully selected and combined in the RRAM stack.
A RRAM device is operated by the formation and disconnection of a conductive filament (CF), or percolation path, across an insulating material, as depicted in Fig. 1 [35]. Initially, the device features a dielectric layer, e.g., a metal oxide MeOx in Fig. 1a. The switching operation is initiated by a forming operation, where a dielectric breakdown is first induced across the MeOx layer to generate a sufficient amount of defects, such as oxygen vacancies or excess metallic impurities, either of the constituent metal Me or originating from the electrodes. The oxygen content x in the metal oxide is usually kept below the stoichiometric value, e.g., x is generally lower than 2 in HfOx, so that a certain concentration of defects is already present in the pristine oxide layer, facilitating oxide breakdown at relatively low voltage. After forming (Fig. 1b), the device is in the so-called set state, or low resistance state (LRS), due to the CF shunting the top electrode (TE) and the bottom electrode (BE). A reset operation leads to the disconnection of the CF and to the corresponding transition to the so-called reset state, or high resistance state (HRS), as shown in Fig. 1c. The CF is then restored by a set operation, which leads back to the LRS. In bipolar RRAM technology, which constitutes the large majority of RRAM devices currently studied by academic and industrial research, the set and reset operations consist of voltage sweeps or pulses with opposite polarities, e.g., positive voltage for the set transition and negative voltage for the reset transition, where the voltage is assumed to be applied to the TE.
RRAM devices can generally be divided into 2 technologies, namely RRAM relying on the resistive switching of metal oxides, such as HfOx [36,37,38,39], TiOx [40,41] and TaOx [42,43], and RRAM based on the electrochemical reaction and migration of cations from an active electrode, also known as conductive bridge memory (CBRAM) [44,45,46]. In the latter case, the dielectric material can be either a metal oxide or another insulating layer, also referred to as electrolyte, such as GeSe [44], GdOx [45], GeS2 [46,47], ZrOx [48] and Al2O3 [49]. The active electrode generally consists of Ag [47], Cu [13,44,50], or their compounds, such as CuTe [45]. The resistance window between HRS and LRS is generally larger in CBRAM-type devices than in oxide-based RRAM [51], which provides higher immunity to switching variations [52,53] and current fluctuations [24,54,55], both significant concerns for the reliability of nanoscale RRAM. On the other hand, CBRAM suffers from a relatively short retention time, as demonstrated by several reports of volatile CBRAM with retention time well below 1 s [47,56,57,58,59,60,61]. To achieve both good device stability and a large resistance window, the promising properties of oxide- and CBRAM-type RRAM devices should be combined. To this purpose, a dielectric material with a wide band gap should be adopted, to enable a high resistance of the HRS. To enable a large resistance window and high stability at the same time, the TE should be reasonably active, while avoiding Ag and Cu, which may lead to volatile switching behavior. Finally, the BE material should be inert to prevent a set transition under the negative voltage applied during the reset operation [39].
Based on these considerations, a novel RRAM technology was recently proposed, which combines a SiOx dielectric layer, a Ti-based TE and a C-based BE, as depicted in Fig. 2a [58,62]. The Ti cap serves as a defect-injecting reservoir layer during set, where Ti cations migrate into the SiOx to form the CF under a positive applied voltage [63,64]. The application of a negative voltage leads to the migration of Ti cations back to the TE, with no further defect injection from the BE thanks to the inert nature of the graphitic C layer. The BE is also confined to a 70-nm plug to enable evaluation of the forming, switching, and reliability properties at the nanoscale. The SiOx layer was deposited by e-beam evaporation from a SiO source, thus x is expected to be around 1 in the device stack. Fig. 2b shows the measured I-V curves of a SiOx RRAM device connected to an integrated field-effect transistor (FET), which limits the maximum current during the set transition, also referred to as compliance current I_C [63]. In the figure, I_C was limited to about 70 µA to study the device operation at relatively low current consumption. The I-V curves show the set transition at positive voltage and the reset transition at negative voltage, with a resistance window of about 10^4 between the LRS and HRS, despite the relatively high resistance of the LRS due to the low I_C. The large resistance window is due to the wide band gap of the SiOx layer, combined with the CBRAM-type switching mode of the device, where Ti defects are almost completely removed from the SiOx layer after reset, thus enabling a relatively high resistance of the HRS.
The abrupt set transition to the LRS reveals the sudden formation of the CF, where more defects introduced into the SiOx enhance the electric field and Joule heating, thus inducing a self-accelerated migration of defects [65,66]. On the other hand, the reset transition shows a more gradual increase of resistance, as the migration of defects widens the depleted gap, thus reducing the electric field and Joule heating. As a result, once the reset transition has started, more voltage is needed to further promote the migration of defects, which explains the gradual decrease of current during the reset transition [66]. Note that the abrupt set transition is potentially interesting for digital memory and logic computing applications [67], whereas synaptic potentiation generally requires a gradual increase of the conductance, for progressive STDP and fine updating of the synaptic weights in DNNs [25,26,27]. Gradual depression is, however, possible thanks to the negative feedback of the reset transition, thus enabling the use of the SiOx RRAM in DNNs trained by the backpropagation algorithm. The gradual reset transition makes it possible to tune the resistance of the HRS by controlling the width of the depleted gap during reset [67]. This is shown in Fig. 3a, where the HRS resistance at the end of the reset transition increases with the maximum voltage V_stop applied in the reset sweep. The controllable HRS enables multilevel operation of the SiOx RRAM, which enhances the scalability of the memory device by allowing more than one bit to be stored in a single physical memory cell [68,69,70,71]. Note that the increase of the HRS resistance is accompanied by a corresponding increase of the set voltage V_set, marking the set transition under positive applied voltage. This can be explained by the relationship between V_set and the depleted gap width ∆ in the HRS: an increasing ∆ leads to a decreasing field across the depleted gap for a given voltage, thus requiring a larger V_set to reach the critical field for inducing the set transition by defect migration across the gap [65]. The controllable HRS also makes it possible to tune the resistance window, which increases with V_stop as shown in Fig. 3b. The slight decrease of the LRS resistance with V_stop can be explained by the higher average field along the depleted gap for high HRS resistance [58,62]. The SiOx RRAM also demonstrates a high cycling endurance of almost 10^8 cycles, low cycle-to-cycle switching variations, and excellent stability at elevated temperature, where both HRS and LRS show negligible variations after annealing at 260 °C for 1 hour [58,62]. Overall, these favorable properties make SiOx RRAM a promising technology for nonvolatile memory and in-memory computing, including neuromorphic memristive networks.
Fig. 4 Sketch of a fundamental circuit block in a feed-forward neural network including a pre-synaptic neuron (PRE) and a post-synaptic neuron (POST) connected by a resistive synapse with a 1T1R structure. As a spike is generated by the PRE, a current spike is activated across the synapse, leading to an increase of V_int within the POST. As V_int exceeds the internal threshold V_th for fire, a backward spike is applied to the TE of the 1T1R synapse, causing the weight update according to STDP. Adapted with permission from [91]. Copyright 2016 IEEE.
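The argument linking V_stop, the depleted gap ∆ and V_set can be written as a one-line estimate. This is a simplification, assuming a uniform field across the gap and a single critical field E_c for defect migration; E_c is a symbol introduced here for illustration, not a parameter reported in the text, and the linear proportionality is a consequence of the uniform-field assumption rather than a measured law:

```latex
\begin{aligned}
E &\approx \frac{V}{\Delta} && \text{(uniform field across a depleted gap of width } \Delta\text{)}\\
E \ge E_c \;&\Longrightarrow\; V_{set} \approx E_c\,\Delta && (E_c:\ \text{critical field for defect migration})
\end{aligned}
```

Under this assumption, a larger |V_stop| leaves a wider gap ∆ (hence a higher HRS resistance) and raises V_set in proportion to ∆, which is the qualitative trend described above.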

RRAM synapses
Brain-inspired neuromorphic networks rely on synaptic plasticity according to biological learning rules, such as STDP and spike-rate dependent plasticity (SRDP), to emulate human-brain functions including visual/auditory pattern learning [72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88] and pattern classification [89,90]. Synaptic plasticity can be implemented at circuit level by combining a RRAM device with a FET in the so-called one-transistor/one-resistor (1T1R) structure, as shown in Fig. 4. Here, the 1T1R synapse is shown as the connecting element between a pre-synaptic neuron (PRE) and a post-synaptic neuron (POST) [83,90,91]. The PRE is connected to the gate of the FET in the 1T1R synapse, while the POST receives the synaptic current from the BE and controls the voltage at the TE of the 1T1R synapse. The operation of the 1T1R synapse can be understood as follows. As the PRE generates a positive voltage spike, the FET acts as a pass transistor, enabling a synaptic current proportional to the RRAM synaptic conductance. The current spike enters the POST via the BE, which can collect incoming currents from several synaptic channels, as in the ideal McCulloch-Pitts (MCP) neuron scheme [92]. The currents are integrated in the integrate-and-fire POST circuit, eventually leading to a fire event as the integral signal V_int reaches the threshold V_th. At fire, the POST generates a spike toward the next layer of neurons, and additionally applies a feedback spike to the synapse TE. The feedback spike consists of a positive pulse followed by a negative pulse, which can induce a weight update depending on its timing relative to the PRE spike, as shown in Fig. 5a.
If the PRE spike shortly precedes the POST spike (0 < ∆t < 10 ms), the resulting overlap between the PRE spike and the positive pulse of the POST spike causes a set transition, since the positive applied voltage exceeds the set voltage (V_TE+ > V_set), hence synaptic potentiation. On the other hand, if the PRE spike shortly follows the POST spike (−10 ms < ∆t < 0), the overlap between the PRE spike and the negative pulse of the POST spike causes a reset transition, since the negative applied voltage exceeds the reset voltage in absolute value (V_TE− < V_reset), hence synaptic depression. Fig. 5b shows the resistance measured before and after the application of the spikes of Fig. 5a, indicating that the RRAM device undergoes a set process (synaptic potentiation) for 0 < ∆t < 10 ms, whereas it undergoes a reset process (synaptic depression) for −10 ms < ∆t < 0. Fig. 5c shows the correlation plot of the resistance R(t_{i+1}) measured after the spike application as a function of the resistance R(t_i) measured before the spike application, for variable ∆t [87]. Under the potentiation condition, namely for a positive delay satisfying 0 < ∆t < 10 ms, a RRAM prepared in HRS undergoes a set transition to the LRS, whereas a RRAM initially in LRS shows no resistance variation, because it is already in its minimum resistance state [87,93]. For a negative delay satisfying −10 ms < ∆t < 0, corresponding to the condition for synaptic depression, a resistance transition is activated when the RRAM device is initially in its LRS. Finally, if ∆t lies outside the plasticity window (|∆t| > 10 ms), the PRE and POST spikes do not overlap, and therefore the RRAM resistance does not change. Since the plasticity mechanism relies on full set/reset operations, the 1T1R synapse only displays the HRS and LRS resistance levels, evidencing the binary operation of the 1T1R synapse due to the relatively abrupt set and reset transitions [87].
Note that more resistance levels can be achieved by time-dependent modulation of the PRE and POST spikes in the 2T1R synapse [94]. In this synapse architecture, the waveform of the PRE spike allows for time-dependent potentiation, where a longer ∆t corresponds to a smaller conductance due to the lower compliance current during the set transition. On the other hand, the waveform of the POST spike allows for time-dependent depression, where a longer ∆t corresponds to a smaller resistance due to the lower voltage applied during the reset transition [94]. The enhanced functionality comes at the expense of a slightly higher complexity of the 2T1R synapse circuit, which requires 2 transistors instead of only one in the 1T1R synapse [91]. To further support the dependence of STDP on the initial state in the 1T1R synapse, Fig. 6 shows the measured (a) and calculated (b) STDP characteristics, namely the ratio between the initial resistance R_0 and the final resistance after potentiation/depression, as a function of ∆t for increasing R_0 [83]. Calculations were based on a compact model for RRAM devices [66] including statistical variability [83]. These results show a binary STDP behavior, where the amount of potentiation and depression is a function of R_0. This state-dependent change of resistance drives the final resistance to either HRS or LRS, in strong analogy with biological synapses, where the weight update is limited between two boundary states. Fig. 7 further illustrates the three-dimensional (3D) color map of the calculated STDP characteristics, evidencing the increase of the potentiation/depression level of the 1T1R synapse with increasing/decreasing R_0 for positive/negative ∆t [83].
Fig. 8 (caption excerpt): (e) The presentation of the pattern causes POST fires because V_int exceeds the threshold, thus leading to potentiation of the synapses with ∆t > 0. Random noise presentation may instead cause depression, because it stimulates PRE channels shortly after a fire event, with ∆t < 0. Reprinted with permission from [83]. Copyright 2016 IEEE.
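The binary STDP behavior of the 1T1R synapse can be condensed into a simple update rule. The sketch below is an idealization, assuming ∆t = t_POST − t_PRE, the ±10 ms plasticity window quoted above, and perfectly binary set/reset with no variability or dependence on spike shape. The names `Level` and `stdp_update` are hypothetical, and the case ∆t = 0, which the text does not discuss, is arbitrarily left without update.

```python
from enum import Enum


class Level(Enum):
    HRS = 0  # high resistance state: depressed synapse
    LRS = 1  # low resistance state: potentiated synapse


PLASTICITY_WINDOW_MS = 10.0  # |delta_t| below which PRE and POST spikes overlap


def stdp_update(level: Level, delta_t_ms: float,
                window_ms: float = PLASTICITY_WINDOW_MS) -> Level:
    """Return the synapse level after one PRE/POST spike pair (idealized 1T1R).

    delta_t_ms = t_POST - t_PRE, so a positive value means the PRE spike came first.
    """
    if 0.0 < delta_t_ms < window_ms:
        # PRE spike overlaps the positive TE pulse: set transition (potentiation).
        # A synapse already in LRS simply stays there.
        return Level.LRS
    if -window_ms < delta_t_ms < 0.0:
        # PRE spike overlaps the negative TE pulse: reset transition (depression).
        return Level.HRS
    # No overlap (|delta_t| >= window) or delta_t == 0 (not covered in the text).
    return level


for start in (Level.HRS, Level.LRS):
    print(start.name, [stdp_update(start, dt).name for dt in (5.0, -5.0, 20.0)])
# HRS ['LRS', 'HRS', 'HRS']
# LRS ['LRS', 'HRS', 'LRS']
```

The printed table mirrors the correlation plot of Fig. 5c: inside the window the final state depends only on the sign of ∆t, while outside it the initial state is preserved.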

Simulation results
The demonstration of the STDP learning rule at the level of individual synapses opens the way for unsupervised learning in full feed-forward networks, such as the perceptron network [95] depicted in Fig. 8a, a 2-layer neural network with 64 PREs and a single POST [83]. In the network, each PRE is connected to the POST via a 1T1R synapse such as the one described in Fig. 4. A stochastic learning approach is adopted to induce unsupervised learning, namely, the automatic potentiation of the synapses within a reference pattern and the depression of all other synapses. The stochastic learning approach consists of the presentation of the reference pattern, e.g., the alphabetical letter X (Fig. 8b). During this presentation, all PREs belonging to the pattern generate a spike, which is applied to the corresponding 1T1R synapses. Alternatively, a random noise pattern, e.g., the one shown in Fig. 8c, is presented, which stimulates random PRE channels. Noise is randomly alternated with the pattern at each epoch, namely, at the periodic times marking the submission of a pattern to the network. While the presentation of the pattern causes a POST fire, and thus the potentiation of the active synapses within a given epoch, the presentation of noise is crucial, since it allows for the depression of all synapses in the background, namely, those not included in the pattern. This is possible because noise activates random synapses soon after a POST fire, thus satisfying the condition for depression (∆t < 0) according to the STDP learning rule.
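The stochastic pattern/noise presentation can be sketched as a stimulus generator. This is an illustration under stated assumptions, not the simulator used in [83]: the 8x8 grid matches the 64 PREs of Fig. 8a and the letter X is drawn as the two diagonals, while the pattern probability `p_pattern` and the noise density `noise_fraction` are hypothetical parameters. The generator also enforces the rule reported in [83] that two pattern epochs are never consecutive.

```python
import random

SIDE = 8  # 8x8 input layer = 64 PRE channels


def letter_x(side: int = SIDE) -> frozenset:
    """PRE indices (row-major) forming the letter X: both diagonals of the grid."""
    return frozenset(r * side + c for r in range(side) for c in range(side)
                     if c == r or c == side - 1 - r)


def stimulus_sequence(n_epochs: int, pattern: frozenset, n_inputs: int = SIDE * SIDE,
                      p_pattern: float = 0.5, noise_fraction: float = 0.1,
                      seed: int = 0) -> list:
    """Return a list of (kind, active_channels) tuples, one per epoch.

    kind is "pattern" or "noise". A pattern epoch is never followed by another
    pattern epoch, so pattern channels are not stimulated right after a fire.
    """
    rng = random.Random(seed)
    n_noise = max(1, round(noise_fraction * n_inputs))  # channels per noise image
    sequence = []
    previous_was_pattern = False
    for _ in range(n_epochs):
        if not previous_was_pattern and rng.random() < p_pattern:
            sequence.append(("pattern", pattern))
            previous_was_pattern = True
        else:
            noise = frozenset(rng.sample(range(n_inputs), n_noise))
            sequence.append(("noise", noise))
            previous_was_pattern = False
    return sequence


seq = stimulus_sequence(20, letter_x())
print(" ".join("P" if kind == "pattern" else "n" for kind, _ in seq))
```

Keeping the noise sparse mirrors the design choice discussed for the hardware experiments: a dense noise image could itself trigger a fire and potentiate background channels.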
Fig. 8 also shows a typical sequence of pattern and noise submissions to the input layer (d) and the internal potential V_int of the POST as a function of time (e). The presentation of the pattern generally causes a fire, hence leading to potentiation of the active synapses according to STDP. On the other hand, the random synapses activated after a POST fire are depressed according to STDP, further supporting the importance of the noise alternation to induce depression. Note that the network is never stimulated by two consecutive pattern submissions, to avoid activating the pattern right after a fire, which could cause unwanted depression of pattern synapses [83]. Fig. 9a shows the calculated evolution of synaptic weights as a function of epochs during the training phase, where each epoch corresponds to a time interval of 10 ms. The detailed maps of synaptic weights are shown in the color plots of Fig. 9b-d, describing the weight distribution at epochs 0, 250 and 500. Starting from a uniform distribution of synaptic weights at epoch 0, the pattern synapses undergo a relatively abrupt potentiation within the first 50 epochs. On the other hand, background synapses require a longer time of about 150 epochs for depression, due to the random activation and depression of individual background synapses during the stochastic learning process. The different timescales for potentiation and depression are clearly indicated by the average conductance of pattern and background synapses in Fig. 9a. These simulation results evidence the capability of visual pattern learning according to STDP in a 2-layer perceptron neural network equipped with 1T1R synapses.
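The qualitative outcome of Fig. 9, namely fast potentiation of pattern synapses and slower, noise-driven depression of background synapses, can be reproduced with a deliberately crude epoch-level model. The assumptions are ours, not those of [83]: weights are binary (0 = HRS, 1 = LRS); within an epoch the POST fires if the number of active LRS synapses reaches a hypothetical `threshold`, neglecting leakage and variability; a fire potentiates the synapses active in that epoch (∆t > 0); and channels active in the epoch right after a fire are depressed (∆t < 0). The function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def train_binary_perceptron(weights: List[int], stimuli: Iterable[Set[int]],
                            threshold: int) -> Tuple[List[int], List[bool]]:
    """Epoch-level STDP learning with one POST and binary 1T1R synapses.

    weights: initial levels, 0 = HRS, 1 = LRS (a modified copy is returned).
    stimuli: active PRE channels for each epoch (pattern or noise).
    threshold: number of active LRS synapses needed for a fire (hypothetical).
    """
    w = list(weights)
    fires = []
    fired_last_epoch = False
    for active in stimuli:
        if fired_last_epoch:
            # Channels spiking shortly after the previous fire: delta_t < 0, reset.
            for i in active:
                w[i] = 0
        # With identical read pulses, the integrated current counts active LRS synapses.
        fire = sum(w[i] for i in active) >= threshold
        if fire:
            # Active channels preceded the fire: delta_t > 0, set.
            for i in active:
                w[i] = 1
        fires.append(fire)
        fired_last_epoch = fire
    return w, fires


# Usage: channels {0, 1} form the pattern; channels 2 and 3 are background.
w, fires = train_binary_perceptron([1, 0, 1, 1], [{0, 1}, {2}, {0, 1}, {3}], threshold=1)
print(w, fires)  # [1, 1, 0, 0] [True, False, True, False]
```

Feeding two consecutive pattern epochs to this function depresses the pattern synapses in the second epoch, which is exactly the failure mode the stimulation protocol avoids.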

Hardware demonstration of unsupervised learning
Pattern learning via STDP has been demonstrated in hardware RRAM-based neural networks, with remarkable performance at both large [76] and small scale [84,86,87,88]. Fig. 10 shows the reference architecture for unsupervised learning, consisting of a 2-layer perceptron with spiking neurons and 1T1R synapses. This scheme was adopted in a full-hardware implementation with physical RRAM devices and spiking neurons [87].
Fig. 11 shows a schematic illustration of the neural network circuit (a) and the corresponding hardware implementation on a printed circuit board (PCB) (b). The neural network consists of a 2-layer perceptron including 16 PREs, 16 1T1R synapses and a single POST [87,88]. The PRE spikes were implemented via digital switches applying a voltage V_G to the gate of the 1T1R synapses. The PRE switches were controlled by an Arduino Due microcontroller (µC), which also served as the leaky integrate-and-fire (LIF) circuit of the POST, digitally integrating the synaptic currents after their conversion into an analog voltage by an external transimpedance amplifier. The feedback spike to the 1T1R synapses at fire was generated by the µC, which drove a multiplexer (MUX) to provide the appropriate voltage to the TE according to the scheme in Fig. 5. Fig. 12 shows an experimental demonstration of unsupervised learning of a 4x4 visual pattern by the hardware spiking neural network of Fig. 11 [87]. The initial weights of the 16 1T1R synapses were prepared in a random state between LRS and HRS, then a diagonal pattern was submitted in stochastic alternation with random noise images. Fig. 12a-d shows the color plots of the synaptic weights during the unsupervised learning process of 1000 epochs (10 s). Fig. 12e shows the stochastic submission of PRE spikes, representing either the pattern or noise, with 3% of the channels activated at each epoch to induce background depression (see Sec. 3). The pattern and noise were presented with equal probabilities of 50%. The relatively small percentage (3%) of activated channels in the noise image was adopted to prevent unwanted noise-induced fires, which could lead to unstable behavior during the learning process [88]. Fig. 12f shows the detailed evolution of all the pattern (red) and background (blue) synaptic weights, evidencing the fast convergence of pattern weights to high conductance values (potentiation) and a more gradual transition of background weights toward low conductance values (depression).
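Since the µC implements the POST as a leaky integrate-and-fire neuron in software, its core loop can be sketched as a discrete-time update. The exponential leak, the time step, the time constant and the reset to zero after fire are assumptions for illustration, since the firmware of [87] is not described here; the class name is hypothetical. The input sample would correspond to the digitized output of the transimpedance amplifier.

```python
import math


class LeakyIntegrateAndFire:
    """Discrete-time LIF neuron (illustrative; all parameters are hypothetical)."""

    def __init__(self, threshold: float, tau_ms: float, dt_ms: float):
        self.threshold = threshold
        self.decay = math.exp(-dt_ms / tau_ms)  # leak factor applied at every step
        self.v_int = 0.0                        # internal potential V_int

    def step(self, input_signal: float) -> bool:
        """Leak, integrate one input sample, and return True on fire."""
        self.v_int = self.v_int * self.decay + input_signal
        if self.v_int >= self.threshold:
            self.v_int = 0.0  # reset after fire; the feedback spike would be issued here
            return True
        return False


post = LeakyIntegrateAndFire(threshold=1.0, tau_ms=10.0, dt_ms=1.0)
print([post.step(0.5) for _ in range(4)])  # [False, False, True, False]
```

The leak makes the neuron respond to the coincidence of several synaptic currents within a short interval rather than to their total over an arbitrarily long time, which is what lets sparse noise leave V_int below threshold.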
The ability to induce potentiation and depression in real time based on the submitted input spikes allows the network to quickly adapt the stored weights to a dynamically changing stimulation. To prove the ability to learn dynamic patterns, the network of Fig. 11 was trained with a sequence of 3 distinct patterns, while monitoring the synaptic weights in real time during learning. Fig. 13a-c shows the 3 patterns which were sequentially presented to the first layer of the neural network during the experiment, while Fig. 13d shows a typical noise image which was alternated with the patterns. After preparing the synaptic weights in HRS (Fig. 13e), the network was externally stimulated by pattern #1 for 300 epochs (3 s), resulting in the potentiation of pattern synapses and the depression of background synapses, as evidenced in Fig. 13f. In the following 300 epochs (epochs 301-600), pattern #2 was submitted, causing the readjustment of the synaptic weights to the new pattern, while the previous one was forgotten (Fig. 13g). At epoch 601, pattern #3 was presented via the PRE spikes, and it was eventually learnt by the synaptic network, as evidenced by the color plot in Fig. 13h. Fig. 13i shows the raster plot of PRE spikes, evidencing the pattern and noise presentation to the network as a function of epochs during each training phase, while Fig. 13j shows the measured synaptic weights as a function of time, further supporting online unsupervised learning by STDP.
A slightly more sophisticated perceptron is displayed in Fig. 14a, featuring 2 POSTs in the second layer for the learning and recognition of 2 distinct patterns. POST1 and POST2 are each connected to the 3x3 PRE layer via 9 1T1R synapses capable of STDP. To prevent the 2 POSTs from learning the same pattern, the first POST (POST1) and the second POST (POST2) were controlled by the µC to implement a winner-take-all (WTA) learning optimization scheme [74,96], where the fire of one POST caused the reset of the internal potential in the other POST. This was achieved by 2 inhibitory synapses connecting POST1 to POST2 and POST2 to POST1, allowing for bidirectional WTA [74]. In the experiment, the 2 patterns and the usual noise images were submitted to the 3x3 input layer in a random alternated sequence. A top bar and a bottom bar were used as initial patterns for the first 1000 epochs, as shown in Fig. 14b. After this first phase, the bars were shifted by 1 step counterclockwise along the perimeter of the 3x3 frame at each new phase, until the bar initially at the top reached the bottom, and vice versa. Each new learning phase lasted 1000 epochs. Fig. 14c-d shows color plots of the conductance of the synapses connected to POST1 and POST2, respectively, during the dynamic learning process, evidencing the capability of the 2 POSTs to separately learn the 2 submitted patterns, and to respond to consecutive pattern shifts by gradually adjusting the synaptic weights [87]. Fig. 14e-f shows the synaptic weights of POST1 and POST2, respectively, as a function of epochs during the learning experiment, further evidencing the synaptic plasticity in response to the dynamic input patterns.
Fig. 15 (caption excerpt): [...] LRS2, respectively, by the set transition with different gate voltages V_G1 and V_G2, with V_G1 < V_G2, resulting in compliance currents I_C1 < I_C2 and thus in an LRS1 resistance higher than the LRS2 resistance. (g) Raster plot of PRE spikes evidencing the input submission of gray and white patterns with different applied V_G, and of noise patterns to obtain synaptic depression, hence the black level or HRS. (h) Measured synaptic weights during learning, evidencing the adjustment of the conductance to one of the 3 gray levels, namely HRS, LRS1, and LRS2. Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.
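The bidirectional WTA rule of Fig. 14a, in which the fire of one POST resets the internal potential of the other, can be sketched for a single time step. The assumptions are illustrative: perfect integrators without leak, a common threshold, and a hypothetical tie-break that picks the POST with the largest potential when several cross threshold in the same step. Only the winner would then issue the STDP feedback spike to its own synapses. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def wta_step(potentials: Sequence[float], currents: Sequence[float],
             threshold: float) -> Tuple[List[float], Optional[int]]:
    """Integrate one input sample in each POST and apply winner-take-all.

    Returns the new potentials and the index of the firing POST (or None).
    """
    updated = [v + i for v, i in zip(potentials, currents)]
    above = [k for k, v in enumerate(updated) if v >= threshold]
    if not above:
        return updated, None
    winner = max(above, key=lambda k: updated[k])  # tie-break rule (assumption)
    # The winner resets after firing; mutual inhibition resets every other POST.
    return [0.0] * len(updated), winner


print(wta_step([0.5, 0.8], [0.3, 0.3], threshold=1.0))  # ([0.0, 0.0], 1)
```

Because the loser's potential is cleared at every fire of the winner, the loser rarely fires on the winner's pattern, so the two POSTs tend to specialize on different inputs.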
All previous examples consider digital input patterns, consisting of 0/1 states for each channel, which can be represented by the HRS and LRS of the synapse, respectively. A more realistic case is the gray-scale image, where analog stimulation in the first layer, e.g., mapped in the spike frequency or spike amplitude, requires more than 2 resistance levels to represent the input patterns. To this purpose, the gray-scale pattern learning process described in Fig. 15 considers an HRS level, corresponding to the black tone in the figure, and 2 different LRS levels, called LRS1 and LRS2, with the weight of LRS1 being smaller than that of LRS2 [87]. Fig. 15a-d shows the color plots of the measured synaptic weights during a 1000-epoch gray-scale pattern learning experiment performed by the hardware neural network with 3x3 PREs and one POST. Starting from a random weight configuration, the synaptic weights show the learning of a gray-scale image with 3 levels via a selective analog potentiation process. To obtain these results, LRS1 and LRS2 were achieved in the 1T1R synapses with 2 different compliance currents, I_C1 = 50 µA and I_C2 = 100 µA, respectively, obtained by applying the gate voltages V_G1 = 2.1 V and V_G2 = 2.5 V, as shown by the I-V curves in Fig. 15e-f. The resulting conductance of LRS1 (gray level) is lower than that of LRS2 (white level), because the lower I_C results in a higher resistance [63]. Fig. 15g shows the sequence of PRE spikes submitted to the network to implement the 3-level gray-scale image learning, while Fig. 15h shows the evolution of the synaptic weights with time, indicating the learning of 3 distinct gray-scale levels due to analog potentiation for the white and gray tones and noise-induced depression for the black tone, respectively. These experimental results support the capability of multilevel pattern learning in 1T1R synapses, paving the way for color-scale image learning.
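The encoding used for gray-scale learning can be sketched as a mapping from pixel tone to PRE gate voltage, using the V_G and I_C values quoted above. Black pixels receive no pattern spike, so they are driven only by noise and end in HRS. The resistance estimate R ≈ v_c / I_C is an assumption made purely for illustration, with a hypothetical constant `v_c`; the text only states that a lower I_C gives a higher LRS resistance. Function names are hypothetical.

```python
# Gate voltages (V) and compliance currents (A) quoted for the gray-scale experiment.
GATE_VOLTAGE_V = {"gray": 2.1, "white": 2.5}
COMPLIANCE_CURRENT_A = {"gray": 50e-6, "white": 100e-6}


def encode_pattern(image):
    """Map a 2D image of 'black'/'gray'/'white' pixels to PRE gate pulses.

    Returns {channel: V_G} (row-major channel index) for pixels that receive a
    pattern spike; black pixels are omitted and are only stimulated by noise.
    """
    pulses = {}
    for r, row in enumerate(image):
        for c, level in enumerate(row):
            if level not in ("black", "gray", "white"):
                raise ValueError(f"unknown level {level!r}")
            if level != "black":
                pulses[r * len(row) + c] = GATE_VOLTAGE_V[level]
    return pulses


def lrs_resistance(level, v_c=0.3):
    """Illustrative LRS estimate R = v_c / I_C (v_c in volts is hypothetical)."""
    return v_c / COMPLIANCE_CURRENT_A[level]


image = [["white", "black", "gray"],
         ["black", "white", "black"],
         ["gray", "black", "white"]]
print(encode_pattern(image))  # {0: 2.5, 2: 2.1, 4: 2.5, 6: 2.1, 8: 2.5}
```

Whatever the exact resistance law, only the ordering matters for learning: a lower gate voltage limits I_C and leaves the gray synapses at a higher resistance than the white ones.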

Attractor formation
While feed-forward networks can be helpful in several applications, such as learning and recognition of different kinds of patterns, most of the information processing in the human brain is done by recurrent neural networks (RNNs). In an RNN, at least one feedback loop connects the output layer to the input layer [97]. For instance, the brain's ability to retrieve a previously stored memory, also referred to as an attractor state, in response to a partial stimulus has been the subject of intense study. According to biological observations, this emergent computational ability is believed to result from the specific recurrent synaptic topology in the relevant brain areas, such as the hippocampus. Spiking RNNs have therefore been modeled and designed, leading to both CMOS-based [98,99,100,101,102,103,104] and memristor-based circuit implementations [105,106,107,108,109,110,111]. Fig. 16a shows a simplified sketch of an RNN consisting of 4 neurons. Each neuron provides both excitatory and inhibitory stimulation to each of the other 3 neurons and receives an external input X. Specifically, according to the well-known Hopfield network topology [112], no self-feedback is present in any of the neurons, which prevents divergent dynamics during network operation.
Fig. 16b shows the circuit implementation of the RNN with spiking neurons and 1T1R synapses [110,111]. Here, integrate-and-fire neurons are fully connected to each other by excitatory (blue) and inhibitory (red) 1T1R synapses. The generic i-th neuron has 2 inputs, namely the external current spike X_i and the total synaptic current activated by the other neurons. It also has 3 outputs: (i) G_i, which is applied to the gate of the 1T1R synapses along the i-th row; (ii) O_i, which is applied to the TE of the excitatory 1T1R synapses along the i-th column; and (iii) O'_i, which is applied to the TE of the inhibitory 1T1R synapses along the i-th column. Fig. 16c describes the operation of the RNN during the learning of an attractor state for the specific pair of neurons N2 and N3, the excitatory synapse W32, and the inhibitory synapse W'32. When N3 fires, the gates of all the excitatory and inhibitory synapses along the 3rd row are activated, inducing synaptic currents proportional to the synaptic weights. These currents are activated by a relatively small voltage V_read, which is positive for excitatory synapses and negative for inhibitory synapses. If the total current collected at the input of N2 exceeds the fire threshold, N2 fires. If the 2 neurons fire at the same time, gate and TE pulses are applied simultaneously to synapses W32 and W'32. The positive TE voltage V_exc causes a set transition, potentiating the excitatory synapse, while the negative TE voltage V_inh causes a reset transition, depressing the inhibitory synapse [110,111]. Stimulation by external spikes thus trains the network according to the Hebbian rule, where 'neurons that fire together wire together' [113]. The potentiation of excitatory synapses and the depression of inhibitory synapses cause the formation of an attractor state in the RNN. Note that these 2 processes are simplified cases of the STDP in the 1T1R synapses described in Sec. 3. The bipolar voltage pulse at the TE is replaced by a unipolar pulse, with either the positive voltage V_exc or the negative voltage V_inh. Fig. 17 shows the simulation results of the sequential learning of two orthogonal attractors, namely 2 attractors with no neurons in common, in a 6-neuron RNN based on the architecture of Fig. 16b [110]. First, all excitatory weights were prepared in HRS and all inhibitory weights in LRS. Then, the pool of neurons N1, N2, and N3 was externally stimulated for 1 s, as shown in Fig. 17a. Finally, external current spikes were applied to another pool including N4, N5, and N6 during the following 1-s-long training phase (Fig. 17b). Fig. 17c shows the evolution of the excitatory synaptic weights. The excitatory synapses in the first attractor (N1, N2, N3) are potentiated during the first learning phase, followed by those in the second attractor (N4, N5, N6) during the second learning phase. Note that the external stimulation was asynchronous, with an average frequency of 200 Hz for the stimulated neurons. Similarly, Fig. 17d shows color plots of the inhibitory weights at increasing times, evidencing the depression of inhibitory weights within the attractors. Fig. 17e and Fig. 17f show the detailed time evolution of the excitatory and inhibitory weights, respectively, during the two learning phases, further supporting the capability of the RNN to learn orthogonal attractors.
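The training mechanism of Figs. 16 and 17 can be illustrated with a minimal Python sketch. It is not the simulator of [110,111], and it rests on several assumptions:

- synapses are binary (HRS or LRS only), with the Fig. 6 resistance values;
- time is divided into 1-ms bins, and a spike in a bin counts as a fire event;
- the read voltage V_read = 0.1 V is hypothetical;
- for simplicity, only externally stimulated neurons fire during training.

External inputs are independent Bernoulli approximations of 200-Hz Poisson spike trains. The indexing convention (row = firing neuron driving the gate, column = target neuron) is also an assumption.

```python
import random

# Minimal sketch of Hebbian attractor learning in a fully connected RNN of
# 1T1R synapses. ASSUMPTIONS (not from the source): binary weights, 1-ms bins,
# hypothetical V_read, and only externally stimulated neurons fire in training.

G_LRS = 1 / 25e3      # LRS conductance [S] (25 kOhm, Fig. 6)
G_HRS = 1 / 500e3     # HRS conductance [S] (500 kOhm, Fig. 6)
N = 6                 # number of neurons
DT = 1e-3             # hypothetical time-bin width [s]
RATE = 200.0          # mean external spike rate [Hz]
V_READ = 0.1          # hypothetical read voltage [V]

# exc[j][i], inh[j][i]: synapse in row j (gate driven by neuron j) feeding
# neuron i. Excitatory weights start in HRS, inhibitory ones in LRS. Diagonal
# entries are never used: Hopfield topology, no self-feedback.
exc = [[G_HRS] * N for _ in range(N)]
inh = [[G_LRS] * N for _ in range(N)]


def synaptic_current(i, fired):
    """Net current into neuron i when the neurons in `fired` spike.

    Excitatory synapses are read with +V_read, inhibitory ones with -V_read.
    """
    return sum(V_READ * (exc[j][i] - inh[j][i]) for j in fired if j != i)


def hebbian_update(fired):
    """Coincident fires: set excitatory (potentiate), reset inhibitory (depress)."""
    for j in fired:
        for i in fired:
            if i != j:
                exc[j][i] = G_LRS   # positive V_exc at the TE -> set transition
                inh[j][i] = G_HRS   # negative V_inh at the TE -> reset transition


def train(pool, duration, rng):
    """Stimulate `pool` with independent, asynchronous spike trains."""
    p_spike = RATE * DT                 # Bernoulli approximation of Poisson
    for _ in range(round(duration / DT)):
        fired = [n for n in pool if rng.random() < p_spike]
        hebbian_update(fired)


rng = random.Random(0)
train([0, 1, 2], 1.0, rng)   # first 1-s phase: N1, N2, N3
train([3, 4, 5], 1.0, rng)   # second 1-s phase: N4, N5, N6
```

After both phases, excitatory synapses are in LRS and inhibitory synapses in HRS only within each pool. As a result, `synaptic_current(1, [0])` is positive while `synaptic_current(4, [0])` is negative, in the spirit of the two orthogonal attractors of Fig. 17.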

Associative memory
After attractor learning, the RNN is operated in a different mode aimed at testing or recalling the stored network states [107,108,110,111]. A key property of the RNN is that, once attractors are formed, an attractor can be recalled even in the presence of a partial or erroneous stimulus. This property is important for error-tolerant brain-inspired cognitive systems [98,112]. This type of attractor recall is at the origin of associative learning, a fundamental cognitive primitive in the mammalian brain that has received in-depth theoretical and experimental study, as exemplified by Pavlov's well-known experiments with dogs [114]. To illustrate associative learning in the RRAM-based RNN, Fig. 18 shows simulation results for recalling the attractor (N1, N2, N3), together with its interpretation in terms of Pavlov's dog experiments [110]. If the presentation of food to the dog is always combined with the ringing of a bell, 'bell' and 'food' become associated, i.e., an attractor is formed in the dog's brain. Consequently, whenever the dog hears the bell ring alone, the concept of food and the stimulus to salivation are recalled (Fig. 18a), similar to the direct presentation of food (Fig. 18b).

Pattern learning and reconstruction
RNNs can provide insight into some typical human brain functionalities, such as memory formation and error-tolerant retrieval [99,107,108,111]. Fig. 19 shows simulation results of pattern learning in an RRAM-based RNN with 64 neurons arranged with the architecture of Fig. 16b [111]. Orthogonal attractors were formed by presenting 2 patterns in sequence, namely the image 'X' (Fig. 19a) followed by the image 'C' (Fig. 19b), each presented to the RNN for 5 s. The weights were initially prepared in HRS for excitatory synapses and in LRS for inhibitory synapses, as shown in Fig. 19c. Then, the application of pattern 'X' led to Hebbian modification of the synapses in the first attractor within 5 s (Fig. 19d). The presentation of the second pattern during the following 5 s led to the formation of the second attractor (Fig. 19e). Fig. 19f further illustrates attractor formation, showing the calculated weights of the excitatory synapses (top) and inhibitory synapses (bottom) as a function of time. The 2 phases, learning 'X' in the first 5 s and 'C' in the following 5 s, can be clearly seen. After attractor formation, the network was tested for its ability to reactivate a whole attractor when only part of the pattern is submitted. Fig. 20 shows the input excitation patterns submitted in the simulations. These consist of partial versions of pattern 'X' with only (a) 9 or (b) 5 active channels, and partial versions of pattern 'C' with only (c) 7 or (d) 4 active channels [111]. Fig. 20e shows the simulation results of attractor recall with the partial patterns in (a) and (b), each case being simulated 10 times for statistical significance. The number of activated neurons increases with time during the submission of the partial pattern, eventually activating all 12 neurons in the original pattern 'X'. Note that the average time required to retrieve the whole pattern 'X' decreases as the number of externally stimulated neurons increases. This results from the higher synaptic current feeding the other, unstimulated neurons within the selected attractor. Similarly, stimulating part of attractor 'C' leads to the activation of all 12 neurons in the attractor, as shown in Fig. 20f. These results support error-tolerant pattern recognition, where a pattern is recognized even when only a fraction of it is stimulated. Fig. 20g explores the limits of error-tolerant recognition and the possibility of confusion between competing patterns. It shows a color map of the calculated probability P of recognizing pattern 'X' after externally stimulating the 64-neuron RNN for 1 s. The recognition probability is reported as a function of the number of externally stimulated neurons belonging to 'X' or 'C', and each P value is the average over 1000 simulations. Note that all the simulations eventually led to the recognition of either 'X' or 'C', so the probability of recognizing 'C' is given by 1-P [111]. The results indicate that P increases as the number of stimulated X-neurons increases, and decreases as the number of stimulated C-neurons increases. For similar numbers of excited X- and C-neurons, the RNN behaves randomly, with P of about 50%. Finally, when the numbers of stimulated X- and C-neurons both exceed 7, P assumes intermediate values, since such a high external excitation can activate either attractor with high probability. The recall process is then mainly controlled by the stochastic Poisson input spike trains used to stimulate the RNN. The results corroborate the feasibility of error-tolerant brain-inspired RNNs with RRAM-based 1T1R synapses capable of STDP.
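The recall experiment of Fig. 20 can be mimicked, in a strongly simplified form, by a minimal Python sketch. It is not the simulator of [111], and it makes several assumptions:

- the weights are already trained as in Fig. 19, i.e., excitatory synapses in LRS and inhibitory synapses in HRS inside each attractor, and the opposite elsewhere;
- the index sets for the 12-neuron patterns 'X' and 'C' are hypothetical, as are V_read and the firing threshold;
- a deterministic step-wise update replaces the stochastic integrate-and-fire dynamics with Poisson inputs.

```python
# Minimal sketch of attractor recall from a partial pattern in a 64-neuron RNN.
# ASSUMPTIONS (not from the source): pre-trained binary weights, hypothetical
# pattern indices, V_read and threshold, and a deterministic update rule.

G_LRS = 1 / 25e3          # LRS conductance [S]
G_HRS = 1 / 500e3         # HRS conductance [S]
N = 64                    # number of neurons
X = set(range(0, 12))     # hypothetical neurons of pattern 'X'
C = set(range(12, 24))    # hypothetical neurons of pattern 'C' (orthogonal)
V_READ = 0.1              # hypothetical read voltage [V]
I_TH = 10e-6              # hypothetical firing threshold [A]


def weights(i, j):
    """(excitatory, inhibitory) conductance of the synapse from neuron j to i."""
    same = (i in X and j in X) or (i in C and j in C)
    return (G_LRS, G_HRS) if same else (G_HRS, G_LRS)


def recall(stimulated):
    """Activate neurons whose net synaptic current exceeds I_TH until stable.

    Returns the final set of active neurons and the number of active neurons
    after each update step.
    """
    active = set(stimulated)
    history = [len(active)]
    while True:
        newly_active = set()
        for i in range(N):
            if i in active:
                continue        # i is inactive below, so j != i (no self-feedback)
            current = 0.0
            for j in active:
                g_exc, g_inh = weights(i, j)
                current += V_READ * (g_exc - g_inh)
            if current > I_TH:
                newly_active.add(i)
        if not newly_active:
            return active, history
        active |= newly_active
        history.append(len(active))
```

With these hypothetical values, stimulating 5 of the 12 'X' neurons recalls the whole attractor in one step (`history == [5, 12]`), whereas 2 stimulated neurons are not enough. Stimulating equal numbers of X- and C-neurons makes the excitatory and inhibitory currents cancel, so the deterministic sketch recalls neither pattern. In the source, the outcome of such balanced stimuli is instead decided by the stochastic Poisson inputs, giving P of about 50%.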

Conclusions
This chapter provides an overview of RRAM-based neuromorphic circuits for brain-inspired computing. RRAM devices and architectures might provide a promising technology for scalable, energy-efficient neuromorphic chips, tackling the challenges of emerging big-data processing and the pervasive Internet of Things (IoT). In this scenario, RRAM device operation, challenges, and emerging technologies are reviewed with reference to a novel SiOx RRAM with improved resistance window and stability for multilevel operation in neural networks. Then, RRAM synapses capable of STDP are described, addressing the physical processes and circuit algorithms allowing for time-dependent potentiation and depression. Finally, SNN architectures capable of pattern learning and other cognitive computing primitives are discussed, covering both feed-forward architectures and brain-inspired recurrent SNNs. Simulations of RRAM-based RNNs demonstrate pattern learning, associative memory, attractor recognition, and error-tolerant reconstruction of information. Overall, these results support RRAM-based SNNs as a promising and attractive technology for low-power and scalable neuromorphic computing.

Fig. 1
Fig. 1 RRAM device states. (a) The device features a dielectric layer, such as a binary metal oxide MeOx. (b) After forming, the device is left in a set state, or LRS, due to the presence of a CF shunting the TE and BE. (c) The application of a reset pulse leads to the disconnection of the CF, thus resulting in the reset state, or HRS. Transition back to the LRS is possible via the application of a set pulse. Reprinted with permission from [35]. Copyright 2014 Wiley and Sons.

Fig. 2
Fig. 2 SiOx-based RRAM device. (a) Device stack structure and (b) measured I−V curves, evidencing set and reset transitions at positive and negative voltages, respectively. A large resistance window of about 10^4 is obtained thanks to the high band gap of the SiOx dielectric layer and the complete dissolution of the Ti-based CF in the HRS. Reprinted with permission from [58]. Copyright 2016 IEEE.

Fig. 3
Fig. 3 Control of the resistance window in SiOx-based RRAM. (a) Measured I−V curves at increasing V_stop and (b) corresponding resistance for HRS and LRS as a function of V_stop, for I_C = 70 µA. The resistance window controllably increases with V_stop, thus enabling multilevel memory operation and gradual synaptic depression. Reprinted with permission from [58]. Copyright 2016 IEEE.

Fig. 2b
Fig. 2b[66].Note that the abrupt set transition is potentially interesting for digital memory and logic computing applications[67], whereas synaptic potentiation generally requires gradual increase of the conductance, for progressive STDP and fine updating of the synaptic weights in DNNs[25,26,27].Gradual depression is however possible thanks to the negative feedback of the reset transition, thus enabling the use of the SiO x RRAM in DNNs for supervised training by backpropagation algorithm.The gradual reset transition allows to tune the resistance of the HRS, by controlling the width of the depleted gap during reset[67].This is shown in Fig.3a, where the HRS resistance at the end of the reset transition increases with the maximum voltage V stop applied in the reset sweep.The controllable HRS enables multilevel operation of the SiO x RRAM, which enhances the scalability of the memory device by allowing for the storage of more than one single bit within a physical memory cell[68,69,70,71].Note that the increase of HRS is reflected by the corresponding increase of the set voltage V set , marking the set transition under positive applied voltage.This can be explained by the relationship between V set and the depleted width ∆ in the HRS, where an increasing ∆ leads to a decreasing field across the depleted gap for a given voltage, thus requiring a larger V set to reach the critical field for inducing the set transition by defect migration across the depleted gap[65].The controllable HRS also allows to tune the resistance window which increases with V stop as shown in Fig.3b.The slight decrease of the LRS resistance with V stop can be explained by the higher average field along the depleted gap for high HRS resistance[58,62].The SiO x RRAM also demonstrates high cycling endurance of almost 10 8 cycles, low cycle-to-cycle switching variations, and an excellent stability at elevated temperature, where both HRS and LRS show negligible variations for annealing at 260 • C 
for 1 hour[58,62].Overall, these favorable properties make SiO x

Fig. 5
Fig. 5 (a) PRE and POST voltage waveforms applied to the gate and the TE, respectively, for the cases of positive delay (left) and negative delay (right). (b) If the PRE spike occurs before the POST spike (∆t > 0), the resistance decreases due to the positive TE spike causing a set transition, or synaptic potentiation. Otherwise, if the PRE spike occurs after the POST spike (∆t < 0), the resistance increases due to the negative TE spike causing a reset transition, or synaptic depression. (c) Correlation plot of the RRAM resistance R(t_{i+1}) at epoch t_{i+1} as a function of the RRAM resistance R(t_i) at epoch t_i for variable ∆t, corresponding to the cases of potentiation, depression, and no weight change because of excessive delay. Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 6
Fig. 6 (a) Measured and (b) calculated STDP characteristics indicating the relative change of resistance R0/R as a function of ∆t for variable initial resistance states R0, from full LRS (R0 = 25 kΩ) to full HRS (R0 = 500 kΩ). Reprinted with permission from [83]. Copyright 2016 IEEE.

Fig. 7
Fig. 7 Three-dimensional (3D) color plot of the calculated STDP characteristics shown in Fig. 6. Potentiation and depression are both a function of the time delay and the initial synaptic state, resulting in the final state being either HRS or LRS. Reprinted with permission from [83]. Copyright 2016 IEEE.

Fig. 8
Fig. 8 (a) Schematic representation of a 2-layer feed-forward neural network with 64 PREs fully connected to one POST by 64 1T1R synapses. To enable pattern learning, (b) a pattern and (c) random noisy images are presented to the network in a random sequence, as shown by the raster plot in (d). (e) The presentation of the pattern causes POST fires because of V_int exceeding the threshold, thus leading to potentiation in the synapses with ∆t > 0. Random noise presentation may instead cause depression, because of stimulation of PRE channels shortly after a fire event with ∆t < 0. Reprinted with permission from [83]. Copyright 2016 IEEE.

Fig. 9
Fig. 9 (a) Calculated synaptic weights in the pattern (red) and background (cyan) as a function of time, obtained by a simulation of unsupervised pattern learning with a RRAM synapse model.(b) Starting from random synaptic states, each lying between HRS and LRS, synapses change their weight according to the stochastic learning process via potentiation of pattern weights and depression of background synapses (c, d).The black and blue traces indicate the average weight of pattern and background synapses, respectively.Reprinted with permission from [83].Copyright 2016 IEEE.

Fig. 10
Fig. 10 Illustrative scheme of a 2-layer neural network with perceptron structure including 4x4 PREs and one POST connected by 16 1T1R synapses.Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 11
Fig. 11 (a) Circuit schematic and (b) hardware implementation of the RRAM-based feed-forward network depicted in Fig. 10.Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 12
Fig. 12 (a-d) Color plots of the synaptic weights measured during a pattern learning experiment performed by the network of Fig. 11 with a diagonal pattern.(e) Raster plot of PRE spikes within pattern (red) and background (blue) channels.(f) Measured synaptic weights showing the adjustment of pattern synapses to LRS and background synapses to HRS.Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 13
Fig. 13 Experimental demonstration of dynamic pattern learning as a result of the sequential application of (a) pattern #1, (b) pattern #2, and (c) pattern #3, each alternated with (d) random noise spikes. (e-h) Color plots of the synaptic weights measured during the learning experiment: starting from (e) a random weight configuration, the weights effectively adapt to (f) pattern #1, (g) pattern #2, and (h) pattern #3 by epochs 300, 600, and 1000, respectively. (i) Raster plot of PRE spikes generated by the presentation of pattern (red) and noise (blue) to the first layer of the neural network. (j) Evolution of synaptic weights within the pattern (red) and the background (blue) as a function of epochs, showing a selective potentiation of pattern synapses and a slower depression of background synapses during each training phase. Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 14
Fig. 14 (a) Sketch of a perceptron network with 2 POSTs in the output neuron layer to implement multiple pattern learning according to the winner-take-all (WTA) scheme.(b) Patterns submitted to the PRE layer, consisting of a top bar and a bottom bar, which were then gradually modified by a counter-clockwise rotation by one step every 1000 epochs to experimentally demonstrate the capability of online dynamic learning.(c, d) Color plots of synaptic weights connecting the input layer to POST1 and POST2, respectively, evidencing the online adaptation to the dynamically changing patterns.(e, f) Measured synaptic weights for POST1 and POST2, respectively, showing the evolution of pattern (red) and background (blue) synaptic weights during the learning process.Note the adjustment of synaptic weights to LRS or HRS, for pattern or background synapses, respectively, in every phase of the online learning experiment.Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 15
Fig. 15 Color plots of measured synaptic weights at epoch (a) 0, (b) 300, (c) 600, and (d) 1000 during gray-scale pattern learning. The black level corresponds to the HRS, while the gray and white levels correspond to two different low resistance states, namely LRS1 and LRS2, with the conductance of LRS1 being smaller than that of LRS2. (e, f) Measured and calculated I−V curves for the 1T1R synapses, showing the formation of LRS1 and LRS2, respectively, by the set transition with gate voltages V_G1 < V_G2. These result in compliance currents I_C1 < I_C2, so that the resistance of LRS1 is higher than that of LRS2. (g) Raster plot of PRE spikes evidencing the submission of gray and white input patterns with different applied V_G, and of noise patterns to obtain synaptic depression, hence the black level or HRS. (h) Measured synaptic weights during learning, evidencing the adjustment of the conductance to one of the 3 gray levels, namely HRS, LRS1, and LRS2. Reprinted from [87], which is licensed under a Creative Commons Attribution 4.0 International License.

Fig. 16
Fig. 16 (a) Illustrative scheme and (b) corresponding circuit implementation of an RNN with the Hopfield network configuration, namely with no self-feedback connections, in the case of 4 fully connected neurons. The synapses consist of 1T1R structures and are operated as either excitatory or inhibitory connections between 2 neurons in the RNN. (c) Schematic illustration of RNN operation during the training phase for 2 neurons (N2 and N3) and their respective synapses. The time overlap between neuron spikes causes potentiation of excitatory synapses and depression of inhibitory synapses. Adapted with permission from [111]. Copyright 2018 IEEE.

Fig. 17
Fig. 17 Simulation of the learning process in a 6-neuron RNN, where two orthogonal attractor states are formed in response to external stimulation of neurons N1, N2, and N3 (a), followed by external stimulation of neurons N4, N5, and N6 (b). Color-code representation of the conductance values of (c) excitatory and (d) inhibitory synapses at times 0 s, 1 s, and 2 s. Weights of the (e) excitatory and (f) inhibitory synapses. During the first 1-s phase, stimulation of N1, N2, and N3 selectively potentiates the excitatory weights and depresses the inhibitory weights within the first attractor. During the consecutive 1-s learning phase, stimulation of N4, N5, and N6 does the same within the second attractor.

Fig. 18
Fig. 18 Illustrative explanation of an associative memory with reference to Pavlov's dog experiments. The regular application of an external stimulus, such as a bell's ring, during feeding leads to the formation of an attractor state linking the bell's ring, food, and salivation. As a result, after training, the sound of a bell's ring activates the whole attractor, including the neurons associated with the concept of food and the stimulus to salivation (a). The attractor is similarly activated by the direct sight of food (b). Reprinted with permission from [110]. Copyright 2017 IEEE.

Fig. 19
Fig. 19 Simulation of the formation of two attractors as a result of external stimulation of a 64-neuron RNN by the sequential submission of pattern 'X' for 5 s, followed by pattern 'C' for another 5-s-long phase. (a, b) Patterns 'X' and 'C', respectively. (c-e) Color-code representation of the weights of (top) excitatory and (bottom) inhibitory synapses at times 0 s, 5 s, and 10 s, starting from HRS and LRS, respectively. (f) Calculated weights of excitatory synapses (top) and inhibitory synapses (bottom), clearly indicating the sequential formation of the attractor 'X' (red), followed by the attractor 'C' (blue). Adapted with permission from [111]. Copyright 2018 IEEE.

Fig. 20
Fig. 20 Incomplete patterns used for simulating pattern reconstruction after pattern learning, with (a) 9 active channels and (b) 5 active channels instead of the correct 12 for pattern 'X', and with (c) 7 active channels and (d) 4 active channels instead of the correct 12 for pattern 'C'. Number of activated neurons within (e) the attractor 'X' and (f) the attractor 'C' as a function of time, for different numbers of externally stimulated neurons. (g) Color map of the probability P of reconstructing attractor 'X', as a function of the number of externally activated neurons in pattern 'X' and pattern 'C'. The probability of reactivating the attractor 'C' can be obtained as 1-P. Adapted with permission from [111]. Copyright 2018 IEEE.