-
Differentiable Transportation Pruning
Authors:
Yunqiang Li,
Jan C. van Gemert,
Torsten Hoefler,
Bert Moons,
Evangelos Eleftheriou,
Bram-Ernst Verhoef
Abstract:
Deep learning algorithms are increasingly employed at the edge. However, edge devices are resource constrained and thus require efficient deployment of deep neural networks. Pruning methods are a key tool for edge deployment as they can improve storage, compute, memory bandwidth, and energy usage. In this paper we propose a novel accurate pruning technique that allows precise control over the output network size. Our method uses an efficient optimal transportation scheme which we make end-to-end differentiable and which automatically tunes the exploration-exploitation behavior of the algorithm to find accurate sparse sub-networks. We show that our method achieves state-of-the-art performance compared to previous pruning methods on 3 different datasets, using 5 different models, across a wide range of pruning ratios, and with two types of sparsity budgets and pruning granularities.
Submitted 31 July, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference
Authors:
Manuel Le Gallo,
Riduan Khaddam-Aljameh,
Milos Stanisavljevic,
Athanasios Vasilopoulos,
Benedikt Kersting,
Martino Dazzi,
Geethan Karunaratne,
Matthias Braendli,
Abhairaj Singh,
Silvia M. Mueller,
Julian Buechel,
Xavier Timoneda,
Vinay Joshi,
Urs Egger,
Angelo Garofalo,
Anastasios Petropoulos,
Theodore Antonakopoulos,
Kevin Brew,
Samuel Choi,
Injo Ok,
Timothy Philip,
Victor Chan,
Claire Silvestre,
Ishtiaq Ahsan,
Nicole Saulnier
, et al. (4 additional authors not shown)
Abstract:
The need to repeatedly shuttle around synaptic weight values from memory to processing units has been a key source of energy inefficiency associated with hardware implementation of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds high promise to overcome this challenge, by performing matrix-vector multiplications (MVMs) directly within the network weights stored on a chip to execute an inference workload. However, to achieve end-to-end improvements in latency and energy consumption, AIMC must be combined with on-chip digital operations and communication to move towards configurations in which a full inference workload is realized entirely on-chip. Moreover, it is highly desirable to achieve high MVM and inference accuracy without application-wise re-tuning of the chip. Here, we present a multi-core AIMC chip designed and fabricated in 14-nm complementary metal-oxide-semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully-integrated chip features 64 256x256 AIMC cores interconnected via an on-chip communication network. It also implements the digital activation functions and processing involved in ResNet convolutional neural networks and long short-term memory (LSTM) networks. We demonstrate near software-equivalent inference accuracy with ResNet and LSTM networks while implementing all the computations associated with the weight layers and the activation functions on-chip. The chip can achieve a maximal throughput of 63.1 TOPS at an energy efficiency of 9.76 TOPS/W for 8-bit input/output matrix-vector multiplications.
Submitted 6 December, 2022;
originally announced December 2022.
-
On the visual analytic intelligence of neural networks
Authors:
Stanisław Woźniak,
Hlynur Jónsson,
Giovanni Cherubini,
Angeliki Pantazi,
Evangelos Eleftheriou
Abstract:
The visual oddity task was conceived as a universal ethnic-independent analytic intelligence test for humans. Advancements in artificial intelligence led to important breakthroughs, yet competing with humans on such analytic intelligence tasks remains challenging and typically resorts to non-biologically-plausible architectures. We present a biologically realistic system that receives inputs from synthetic eye movements (saccades) and processes them with neurons incorporating dynamics of neocortical neurons. We introduce a procedurally generated visual oddity dataset to train an architecture extending conventional relational networks and our proposed system. Both approaches surpass the human accuracy, and we uncover that both share the same essential underlying mechanism of reasoning. Finally, we show that the biologically inspired network achieves superior accuracy, learns faster and requires fewer parameters than the conventional network.
Submitted 28 September, 2022;
originally announced September 2022.
-
Towards efficient end-to-end speech recognition with biologically-inspired neural networks
Authors:
Thomas Bohnstingl,
Ayush Garg,
Stanisław Woźniak,
George Saon,
Evangelos Eleftheriou,
Angeliki Pantazi
Abstract:
Automatic speech recognition (ASR) is a capability which enables a program to process human speech into a written form. Recent developments in artificial intelligence (AI) have led to high-accuracy ASR systems based on deep neural networks, such as the recurrent neural network transducer (RNN-T). However, the core components and the performed operations of these approaches depart from the powerful biological counterpart, i.e., the human brain. On the other hand, the current developments in biologically-inspired ASR models, based on spiking neural networks (SNNs), lag behind in terms of accuracy and focus primarily on small scale applications. In this work, we revisit the incorporation of biologically-plausible models into deep learning and we substantially enhance their capabilities, by taking inspiration from the diverse neural and synaptic dynamics found in the brain. In particular, we introduce neural connectivity concepts emulating the axo-somatic and the axo-axonic synapses. Based on this, we propose novel deep learning units with enriched neuro-synaptic dynamics and integrate them into the RNN-T architecture. We demonstrate for the first time, that a biologically realistic implementation of a large-scale ASR model can yield competitive performance levels compared to the existing deep learning models. Specifically, we show that such an implementation bears several advantages, such as a reduced computational cost and a lower latency, which are critical for speech recognition applications.
Submitted 4 November, 2021; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Learning in Deep Neural Networks Using a Biologically Inspired Optimizer
Authors:
Giorgia Dellaferrera,
Stanislaw Wozniak,
Giacomo Indiveri,
Angeliki Pantazi,
Evangelos Eleftheriou
Abstract:
Plasticity circuits in the brain are known to be influenced by the distribution of the synaptic weights through the mechanisms of synaptic integration and local regulation of synaptic strength. However, the complex interplay of stimulation-dependent plasticity with local learning signals is disregarded by most of the artificial neural network training algorithms devised so far. Here, we propose a novel biologically inspired optimizer for artificial (ANNs) and spiking neural networks (SNNs) that incorporates key principles of synaptic integration observed in dendrites of cortical neurons: GRAPES (Group Responsibility for Adjusting the Propagation of Error Signals). GRAPES implements a weight-distribution dependent modulation of the error signal at each node of the neural network. We show that this biologically inspired mechanism leads to a systematic improvement of the convergence rate of the network, and substantially improves classification accuracy of ANNs and SNNs with both feedforward and recurrent architectures. Furthermore, we demonstrate that GRAPES supports performance scalability for models of increasing complexity and mitigates catastrophic forgetting by enabling networks to generalize to unseen tasks based on previously acquired knowledge. The local characteristics of GRAPES minimize the required memory resources, making it optimally suited for dedicated hardware implementations. Overall, our work indicates that reconciling neurophysiology insights with machine intelligence is key to boosting the performance of neural networks.
Submitted 23 April, 2021;
originally announced April 2021.
-
Optimality of short-term synaptic plasticity in modelling certain dynamic environments
Authors:
Timoleon Moraitis,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
Biological neurons and their in-silico emulations for neuromorphic artificial intelligence (AI) use extraordinarily energy-efficient mechanisms, such as spike-based communication and local synaptic plasticity. It remains unclear whether these neuronal mechanisms only offer efficiency or also underlie the superiority of biological intelligence. Here, we prove rigorously that, indeed, the Bayes-optimal prediction and inference of randomly but continuously transforming environments, a common natural setting, relies on short-term spike-timing-dependent plasticity, a hallmark of biological synapses. Further, this dynamic Bayesian inference through plasticity enables circuits of the cerebral cortex in simulations to recognize previously unseen, highly distorted dynamic stimuli. Strikingly, this also introduces a biologically-modelled AI, the first to overcome multiple limitations of deep learning and outperform artificial neural networks in a visual task. The cortical-like network is spiking and event-based, trained only with unsupervised and local plasticity, on a small, narrow, and static training dataset, but achieves recognition of unseen, transformed, and dynamic data better than deep neural networks with continuous activations, trained with supervised backpropagation on the transforming data. These results link short-term plasticity to high-level cortical function, suggest optimality of natural intelligence for natural environments, and repurpose neuromorphic AI from mere efficiency to computational supremacy altogether.
Submitted 15 June, 2021; v1 submitted 14 September, 2020;
originally announced September 2020.
-
Online Spatio-Temporal Learning in Deep Neural Networks
Authors:
Thomas Bohnstingl,
Stanisław Woźniak,
Wolfgang Maass,
Angeliki Pantazi,
Evangelos Eleftheriou
Abstract:
Biological neural networks are equipped with an inherent capability to continuously adapt through online learning. This aspect remains in stark contrast to learning with error backpropagation through time (BPTT) applied to recurrent neural networks (RNNs), or recently to biologically-inspired spiking neural networks (SNNs). BPTT involves offline computation of the gradients due to the requirement to unroll the network through time. Online learning has recently regained the attention of the research community, focusing either on approaches that approximate BPTT or on biologically-plausible schemes applied to SNNs. Here we present an alternative perspective that is based on a clear separation of spatial and temporal gradient components. Combined with insights from biology, we derive from first principles a novel online learning algorithm for deep SNNs, called online spatio-temporal learning (OSTL). For shallow networks, OSTL is gradient-equivalent to BPTT enabling for the first time online training of SNNs with BPTT-equivalent gradients. In addition, the proposed formulation unveils a class of SNN architectures trainable online at low time complexity. Moreover, we extend OSTL to a generic form, applicable to a wide range of network architectures, including networks comprising long short-term memory (LSTM) and gated recurrent units (GRU). We demonstrate the operation of our algorithm on various tasks from language modelling to speech recognition and obtain results on par with the BPTT baselines. The proposed algorithm provides a framework for developing succinct and efficient online training approaches for SNNs and in general deep RNNs.
Submitted 8 October, 2020; v1 submitted 24 July, 2020;
originally announced July 2020.
-
Accurate Emulation of Memristive Crossbar Arrays for In-Memory Computing
Authors:
Anastasios Petropoulos,
Irem Boybat,
Manuel Le Gallo,
Evangelos Eleftheriou,
Abu Sebastian,
Theodore Antonakopoulos
Abstract:
In-memory computing is an emerging non-von Neumann computing paradigm where certain computational tasks are performed in memory by exploiting the physical attributes of the memory devices. Memristive devices such as phase-change memory (PCM), where information is stored in terms of their conductance levels, are especially well suited for in-memory computing. In particular, memristive devices, when organized in a crossbar configuration can be used to perform matrix-vector multiply operations by exploiting Kirchhoff's circuit laws. To explore the feasibility of such in-memory computing cores in applications such as deep learning as well as for system-level architectural exploration, it is highly desirable to develop an accurate hardware emulator that captures the key physical attributes of the memristive devices. Here, we present one such emulator for PCM and experimentally validate it using measurements from a PCM prototype chip. Moreover, we present an application of the emulator for neural network inference where our emulator can capture the conductance evolution of approximately 400,000 PCM devices remarkably well.
Submitted 6 April, 2020;
originally announced April 2020.
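The crossbar matrix-vector multiply mentioned in this abstract can be illustrated with a minimal sketch. It assumes ideal, noise-free devices; the function name `crossbar_mvm` and the example conductance and voltage values are hypothetical and are not taken from the paper or its emulator.

```python
def crossbar_mvm(conductances, voltages):
    """Return the column currents of an idealized crossbar.

    conductances[i][j] is the conductance (siemens) of the device joining
    input row i to output column j; voltages[i] is the voltage on row i.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one voltage per row is required")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law gives the current through one device; Kirchhoff's
            # current law sums the currents of all devices on column j.
            currents[j] += conductances[i][j] * voltages[i]
    return currents


# Hypothetical 2x2 array: conductances in siemens, voltages in volts.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.2, 0.1]
I = crossbar_mvm(G, V)  # column currents in amperes
```

A hardware emulator of the kind the abstract describes would additionally model device non-idealities, which this sketch deliberately leaves out.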
-
ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning
Authors:
Vinay Joshi,
Geethan Karunaratne,
Manuel Le Gallo,
Irem Boybat,
Christophe Piveteau,
Abu Sebastian,
Bipin Rajendran,
Evangelos Eleftheriou
Abstract:
Deep neural networks (DNNs) have surpassed human-level accuracy in a variety of cognitive tasks but at the cost of significant memory/time requirements in DNN training. This limits their deployment in energy and memory limited applications that require real-time learning. Matrix-vector multiplications (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated with the training of DNNs. Strategies to improve the efficiency of MVM computation in hardware have been demonstrated with minimal impact on training accuracy. However, the VVOP computation remains a relatively less explored bottleneck even with the aforementioned strategies. Stochastic computing (SC) has been proposed to improve the efficiency of VVOP computation but on relatively shallow networks with bounded activation functions and floating-point (FP) scaling of activation gradients. In this paper, we propose ESSOP, an efficient and scalable stochastic outer product architecture based on the SC paradigm. We introduce efficient techniques to generalize SC for weight update computation in DNNs with the unbounded activation functions (e.g., ReLU), required by many state-of-the-art networks. Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling. We show that the ResNet-32 network with 33 convolution layers and a fully-connected layer can be trained with ESSOP on the CIFAR-10 dataset to achieve baseline comparable accuracy. Hardware design of ESSOP at 14nm technology node shows that, compared to a highly pipelined FP16 multiplier design, ESSOP is 82.2% and 93.7% better in energy and area efficiency respectively for outer product computation.
Submitted 25 March, 2020;
originally announced March 2020.
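The basic stochastic-computing idea behind an outer product can be sketched as follows. This is a textbook illustration, not the ESSOP architecture: values in [0, 1] are encoded as random bitstreams, and a bitwise AND of two independent streams estimates their product. The function names, the stream length, and the choice to reuse one stream per vector element are assumptions made for illustration.

```python
import random


def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a random bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def stochastic_outer_product(u, v, length=4096, seed=0):
    """Estimate the outer product of u and v (entries in [0, 1])."""
    rng = random.Random(seed)
    # One stream per vector element, reused across the whole row or column.
    streams_u = [to_bitstream(x, length, rng) for x in u]
    streams_v = [to_bitstream(y, length, rng) for y in v]
    result = []
    for su in streams_u:
        row = []
        for sv in streams_v:
            # AND of independent streams is 1 with probability x * y.
            ones = sum(a & b for a, b in zip(su, sv))
            row.append(ones / length)
        result.append(row)
    return result


estimate = stochastic_outer_product([0.5, 0.25], [0.8, 0.4])
```

The estimate is noisy; its error shrinks as the stream length grows, which is the usual accuracy-versus-cost trade-off in stochastic computing.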
-
Compiling Neural Networks for a Computational Memory Accelerator
Authors:
Kornilios Kourtis,
Martino Dazzi,
Nikolas Ioannou,
Tobias Grosser,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
Computational memory (CM) is a promising approach for accelerating inference on neural networks (NN) by using enhanced memories that, in addition to storing data, allow computations on them. One of the main challenges of this approach is defining a hardware/software interface that allows a compiler to map NN models for efficient execution on the underlying CM accelerator. This is a non-trivial task because efficiency dictates that the CM accelerator is explicitly programmed as a dataflow engine where the execution of the different NN layers form a pipeline. In this paper, we present our work towards a software stack for executing ML models on such a multi-core CM accelerator. We describe an architecture for the hardware and software, and focus on the problem of implementing the appropriate control logic so that data dependencies are respected. We propose a solution to the latter that is based on polyhedral compilation.
Submitted 24 April, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Mixed-precision deep learning based on computational memory
Authors:
S. R. Nandakumar,
Manuel Le Gallo,
Christophe Piveteau,
Vinay Joshi,
Giovanni Mariani,
Irem Boybat,
Geethan Karunaratne,
Riduan Khaddam-Aljameh,
Urs Egger,
Anastasios Petropoulos,
Theodore Antonakopoulos,
Bipin Rajendran,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive and this has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit performing the weighted summations and imprecise conductance updates with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long-short-term-memory networks, and generative-adversarial networks. Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates 173x improvement in energy efficiency of the architecture when used for training a multilayer perceptron compared with a dedicated fully digital 32-bit implementation.
Submitted 31 January, 2020;
originally announced January 2020.
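The split between high-precision accumulation and coarse device updates described in this abstract can be sketched as below. The abstract does not state the transfer rule, so the threshold `epsilon`, the pulse-counting scheme, and all names are assumptions for illustration only.

```python
def mixed_precision_step(chi, gradient, learning_rate, epsilon):
    """Accumulate one weight update digitally and decide how many
    device pulses to issue.

    chi     : high-precision accumulator kept in the digital unit
    epsilon : assumed weight change produced by one device pulse
    Returns the new accumulator value and the signed pulse count.
    """
    chi = chi - learning_rate * gradient
    # Only whole multiples of epsilon are sent to the device
    # (int() truncates toward zero); the remainder stays in chi.
    pulses = int(chi / epsilon)
    chi -= pulses * epsilon
    return chi, pulses


def train_weight(gradients, learning_rate=1.0, epsilon=0.1):
    """Run a sequence of updates; return the residual and total pulses."""
    chi, total_pulses = 0.0, 0
    for g in gradients:
        chi, pulses = mixed_precision_step(chi, g, learning_rate, epsilon)
        total_pulses += pulses
    return chi, total_pulses


# Three small updates of +0.04 each: only the third crosses epsilon.
residual, total_pulses = train_weight([-0.04, -0.04, -0.04])
```

Because small updates are retained in the accumulator rather than discarded, the device only needs to change in coarse steps, which is the motivation the abstract gives for combining the two units.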
-
5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory
Authors:
Martino Dazzi,
Abu Sebastian,
Pier Andrea Francese,
Thomas Parnell,
Luca Benini,
Evangelos Eleftheriou
Abstract:
In-memory computing is an emerging computing paradigm that could enable deep learning inference at significantly higher energy efficiency and reduced latency. The essential idea is to map the synaptic weights corresponding to each layer to one or more computational memory (CM) cores. During inference, these cores perform the associated matrix-vector multiply operations in place with O(1) time complexity, thus obviating the need to move the synaptic weights to an additional processing unit. Moreover, this architecture could enable the execution of these networks in a highly pipelined fashion. However, a key challenge is to design an efficient communication fabric for the CM cores. Here, we present one such communication fabric based on a graph topology that is well suited for the widely successful convolutional neural networks (CNNs). We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between one graph representation of these networks and the proposed graph topology. We then present a quantitative comparison with established communication topologies and show that our proposed topology achieves the lowest bandwidth requirements per communication channel. Finally, we present a concrete example of mapping ResNet-32 onto an array of CM cores.
Submitted 8 June, 2019;
originally announced June 2019.
-
Accurate deep neural network inference using computational phase-change memory
Authors:
Vinay Joshi,
Manuel Le Gallo,
Simon Haefeli,
Irem Boybat,
S. R. Nandakumar,
Christophe Piveteau,
Martino Dazzi,
Bipin Rajendran,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory (PCM). We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy on the ImageNet benchmark of 71.6% after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one day period, where each of the 361,722 synaptic weights of the network is programmed on just two PCM devices organized in a differential configuration.
Submitted 11 April, 2020; v1 submitted 7 June, 2019;
originally announced June 2019.
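The differential configuration mentioned in this abstract, where one signed weight is stored on two devices, can be sketched as follows. The linear mapping, the parameters `w_max` and `g_max`, and the function names are assumptions for illustration; the abstract does not specify how weights are scaled to conductances.

```python
def weight_to_pair(w, w_max, g_max):
    """Map a weight in [-w_max, w_max] to (g_plus, g_minus) conductances.

    Positive weights are programmed on the 'plus' device and negative
    weights on the 'minus' device; the other device stays at zero.
    """
    if abs(w) > w_max:
        raise ValueError("weight outside the representable range")
    g = abs(w) / w_max * g_max
    return (g, 0.0) if w >= 0 else (0.0, g)


def pair_to_weight(g_plus, g_minus, w_max, g_max):
    """Recover the signed weight from the conductance difference."""
    return (g_plus - g_minus) / g_max * w_max


# Hypothetical scale: weights in [-1, 1], conductances up to 25 microsiemens.
pair = weight_to_pair(-0.5, w_max=1.0, g_max=25e-6)
recovered = pair_to_weight(*pair, w_max=1.0, g_max=25e-6)
```

Using the difference of two non-negative conductances is what lets a device that can only store positive values represent signed weights.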
-
Supervised Learning in Spiking Neural Networks with Phase-Change Memory Synapses
Authors:
S. R. Nandakumar,
Irem Boybat,
Manuel Le Gallo,
Evangelos Eleftheriou,
Abu Sebastian,
Bipin Rajendran
Abstract:
Spiking neural networks (SNN) are artificial computational models that have been inspired by the brain's ability to naturally encode and process information in the time domain. The added temporal dimension is believed to render them more computationally efficient than the conventional artificial neural networks, though their full computational capabilities are yet to be explored. Recently, computational memory architectures based on non-volatile memory crossbar arrays have shown great promise to implement parallel computations in artificial and spiking neural networks. In this work, we experimentally demonstrate for the first time, the feasibility to realize high-performance event-driven in-situ supervised learning systems using nanoscale and stochastic phase-change synapses. Our SNN is trained to recognize audio signals of alphabets encoded using spikes in the time domain and to generate spike trains at precise time instances to represent the pixel intensities of their corresponding images. Moreover, with a statistical model capturing the experimental behavior of the devices, we investigate architectural and systems-level solutions for improving the training and inference performance of our computational memory-based system. Combining the computational potential of supervised SNNs with the parallel compute power of computational memory, the work paves the way for next-generation of efficient brain-inspired systems.
Submitted 28 May, 2019;
originally announced May 2019.
-
Low-Power Neuromorphic Hardware for Signal Processing Applications
Authors:
Bipin Rajendran,
Abu Sebastian,
Michael Schmuker,
Narayan Srinivasa,
Evangelos Eleftheriou
Abstract:
Machine learning has emerged as the dominant tool for implementing complex cognitive tasks that require supervised, unsupervised, and reinforcement learning. While the resulting machines have demonstrated in some cases even super-human performance, their energy consumption has often proved to be prohibitive in the absence of costly super-computers. Most state-of-the-art machine learning solutions are based on memory-less models of neurons. This is unlike the neurons in the human brain, which encode and process information using temporal information in spike events. The different computing principles underlying biological neurons and how they combine together to efficiently process information is believed to be a key factor behind their superior efficiency compared to current machine learning systems. Inspired by the time-encoding mechanism used by the brain, third generation spiking neural networks (SNNs) are being studied for building a new class of information processing engines.
Modern computing systems based on the von Neumann architecture, however, are ill-suited for efficiently implementing SNNs, since their performance is limited by the need to constantly shuttle data between physically separated logic and memory units. Hence, novel computational architectures that address the von Neumann bottleneck are necessary in order to build systems that can implement SNNs with low energy budgets. In this paper, we review some of the architectural and system level design aspects involved in developing a new class of brain-inspired information processing engines that mimic the time-based information encoding and processing aspects of the brain.
Submitted 5 August, 2019; v1 submitted 11 January, 2019;
originally announced January 2019.
-
Deep learning incorporating biologically-inspired neural dynamics
Authors:
Stanisław Woźniak,
Angeliki Pantazi,
Thomas Bohnstingl,
Evangelos Eleftheriou
Abstract:
Neural networks have become the key technology of artificial intelligence and have contributed to breakthroughs in several machine learning tasks, primarily owing to advances in deep learning applied to Artificial Neural Networks (ANNs). Simultaneously, Spiking Neural Networks (SNNs) incorporating biologically-feasible spiking neurons have held great promise because of their rich temporal dynamics and high-power efficiency. However, the developments in SNNs were proceeding separately from those in ANNs, effectively limiting the adoption of deep learning research insights. Here we show an alternative perspective on the spiking neuron that casts it as a particular ANN construct called Spiking Neural Unit (SNU), and a soft SNU (sSNU) variant that generalizes its dynamics to a novel recurrent ANN unit. SNUs bridge the biologically-inspired SNNs with ANNs and provide a methodology for seamless inclusion of spiking neurons in deep learning architectures. Furthermore, SNU enables highly-efficient in-memory acceleration of SNNs trained with backpropagation through time, implemented with the hardware in-the-loop. We apply SNUs to tasks ranging from hand-written digit recognition, language modelling, to music prediction. We obtain accuracy comparable to, or better than, that of state-of-the-art ANNs, and we experimentally verify the efficacy of the in-memory-based SNN realization for the music-prediction task using 52,800 phase-change memory devices. The new generation of neural units introduced in this paper incorporate biologically-inspired neural dynamics in deep learning. In addition, they provide a systematic methodology for training neuromorphic computing hardware. Thus, they open a new avenue for a widespread adoption of SNNs in practical applications.
Submitted 19 May, 2019; v1 submitted 17 December, 2018;
originally announced December 2018.
-
Mixed-precision training of deep neural networks using computational memory
Authors:
Nandakumar S. R.,
Manuel Le Gallo,
Irem Boybat,
Bipin Rajendran,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
Deep neural networks have revolutionized the field of machine learning by providing unprecedented human-like performance in solving many real-world problems such as image and speech recognition. Training of large DNNs, however, is a computationally intensive task, and this necessitates the development of novel computing architectures targeting this application. A computational memory unit where resistive memory devices are organized in crossbar arrays can be used to locally store the synaptic weights in their conductance states. The expensive multiply accumulate operations can be performed in place using Kirchhoff's circuit laws in a non-von Neumann manner. However, a key challenge remains the inability to alter the conductance states of the devices in a reliable manner during the weight update process. We propose a mixed-precision architecture that combines a computational memory unit storing the synaptic weights with a digital processing unit and an additional memory unit accumulating weight updates in high precision. The new architecture delivers classification accuracies comparable to those of floating-point implementations without being constrained by challenges associated with the non-ideal weight update characteristics of emerging resistive memories. A two layer neural network in which the computational memory unit is realized using non-linear stochastic models of phase-change memory devices achieves a test accuracy of 97.40% on the MNIST handwritten digit classification problem.
Submitted 4 December, 2017;
originally announced December 2017.
-
Neuromorphic computing with multi-memristive synapses
Authors:
Irem Boybat,
Manuel Le Gallo,
S. R. Nandakumar,
Timoleon Moraitis,
Thomas Parnell,
Tomas Tuma,
Bipin Rajendran,
Yusuf Leblebici,
Abu Sebastian,
Evangelos Eleftheriou
Abstract:
Neuromorphic computing has emerged as a promising avenue towards building the next generation of intelligent computing systems. It has been proposed that memristive devices, which exhibit history-dependent conductivity modulation, could efficiently represent the synaptic weights in artificial neural networks. However, precise modulation of the device conductance over a wide dynamic range, necessary to maintain high network accuracy, is proving to be challenging. To address this, we present a multi-memristive synaptic architecture with an efficient global counter-based arbitration scheme. We focus on phase change memory devices, develop a comprehensive model and demonstrate via simulations the effectiveness of the concept for both spiking and non-spiking neural networks. Moreover, we present experimental results involving over a million phase change memory devices for unsupervised learning of temporal correlations using a spiking neural network. The work presents a significant step towards the realization of large-scale and energy-efficient neuromorphic computing systems.
Submitted 24 February, 2019; v1 submitted 17 November, 2017;
originally announced November 2017.
-
Fatiguing STDP: Learning from Spike-Timing Codes in the Presence of Rate Codes
Authors:
Timoleon Moraitis,
Abu Sebastian,
Irem Boybat,
Manuel Le Gallo,
Tomas Tuma,
Evangelos Eleftheriou
Abstract:
Spiking neural networks (SNNs) could play a key role in unsupervised machine learning applications, by virtue of strengths related to learning from the fine temporal structure of event-based signals. However, some spike-timing-related strengths of SNNs are hindered by the sensitivity of spike-timing-dependent plasticity (STDP) rules to input spike rates, as fine temporal correlations may be obstructed by coarser correlations between firing rates. In this article, we propose a spike-timing-dependent learning rule that allows a neuron to learn from the temporally-coded information despite the presence of rate codes. Our long-term plasticity rule makes use of short-term synaptic fatigue dynamics. We show analytically that, in contrast to conventional STDP rules, our fatiguing STDP (FSTDP) helps learn the temporal code, and we derive the necessary conditions to optimize the learning process. We showcase the effectiveness of FSTDP in learning spike-timing correlations among processes of different rates in synthetic data. Finally, we use FSTDP to detect correlations in real-world weather data from the United States in an experimental realization of the algorithm that uses a neuromorphic hardware platform comprising phase-change memristive devices. Taken together, our analyses and demonstrations suggest that FSTDP paves the way for the exploitation of the spike-based strengths of SNNs in real-world applications.
Submitted 17 June, 2017;
originally announced June 2017.
-
Temporal correlation detection using computational phase-change memory
Authors:
Abu Sebastian,
Tomas Tuma,
Nikolaos Papandreou,
Manuel Le Gallo,
Lukas Kull,
Thomas Parnell,
Evangelos Eleftheriou
Abstract:
For decades, conventional computers based on the von Neumann architecture have performed computation by repeatedly transferring data between their processing and their memory units, which are physically separated. As computation becomes increasingly data-centric and as the scalability limits in terms of performance and power are being reached, alternative computing paradigms are searched for in which computation and storage are collocated. A fascinating new approach is that of computational memory where the physics of nanoscale memory devices are used to perform certain computational tasks within the memory unit in a non-von Neumann manner. Here we present a large-scale experimental demonstration using one million phase-change memory devices organized to perform a high-level computational primitive by exploiting the crystallization dynamics. Also presented is an application of such a computational memory to process real-world data-sets. The results show that this co-existence of computation and storage at the nanometer scale could be the enabler for new, ultra-dense, low power, and massively parallel computing systems.
Submitted 1 June, 2017;
originally announced June 2017.
-
Mixed-Precision In-Memory Computing
Authors:
Manuel Le Gallo,
Abu Sebastian,
Roland Mathis,
Matteo Manica,
Heiner Giefers,
Tomas Tuma,
Costas Bekas,
Alessandro Curioni,
Evangelos Eleftheriou
Abstract:
As CMOS scaling reaches its technological limits, a radical departure from traditional von Neumann systems, which involve separate processing and memory units, is needed in order to significantly extend the performance of today's computers. In-memory computing is a promising approach in which nanoscale resistive memory devices, organized in a computational memory unit, are used for both processing and memory. However, to reach the numerical accuracy typically required for data analytics and scientific computing, limitations arising from device variability and non-ideal device characteristics need to be addressed. Here we introduce the concept of mixed-precision in-memory computing, which combines a von Neumann machine with a computational memory unit. In this hybrid system, the computational memory unit performs the bulk of a computational task, while the von Neumann machine implements a backward method to iteratively improve the accuracy of the solution. The system therefore benefits from both the high precision of digital computing and the energy/areal efficiency of in-memory computing. We experimentally demonstrate the efficacy of the approach by accurately solving systems of linear equations, in particular, a system of 5,000 equations using 998,752 phase-change memory devices.
Submitted 4 October, 2018; v1 submitted 16 January, 2017;
originally announced January 2017.