subscribe to arXiv mailings

arXiv:2406.11778 [pdf, other]

Brain-inspired Computational Modeling of Action Recognition with Recurrent Spiking Neural Networks Equipped with Reinforcement Delay Learning

Authors: Alireza Nadafian, Milad Mozafari, Timothée Masquelier, Mohammad Ganjtabesh

Abstract: The growing interest in brain-inspired computational models arises from the remarkable problem-solving efficiency of the human brain. Action recognition, a complex task in computational neuroscience, has received significant attention due to both its intricate nature and the brain's exceptional performance in this area. Nevertheless, current solutions for action recognition either exhibit limitati… ▽ More The growing interest in brain-inspired computational models arises from the remarkable problem-solving efficiency of the human brain. Action recognition, a complex task in computational neuroscience, has received significant attention due to both its intricate nature and the brain's exceptional performance in this area. Nevertheless, current solutions for action recognition either exhibit limitations in effectively addressing the problem or lack the necessary biological plausibility. Deep neural networks, for instance, demonstrate acceptable performance but deviate from biological evidence, thereby restricting their suitability for brain-inspired computational studies. On the other hand, the majority of brain-inspired models proposed for action recognition exhibit significantly lower effectiveness compared to deep models and fail to achieve human-level performance. This deficiency can be attributed to their disregard for the underlying mechanisms of the brain. In this article, we present an effective brain-inspired computational model for action recognition. We equip our model with novel biologically plausible mechanisms for spiking neural networks that are crucial for learning spatio-temporal patterns. The key idea behind these new mechanisms is to bridge the gap between the brain's capabilities and action recognition tasks by integrating key biological principles into our computational framework. Furthermore, we evaluate the performance of our model against other models using a benchmark dataset for action recognition, DVS-128 Gesture. The results show that our model outperforms previous biologically plausible models and competes with deep supervised models. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2310.16620 [pdf, other]

SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence

Authors: Wei Fang, Yanqi Chen, Jianhao Ding, Zhaofei Yu, Timothée Masquelier, Ding Chen, Liwei Huang, Huihui Zhou, Guoqi Li, Yonghong Tian

Abstract: Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet the demands of the automatic differentiation, parallel computation acceleration, and high integrati… ▽ More Spiking neural networks (SNNs) aim to realize brain-inspired intelligence on neuromorphic chips with high energy efficiency by introducing neural dynamics and spike properties. As the emerging spiking deep learning paradigm attracts increasing interest, traditional programming frameworks cannot meet the demands of the automatic differentiation, parallel computation acceleration, and high integration of processing neuromorphic datasets and deployment. In this work, we present the SpikingJelly framework to address the aforementioned dilemma. We contribute a full-stack toolkit for pre-processing neuromorphic datasets, building deep SNNs, optimizing their parameters, and deploying SNNs on neuromorphic chips. Compared to existing methods, the training of deep SNNs can be accelerated $11\times$, and the superior extensibility and flexibility of SpikingJelly enable users to accelerate custom models at low costs through multilevel inheritance and semiautomatic code generation. SpikingJelly paves the way for synthesizing truly energy-efficient SNN-based machine intelligence systems, which will enrich the ecology of neuromorphic computing. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted in Science Advances (https://www.science.org/doi/10.1126/sciadv.adi1480)

arXiv:2309.13972 [pdf, ps, other]

Audio classification with Dilated Convolution with Learnable Spacings

Authors: Ismail Khalfaoui-Hassani, Timothée Masquelier, Thomas Pellegrini

Abstract: Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation. Its interest has recently been demonstrated in computer vision (ImageNet classification and downstream tasks). Here we show that DCLS is also useful for audio tagging using the AudioSet classification benchmark. We took… ▽ More Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation. Its interest has recently been demonstrated in computer vision (ImageNet classification and downstream tasks). Here we show that DCLS is also useful for audio tagging using the AudioSet classification benchmark. We took two state-of-the-art convolutional architectures using depthwise separable convolutions (DSC), ConvNeXt and ConvFormer, and a hybrid one using attention in addition, FastViT, and drop-in replaced all the DSC layers by DCLS ones. This significantly improved the mean average precision (mAP) with the three architectures without increasing the number of parameters and with only a low cost on the throughput. The method code is based on PyTorch and is available at https://github.com/K-H-Ismail/DCLS-Audio △ Less

Submitted 22 November, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2306.17670 [pdf, other]

Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings

Authors: Ilyass Hammouamri, Ismail Khalfaoui-Hassani, Timothée Masquelier

Abstract: Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more… ▽ More Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on three datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC) and its non-spiking version Google Speech Commands v0.02 (GSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two or three hidden fully connected layers, and vanilla leaky integrate-and-fire neurons. We showed that fixed random delays help and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in the three datasets without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays △ Less

Submitted 1 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

Journal ref: ICLR 2024

arXiv:2306.00830 [pdf, ps, other]

Adapting a ConvNeXt model to audio classification on AudioSet

Authors: Thomas Pellegrini, Ismail Khalfaoui-Hassani, Etienne Labbé, Timothée Masquelier

Abstract: In computer vision, convolutional neural networks (CNN) such as ConvNeXt, have been able to surpass state-of-the-art transformers, partly thanks to depthwise separable convolutions (DSC). DSC, as an approximation of the regular convolution, has made CNNs more efficient in time and memory complexity without deteriorating their accuracy, and sometimes even improving it. In this paper, we first imple… ▽ More In computer vision, convolutional neural networks (CNN) such as ConvNeXt, have been able to surpass state-of-the-art transformers, partly thanks to depthwise separable convolutions (DSC). DSC, as an approximation of the regular convolution, has made CNNs more efficient in time and memory complexity without deteriorating their accuracy, and sometimes even improving it. In this paper, we first implement DSC into the Pretrained Audio Neural Networks (PANN) family for audio classification on AudioSet, to show its benefits in terms of accuracy/model size trade-off. Second, we adapt the now famous ConvNeXt model to the same task. It rapidly overfits, so we report on techniques that improve the learning process. Our best ConvNeXt model reached 0.471 mean-average precision on AudioSet, which is better than or equivalent to recent large audio transformers, while using three times less parameters. We also achieved positive results in audio captioning and audio retrieval with this model. Our PyTorch source code and checkpoint models are available at https://github.com/topel/audioset-convnext-inf. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted at INTERSPEECH 2023

arXiv:2306.00817 [pdf, other]

Dilated Convolution with Learnable Spacings: beyond bilinear interpolation

Authors: Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier

Abstract: Dilated Convolution with Learnable Spacings (DCLS) is a recently proposed variation of the dilated convolution in which the spacings between the non-zero elements in the kernel, or equivalently their positions, are learnable. Non-integer positions are handled via interpolation. Thanks to this trick, positions have well-defined gradients. The original DCLS used bilinear interpolation, and thus only… ▽ More Dilated Convolution with Learnable Spacings (DCLS) is a recently proposed variation of the dilated convolution in which the spacings between the non-zero elements in the kernel, or equivalently their positions, are learnable. Non-integer positions are handled via interpolation. Thanks to this trick, positions have well-defined gradients. The original DCLS used bilinear interpolation, and thus only considered the four nearest pixels. Yet here we show that longer range interpolations, and in particular a Gaussian interpolation, allow improving performance on ImageNet1k classification on two state-of-the-art convolutional architectures (ConvNeXt and Conv\-Former), without increasing the number of parameters. The method code is based on PyTorch and is available at https://github.com/K-H-Ismail/Dilated-Convolution-with-Learnable-Spacings-PyTorch △ Less

Submitted 22 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Published in ICML 2023 Workshop on Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators. 2023

arXiv:2304.12760 [pdf, other]

Parallel Spiking Neurons with High Efficiency and Ability to Learn Long-term Dependencies

Authors: Wei Fang, Zhaofei Yu, Zhaokun Zhou, Ding Chen, Yanqi Chen, Zhengyu Ma, Timothée Masquelier, Yonghong Tian

Abstract: Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset to a general formulation, we propose the Parallel Spikin… ▽ More Vanilla spiking neurons in Spiking Neural Networks (SNNs) use charge-fire-reset neuronal dynamics, which can only be simulated serially and can hardly learn long-time dependencies. We find that when removing reset, the neuronal dynamics can be reformulated in a non-iterative form and parallelized. By rewriting neuronal dynamics without reset to a general formulation, we propose the Parallel Spiking Neuron (PSN), which generates hidden states that are independent of their predecessors, resulting in parallelizable neuronal dynamics and extremely high simulation speed. The weights of inputs in the PSN are fully connected, which maximizes the utilization of temporal information. To avoid the use of future inputs for step-by-step inference, the weights of the PSN can be masked, resulting in the masked PSN. By sharing weights across time-steps based on the masked PSN, the sliding PSN is proposed to handle sequences of varying lengths. We evaluate the PSN family on simulation speed and temporal/static data classification, and the results show the overwhelming advantage of the PSN family in efficiency and accuracy. To the best of our knowledge, this is the first study about parallelizing spiking neurons and can be a cornerstone for the spiking deep learning research. Our codes are available at \url{https://github.com/fangwei123456/Parallel-Spiking-Neuron}. △ Less

Submitted 9 January, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted in NeurIPS 2023

arXiv:2302.06492 [pdf, other]

doi 10.3389/fnins.2023.1160034

Optical flow estimation from event-based cameras and spiking neural networks

Authors: Javier Cuadrado, Ulysse Rançon, Benoît Cottereau, Francisco Barranco, Timothée Masquelier

Abstract: Event-based cameras are raising interest within the computer vision community. These sensors operate with asynchronous pixels, emitting events, or "spikes", when the luminance change at a given pixel since the last event surpasses a certain threshold. Thanks to their inherent qualities, such as their low power consumption, low latency and high dynamic range, they seem particularly tailored to appl… ▽ More Event-based cameras are raising interest within the computer vision community. These sensors operate with asynchronous pixels, emitting events, or "spikes", when the luminance change at a given pixel since the last event surpasses a certain threshold. Thanks to their inherent qualities, such as their low power consumption, low latency and high dynamic range, they seem particularly tailored to applications with challenging temporal constraints and safety requirements. Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs), since the coupling of an asynchronous sensor with neuromorphic hardware can yield real-time systems with minimal power requirements. In this work, we seek to develop one such system, using both event sensor data from the DSEC dataset and spiking neural networks to estimate optical flow for driving scenarios. We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations. To do so, we encourage both minimal norm for the error vector and minimal angle between ground-truth and predicted flow, training our model with back-propagation using a surrogate gradient. In addition, the use of 3d convolutions allows us to capture the dynamic nature of the data by increasing the temporal receptive fields. Upsampling after each decoding stage ensures that each decoder's output contributes to the final estimation. Thanks to separable convolutions, we have been able to develop a light model (when compared to competitors) that can nonetheless yield reasonably accurate optical flow estimates. △ Less

Submitted 17 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 9 pages, 3 figures and 3 tables, plus Supplementary Materials

arXiv:2210.14151 [pdf, other]

Drastically Reducing the Number of Trainable Parameters in Deep CNNs by Inter-layer Kernel-sharing

Authors: Alireza Azadbakht, Saeed Reza Kheradpisheh, Ismail Khalfaoui-Hassani, Timothée Masquelier

Abstract: Deep convolutional neural networks (DCNNs) have become the state-of-the-art (SOTA) approach for many computer vision tasks: image classification, object detection, semantic segmentation, etc. However, most SOTA networks are too large for edge computing. Here, we suggest a simple way to reduce the number of trainable parameters and thus the memory footprint: sharing kernels between multiple convolu… ▽ More Deep convolutional neural networks (DCNNs) have become the state-of-the-art (SOTA) approach for many computer vision tasks: image classification, object detection, semantic segmentation, etc. However, most SOTA networks are too large for edge computing. Here, we suggest a simple way to reduce the number of trainable parameters and thus the memory footprint: sharing kernels between multiple convolutional layers. Kernel-sharing is only possible between ``isomorphic" layers, i.e.layers having the same kernel size, input and output channels. This is typically the case inside each stage of a DCNN. Our experiments on CIFAR-10 and CIFAR-100, using the ConvMixer and SE-ResNet architectures show that the number of parameters of these models can drastically be reduced with minimal cost on accuracy. The resulting networks are appealing for certain edge computing applications that are subject to severe memory constraints, and even more interesting if leveraging "frozen weights" hardware accelerators. Kernel-sharing is also an efficient regularization method, which can reduce overfitting. The codes are publicly available at https://github.com/AlirezaAzadbakht/kernel-sharing. △ Less

Submitted 23 October, 2022; originally announced October 2022.

arXiv:2112.03740 [pdf, other]

Dilated convolution with learnable spacings

Authors: Ismail Khalfaoui-Hassani, Thomas Pellegrini, Timothée Masquelier

Abstract: Recent works indicate that convolutional neural networks (CNN) need large receptive fields (RF) to compete with visual transformers and their attention mechanism. In CNNs, RFs can simply be enlarged by increasing the convolution kernel sizes. Yet the number of trainable parameters, which scales quadratically with the kernel's size in the 2D case, rapidly becomes prohibitive, and the training is no… ▽ More Recent works indicate that convolutional neural networks (CNN) need large receptive fields (RF) to compete with visual transformers and their attention mechanism. In CNNs, RFs can simply be enlarged by increasing the convolution kernel sizes. Yet the number of trainable parameters, which scales quadratically with the kernel's size in the 2D case, rapidly becomes prohibitive, and the training is notoriously difficult. This paper presents a new method to increase the RF size without increasing the number of parameters. The dilated convolution (DC) has already been proposed for the same purpose. DC can be seen as a convolution with a kernel that contains only a few non-zero elements placed on a regular grid. Here we present a new version of the DC in which the spacings between the non-zero elements, or equivalently their positions, are no longer fixed but learnable via backpropagation thanks to an interpolation technique. We call this method "Dilated Convolution with Learnable Spacings" (DCLS) and generalize it to the n-dimensional convolution case. However, our main focus here will be on the 2D case. We first tried our approach on ResNet50: we drop-in replaced the standard convolutions with DCLS ones, which increased the accuracy of ImageNet1k classification at iso-parameters, but at the expense of the throughput. Next, we used the recent ConvNeXt state-of-the-art convolutional architecture and drop-in replaced the depthwise convolutions with DCLS ones. This not only increased the accuracy of ImageNet1k classification but also of typical downstream and robustness tasks, again at iso-parameters but this time with negligible cost on throughput, as ConvNeXt uses separable convolutions. Conversely, classic DC led to poor performance with both ResNet50 and ConvNeXt. The code of the method is available at: https://github.com/K-H-Ismail/Dilated-Convolution-with-Learnable-Spacings-PyTorch. △ Less

Submitted 11 May, 2023; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: Published in The Eleventh International Conference on Learning Representations (ICLR) 2023. (https://openreview.net/forum?id=Q3-1vRh3HOA)

Journal ref: The Eleventh International Conference on Learning Representations ICLR 2023

arXiv:2109.13751 [pdf, other]

doi 10.1109/ACCESS.2022.3226484

StereoSpike: Depth Learning with a Spiking Neural Network

Authors: Ulysse Rançon, Javier Cuadrado-Anibarro, Benoit R. Cottereau, Timothée Masquelier

Abstract: Depth estimation is an important computer vision task, useful in particular for navigation in autonomous vehicles, or for object manipulation in robotics. Here we solved it using an end-to-end neuromorphic approach, combining two event-based cameras and a Spiking Neural Network (SNN) with a slightly modified U-Net-like encoder-decoder architecture, that we named StereoSpike. More specifically, we… ▽ More Depth estimation is an important computer vision task, useful in particular for navigation in autonomous vehicles, or for object manipulation in robotics. Here we solved it using an end-to-end neuromorphic approach, combining two event-based cameras and a Spiking Neural Network (SNN) with a slightly modified U-Net-like encoder-decoder architecture, that we named StereoSpike. More specifically, we used the Multi Vehicle Stereo Event Camera Dataset (MVSEC). It provides a depth ground-truth, which was used to train StereoSpike in a supervised manner, using surrogate gradient descent. We propose a novel readout paradigm to obtain a dense analog prediction -- the depth of each pixel -- from the spikes of the decoder. We demonstrate that this architecture generalizes very well, even better than its non-spiking counterparts, leading to state-of-the-art test accuracy. To the best of our knowledge, it is the first time that such a large-scale regression problem is solved by a fully spiking network. Finally, we show that low firing rates (<10%) can be obtained via regularization, with a minimal cost in accuracy. This means that StereoSpike could be efficiently implemented on neuromorphic chips, opening the door for low power and real time embedded systems. △ Less

Submitted 3 November, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

arXiv:2109.13208 [pdf, other]

doi 10.1109/ACCESS.2022.3187033

Spiking neural networks trained via proxy

Authors: Saeed Reza Kheradpisheh, Maryam Mirsadeghi, Timothée Masquelier

Abstract: We propose a new learning algorithm to train spiking neural networks (SNN) using conventional artificial neural networks (ANN) as proxy. We couple two SNN and ANN networks, respectively, made of integrate-and-fire (IF) and ReLU neurons with the same network architectures and shared synaptic weights. The forward passes of the two networks are totally independent. By assuming IF neuron with rate-cod… ▽ More We propose a new learning algorithm to train spiking neural networks (SNN) using conventional artificial neural networks (ANN) as proxy. We couple two SNN and ANN networks, respectively, made of integrate-and-fire (IF) and ReLU neurons with the same network architectures and shared synaptic weights. The forward passes of the two networks are totally independent. By assuming IF neuron with rate-coding as an approximation of ReLU, we backpropagate the error of the SNN in the proxy ANN to update the shared weights, simply by replacing the ANN final output with that of the SNN. We applied the proposed proxy learning to deep convolutional SNNs and evaluated it on two benchmarked datasets of Fashion-MNIST and Cifar10 with 94.56% and 93.11% classification accuracy, respectively. The proposed networks could outperform other deep SNNs trained with tandem learning, surrogate gradient learning, or converted from deep ANNs. Converted SNNs require long simulation times to reach reasonable accuracies while our proxy learning leads to efficient SNNs with much smaller simulation times. The source codes of the proposed method are publicly available at https://github.com/SRKH/ProxyLearning. △ Less

Submitted 30 July, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

Journal ref: IEEE Access 10 (2022)

arXiv:2108.13621 [pdf, other]

Spike time displacement based error backpropagation in convolutional spiking neural networks

Authors: Maryam Mirsadeghi, Majid Shalchian, Saeed Reza Kheradpisheh, Timothée Masquelier

Abstract: We recently proposed the STiDi-BP algorithm, which avoids backward recursive gradient computation, for training multi-layer spiking neural networks (SNNs) with single-spike-based temporal coding. The algorithm employs a linear approximation to compute the derivative of the spike latency with respect to the membrane potential and it uses spiking neurons with piecewise linear postsynaptic potential… ▽ More We recently proposed the STiDi-BP algorithm, which avoids backward recursive gradient computation, for training multi-layer spiking neural networks (SNNs) with single-spike-based temporal coding. The algorithm employs a linear approximation to compute the derivative of the spike latency with respect to the membrane potential and it uses spiking neurons with piecewise linear postsynaptic potential to reduce the computational cost and the complexity of neural processing. In this paper, we extend the STiDi-BP algorithm to employ it in deeper and convolutional architectures. The evaluation results on the image classification task based on two popular benchmarks, MNIST and Fashion-MNIST datasets with the accuracies of respectively 99.2% and 92.8%, confirm that this algorithm has been applicable in deep SNNs. Another issue we consider is the reduction of memory storage and computational cost. To do so, we consider a convolutional SNN (CSNN) with two sets of weights: real-valued weights that are updated in the backward pass and their signs, binary weights, that are employed in the feedforward process. We evaluate the binary CSNN on two datasets of MNIST and Fashion-MNIST and obtain acceptable performance with a negligible accuracy drop with respect to real-valued weights (about $0.6%$ and $0.8%$ drops, respectively). △ Less

Submitted 31 August, 2021; originally announced August 2021.

arXiv:2103.00833 [pdf, ps, other]

Fast threshold optimization for multi-label audio tagging using Surrogate gradient learning

Authors: Thomas Pellegrini, Timothée Masquelier

Abstract: Multi-label audio tagging consists of assigning sets of tags to audio recordings. At inference time, thresholds are applied on the confidence scores outputted by a probabilistic classifier, in order to decide which classes are detected active. In this work, we consider having at disposal a trained classifier and we seek to automatically optimize the decision thresholds according to a performance m… ▽ More Multi-label audio tagging consists of assigning sets of tags to audio recordings. At inference time, thresholds are applied on the confidence scores outputted by a probabilistic classifier, in order to decide which classes are detected active. In this work, we consider having at disposal a trained classifier and we seek to automatically optimize the decision thresholds according to a performance metric of interest, in our case F-measure (micro-F1). We propose a new method, called SGL-Thresh for Surrogate Gradient Learning of Thresholds, that makes use of gradient descent. Since F1 is not differentiable, we propose to approximate the thresholding operation gradients with the gradients of a sigmoid function. We report experiments on three datasets, using state-of-the-art pre-trained deep neural networks. In all cases, SGL-Thresh outperformed three other approaches: a default threshold value (defThresh), an heuristic search algorithm and a method estimating F1 gradients numerically. It reached 54.9\% F1 on AudioSet eval, compared to 50.7% with defThresh. SGL-Thresh is very fast and scalable to a large number of tags. To facilitate reproducibility, data and source code in Pytorch are available online: https://github.com/topel/SGL-Thresh △ Less

Submitted 1 March, 2021; originally announced March 2021.

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing, Jun 2021, Toronto, Canada

arXiv:2102.04159 [pdf, other]

Deep Residual Learning in Spiking Neural Networks

Authors: Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian

Abstract: Deep Spiking Neural Networks (SNNs) present optimization difficulties for gradient-based approaches due to discrete binary activation and complex spatial-temporal dynamics. Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. Previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation laye… ▽ More Deep Spiking Neural Networks (SNNs) present optimization difficulties for gradient-based approaches due to discrete binary activation and complex spatial-temporal dynamics. Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. Previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers the degradation problem and can hardly implement residual learning. In this paper, we propose the spike-element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet. We evaluate our SEW ResNet on ImageNet, DVS Gesture, and CIFAR10-DVS datasets, and show that SEW ResNet outperforms the state-of-the-art directly trained SNNs in both accuracy and time-steps. Moreover, SEW ResNet can achieve higher performance by simply adding more layers, providing a simple method to train deep SNNs. To our best knowledge, this is the first time that directly training deep SNNs with more than 100 layers becomes possible. Our codes are available at https://github.com/fangwei123456/Spike-Element-Wise-ResNet. △ Less

Submitted 21 January, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

Comments: Accepted by Advances in Neural Information Processing Systems (NeurIPS) 2021

arXiv:2101.09818 [pdf, other]

doi 10.1016/j.neucom.2022.06.055

Encrypted Internet traffic classification using a supervised Spiking Neural Network

Authors: Ali Rasteh, Florian Delpech, Carlos Aguilar-Melchor, Romain Zimmer, Saeed Bagheri Shouraki, Timothée Masquelier

Abstract: Internet traffic recognition is an essential tool for access providers since recognizing traffic categories related to different data packets transmitted on a network help them define adapted priorities. That means, for instance, high priority requirements for an audio conference and low ones for a file transfer, to enhance user experience. As internet traffic becomes increasingly encrypted, the m… ▽ More Internet traffic recognition is an essential tool for access providers since recognizing traffic categories related to different data packets transmitted on a network help them define adapted priorities. That means, for instance, high priority requirements for an audio conference and low ones for a file transfer, to enhance user experience. As internet traffic becomes increasingly encrypted, the mainstream classic traffic recognition technique, payload inspection, is rendered ineffective. This paper uses machine learning techniques for encrypted traffic classification, looking only at packet size and time of arrival. Spiking neural networks (SNN), largely inspired by how biological neurons operate, were used for two reasons. Firstly, they are able to recognize time-related data packet features. Secondly, they can be implemented efficiently on neuromorphic hardware with a low energy footprint. Here we used a very simple feedforward SNN, with only one fully-connected hidden layer, and trained in a supervised manner using the newly introduced method known as Surrogate Gradient Learning. Surprisingly, such a simple SNN reached an accuracy of 95.9% on ISCX datasets, outperforming previous approaches. Besides better accuracy, there is also a very significant improvement on simplicity: input size, number of neurons, trainable parameters are all reduced by one to four orders of magnitude. Next, we analyzed the reasons for this good accuracy. It turns out that, beyond spatial (i.e. packet size) features, the SNN also exploits temporal ones, mostly the nearly synchronous (within a 200ms range) arrival times of packets with certain sizes. Taken together, these results show that SNNs are an excellent fit for encrypted internet traffic classification: they can be more accurate than conventional artificial neural networks (ANN), and they could be implemented efficiently on low power embedded systems. △ Less

Submitted 21 July, 2022; v1 submitted 24 January, 2021; originally announced January 2021.

Comments: 22 pages, 8 figures. Neurocomputing (2022)

Journal ref: Neurocomputing (2022)

arXiv:2011.06846 [pdf, other]

Low-activity supervised convolutional spiking neural networks applied to speech commands recognition

Authors: Thomas Pellegrini, Romain Zimmer, Timothée Masquelier

Abstract: Deep Neural Networks (DNNs) are the current state-of-the-art models in many speech related tasks. There is a growing interest, though, for more biologically realistic, hardware friendly and energy efficient models, named Spiking Neural Networks (SNNs). Recently, it has been shown that SNNs can be trained efficiently, in a supervised manner, using backpropagation with a surrogate gradient trick. In… ▽ More Deep Neural Networks (DNNs) are the current state-of-the-art models in many speech related tasks. There is a growing interest, though, for more biologically realistic, hardware friendly and energy efficient models, named Spiking Neural Networks (SNNs). Recently, it has been shown that SNNs can be trained efficiently, in a supervised manner, using backpropagation with a surrogate gradient trick. In this work, we report speech command (SC) recognition experiments using supervised SNNs. We explored the Leaky-Integrate-Fire (LIF) neuron model for this task, and show that a model comprised of stacked dilated convolution spiking layers can reach an error rate very close to standard DNNs on the Google SC v1 dataset: 5.5%, while keeping a very sparse spiking activity, below 5%, thank to a new regularization term. We also show that modeling the leakage of the neuron membrane potential is useful, since the LIF model outperformed its non-leaky model counterpart significantly. △ Less

Submitted 13 November, 2020; originally announced November 2020.

Comments: Accepted to IEEE Spoken Language Technology Workshop 2021

arXiv:2007.05785 [pdf, other]

Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks

Authors: Wei Fang, Zhaofei Yu, Yanqi Chen, Timothee Masquelier, Tiejun Huang, Yonghong Tian

Abstract: Spiking Neural Networks (SNNs) have attracted enormous research interest due to temporal information processing capability, low power consumption, and high biological plausibility. However, the formulation of efficient and high-performance learning algorithms for SNNs is still challenging. Most existing learning methods learn weights only, and require manual tuning of the membrane-related paramete… ▽ More Spiking Neural Networks (SNNs) have attracted enormous research interest due to temporal information processing capability, low power consumption, and high biological plausibility. However, the formulation of efficient and high-performance learning algorithms for SNNs is still challenging. Most existing learning methods learn weights only, and require manual tuning of the membrane-related parameters that determine the dynamics of a single spiking neuron. These parameters are typically chosen to be the same for all neurons, which limits the diversity of neurons and thus the expressiveness of the resulting SNNs. In this paper, we take inspiration from the observation that membrane-related parameters are different across brain regions, and propose a training algorithm that is capable of learning not only the synaptic weights but also the membrane time constants of SNNs. We show that incorporating learnable membrane time constants can make the network less sensitive to initial values and can speed up learning. In addition, we reevaluate the pooling methods in SNNs and find that max-pooling will not lead to significant information loss and have the advantage of low computation cost and binary compatibility. We evaluate the proposed method for image classification tasks on both traditional static MNIST, Fashion-MNIST, CIFAR-10 datasets, and neuromorphic N-MNIST, CIFAR10-DVS, DVS128 Gesture datasets. The experiment results show that the proposed method outperforms the state-of-the-art accuracy on nearly all datasets, using fewer time-steps. Our codes are available at https://github.com/fangwei123456/Parametric-Leaky-Integrate-and-Fire-Spiking-Neuron. △ Less

Submitted 16 August, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

Comments: Accepted by International Conference on Computer Vision (ICCV) 2021

arXiv:2007.04039 [pdf, other]

doi 10.1007/s11063-021-10680-x

BS4NN: Binarized Spiking Neural Networks with Temporal Coding and Learning

Authors: Saeed Reza Kheradpisheh, Maryam Mirsadeghi, Timothée Masquelier

Abstract: We recently proposed the S4NN algorithm, essentially an adaptation of backpropagation to multilayer spiking neural networks that use simple non-leaky integrate-and-fire neurons and a form of temporal coding known as time-to-first-spike coding. With this coding scheme, neurons fire at most once per stimulus, but the firing order carries information. Here, we introduce BS4NN, a modification of S4NN… ▽ More We recently proposed the S4NN algorithm, essentially an adaptation of backpropagation to multilayer spiking neural networks that use simple non-leaky integrate-and-fire neurons and a form of temporal coding known as time-to-first-spike coding. With this coding scheme, neurons fire at most once per stimulus, but the firing order carries information. Here, we introduce BS4NN, a modification of S4NN in which the synaptic weights are constrained to be binary (+1 or -1), in order to decrease memory (ideally, one bit per synapse) and computation footprints. This was done using two sets of weights: firstly, real-valued weights, updated by gradient descent, and used in the backward pass of backpropagation, and secondly, their signs, used in the forward pass. Similar strategies have been used to train (non-spiking) binarized neural networks. The main difference is that BS4NN operates in the time domain: spikes are propagated sequentially, and different neurons may reach their threshold at different times, which increases computational power. We validated BS4NN on two popular benchmarks, MNIST and Fashion-MNIST, and obtained reasonable accuracies for this sort of network (97.0% and 87.3% respectively) with a negligible accuracy drop with respect to real-valued weights (0.4% and 0.7%, respectively). We also demonstrated that BS4NN outperforms a simple BNN with the same architectures on those two datasets (by 0.2% and 0.9% respectively), presumably because it leverages the temporal dimension. The source codes of the proposed BS4NN are publicly available at https://github.com/SRKH/BS4NN. △ Less

Submitted 6 December, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

arXiv:1911.10124 [pdf, other]

Technical report: supervised training of convolutional spiking neural networks with PyTorch

Authors: Romain Zimmer, Thomas Pellegrini, Srisht Fateh Singh, Timothée Masquelier

Abstract: Recently, it has been shown that spiking neural networks (SNNs) can be trained efficiently, in a supervised manner, using backpropagation through time. Indeed, the most commonly used spiking neuron model, the leaky integrate-and-fire neuron, obeys a differential equation which can be approximated using discrete time steps, leading to a recurrent relation for the potential. The firing threshold cau… ▽ More Recently, it has been shown that spiking neural networks (SNNs) can be trained efficiently, in a supervised manner, using backpropagation through time. Indeed, the most commonly used spiking neuron model, the leaky integrate-and-fire neuron, obeys a differential equation which can be approximated using discrete time steps, leading to a recurrent relation for the potential. The firing threshold causes optimization issues, but they can be overcome using a surrogate gradient. Here, we extend previous approaches in two ways. Firstly, we show that the approach can be used to train convolutional layers. Convolutions can be done in space, time (which simulates conduction delays), or both. Secondly, we include fast horizontal connections à la Denève: when a neuron N fires, we subtract to the potentials of all the neurons with the same receptive the dot product between their weight vectors and the one of neuron N. As Denève et al. showed, this is useful to represent a dynamic multidimensional analog signal in a population of spiking neurons. Here we demonstrate that, in addition, such connections also allow implementing a multidimensional send-on-delta coding scheme. We validate our approach on one speech classification benchmarks: the Google speech command dataset. We managed to reach nearly state-of-the-art accuracy (94%) while maintaining low firing rates (about 5Hz). Our code is based on PyTorch and is available in open source at http://github.com/romainzimmer/s2net △ Less

Submitted 22 November, 2019; originally announced November 2019.

arXiv:1910.09495 [pdf, other]

doi 10.1142/S0129065720500276

S4NN: temporal backpropagation for spiking neural networks with one spike per neuron

Authors: Saeed Reza Kheradpisheh, Timothée Masquelier

Abstract: We propose a new supervised learning rule for multilayer spiking neural networks (SNNs) that use a form of temporal coding known as rank-order-coding. With this coding scheme, all neurons fire exactly one spike per stimulus, but the firing order carries information. In particular, in the readout layer, the first neuron to fire determines the class of the stimulus. We derive a new learning rule for… ▽ More We propose a new supervised learning rule for multilayer spiking neural networks (SNNs) that use a form of temporal coding known as rank-order-coding. With this coding scheme, all neurons fire exactly one spike per stimulus, but the firing order carries information. In particular, in the readout layer, the first neuron to fire determines the class of the stimulus. We derive a new learning rule for this sort of network, named S4NN, akin to traditional error backpropagation, yet based on latencies. We show how approximated error gradients can be computed backward in a feedforward network with any number of layers. This approach reaches state-of-the-art performance with supervised multi fully-connected layer SNNs: test accuracy of 97.4% for the MNIST dataset, and 99.2% for the Caltech Face/Motorbike dataset. Yet, the neuron model that we use, non-leaky integrate-and-fire, is much simpler than the one used in all previous works. The source codes of the proposed S4NN are publicly available at https://github.com/SRKH/S4NN. △ Less

Submitted 13 June, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

Journal ref: International Journal of Neural Systems 2020

arXiv:1903.02440 [pdf, other]

doi 10.3389/fnins.2019.00625

SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks with at most one Spike per Neuron

Authors: Milad Mozafari, Mohammad Ganjtabesh, Abbas Nowzari-Dalini, Timothée Masquelier

Abstract: Application of deep convolutional spiking neural networks (SNNs) to artificial intelligence (AI) tasks has recently gained a lot of interest since SNNs are hardware-friendly and energy-efficient. Unlike the non-spiking counterparts, most of the existing SNN simulation frameworks are not practically efficient enough for large-scale AI tasks. In this paper, we introduce SpykeTorch, an open-source hi… ▽ More Application of deep convolutional spiking neural networks (SNNs) to artificial intelligence (AI) tasks has recently gained a lot of interest since SNNs are hardware-friendly and energy-efficient. Unlike the non-spiking counterparts, most of the existing SNN simulation frameworks are not practically efficient enough for large-scale AI tasks. In this paper, we introduce SpykeTorch, an open-source high-speed simulation framework based on PyTorch. This framework simulates convolutional SNNs with at most one spike per neuron and the rank-order encoding scheme. In terms of learning rules, both spike-timing-dependent plasticity (STDP) and reward-modulated STDP (R-STDP) are implemented, but other rules could be implemented easily. Apart from the aforementioned properties, SpykeTorch is highly generic and capable of reproducing the results of various studies. Computations in the proposed framework are tensor-based and totally done by PyTorch functions, which in turn brings the ability of just-in-time optimization for running on CPUs, GPUs, or Multi-GPU platforms. △ Less

Submitted 16 July, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

arXiv:1804.08150 [pdf, other]

doi 10.1016/j.neunet.2018.12.002

Deep Learning in Spiking Neural Networks

Authors: Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothee Masquelier, Anthony S. Maida

Abstract: In recent years, deep learning has been a revolution in the field of machine learning, for computer vision in particular. In this approach, a deep (multilayer) artificial neural network (ANN) is trained in a supervised manner using backpropagation. Huge amounts of labeled examples are required, but the resulting classification accuracy is truly impressive, sometimes outperforming humans. Neurons i… ▽ More In recent years, deep learning has been a revolution in the field of machine learning, for computer vision in particular. In this approach, a deep (multilayer) artificial neural network (ANN) is trained in a supervised manner using backpropagation. Huge amounts of labeled examples are required, but the resulting classification accuracy is truly impressive, sometimes outperforming humans. Neurons in an ANN are characterized by a single, static, continuous-valued activation. Yet biological neurons use discrete spikes to compute and transmit information, and the spike times, in addition to the spike rates, matter. Spiking neural networks (SNNs) are thus more biologically realistic than ANNs, and arguably the only viable option if one wants to understand how the brain computes. SNNs are also more hardware friendly and energy-efficient than ANNs, and are thus appealing for technology, especially for portable devices. However, training deep SNNs remains a challenge. Spiking neurons' transfer function is usually non-differentiable, which prevents using backpropagation. Here we review recent supervised and unsupervised methods to train deep SNNs, and compare them in terms of accuracy, but also computational cost and hardware friendliness. The emerging picture is that SNNs still lag behind ANNs in terms of accuracy, but the gap is decreasing, and can even vanish on some tasks, while the SNNs typically require much fewer operations. △ Less

Submitted 20 January, 2019; v1 submitted 22 April, 2018; originally announced April 2018.

Journal ref: Neural Networks (2018)

arXiv:1804.00227 [pdf, other]

doi 10.1016/j.patcog.2019.05.015

Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks

Authors: Milad Mozafari, Mohammad Ganjtabesh, Abbas Nowzari-Dalini, Simon J. Thorpe, Timothée Masquelier

Abstract: The primate visual system has inspired the development of deep artificial neural networks, which have revolutionized the computer vision domain. Yet these networks are much less energy-efficient than their biological counterparts, and they are typically trained with backpropagation, which is extremely data-hungry. To address these limitations, we used a deep convolutional spiking neural network (D… ▽ More The primate visual system has inspired the development of deep artificial neural networks, which have revolutionized the computer vision domain. Yet these networks are much less energy-efficient than their biological counterparts, and they are typically trained with backpropagation, which is extremely data-hungry. To address these limitations, we used a deep convolutional spiking neural network (DCSNN) and a latency-coding scheme. We trained it using a combination of spike-timing-dependent plasticity (STDP) for the lower layers and reward-modulated STDP (R-STDP) for the higher ones. In short, with R-STDP a correct (resp. incorrect) decision leads to STDP (resp. anti-STDP). This approach led to an accuracy of $97.2\%$ on MNIST, without requiring an external classifier. In addition, we demonstrated that R-STDP extracts features that are diagnostic for the task at hand, and discards the other ones, whereas STDP extracts any feature that repeats. Finally, our approach is biologically plausible, hardware friendly, and energy-efficient. △ Less

Submitted 27 May, 2019; v1 submitted 31 March, 2018; originally announced April 2018.

Comments: Pattern Recognition (2019)

arXiv:1803.00447 [pdf, other]

doi 10.3389/fncom.2018.00074

Optimal localist and distributed coding of spatiotemporal spike patterns through STDP and coincidence detection

Authors: Timothée Masquelier, Saeed Reza Kheradpisheh

Abstract: Repeating spatiotemporal spike patterns exist and carry information. Here we investigated how a single spiking neuron can optimally respond to one given pattern (localist coding), or to either one of several patterns (distributed coding, i.e. the neuron's response is ambiguous but the identity of the pattern could be inferred from the response of multiple neurons), but not to random inputs. To do… ▽ More Repeating spatiotemporal spike patterns exist and carry information. Here we investigated how a single spiking neuron can optimally respond to one given pattern (localist coding), or to either one of several patterns (distributed coding, i.e. the neuron's response is ambiguous but the identity of the pattern could be inferred from the response of multiple neurons), but not to random inputs. To do so, we extended a theory developed in a previous paper [Masquelier, 2017], which was limited to localist coding. More specifically, we computed analytically the signal-to-noise ratio (SNR) of a multi-pattern-detector neuron, using a threshold-free leaky integrate-and-fire (LIF) neuron model with non-plastic unitary synapses and homogeneous Poisson inputs. Surprisingly, when increasing the number of patterns, the SNR decreases slowly, and remains acceptable for several tens of independent patterns. In addition, we investigated whether spike-timing-dependent plasticity (STDP) could enable a neuron to reach the theoretical optimal SNR. To this aim, we simulated a LIF equipped with STDP, and repeatedly exposed it to multiple input spike patterns, embedded in equally dense Poisson spike trains. The LIF progressively became selective to every repeating pattern with no supervision, and stopped discharging during the Poisson spike trains. Furthermore, using certain STDP parameters, the resulting pattern detectors were optimal. Tens of independent patterns could be learned by a single neuron using a low adaptive threshold, in contrast with previous studies, in which higher thresholds led to localist coding only. Taken together these results suggest that coincidence detection and STDP are powerful mechanisms, fully compatible with distributed coding. Yet we acknowledge that our theory is limited to single neurons, and thus also applies to feed-forward networks, but not to recurrent ones. △ Less

Submitted 21 September, 2018; v1 submitted 1 March, 2018; originally announced March 2018.

Comments: 16 pages, 8 figures

Journal ref: Front. Comput. Neurosci. 12:74, 2018

arXiv:1706.06699 [pdf, other]

doi 10.1016/j.neunet.2018.05.018

Representation Learning using Event-based STDP

Authors: Amirhossein Tavanaei, Timothee Masquelier, Anthony Maida

Abstract: Although representation learning methods developed within the framework of traditional neural networks are relatively mature, developing a spiking representation model remains a challenging problem. This paper proposes an event-based method to train a feedforward spiking neural network (SNN) layer for extracting visual features. The method introduces a novel spike-timing-dependent plasticity (STDP… ▽ More Although representation learning methods developed within the framework of traditional neural networks are relatively mature, developing a spiking representation model remains a challenging problem. This paper proposes an event-based method to train a feedforward spiking neural network (SNN) layer for extracting visual features. The method introduces a novel spike-timing-dependent plasticity (STDP) learning rule and a threshold adjustment rule both derived from a vector quantization-like objective function subject to a sparsity constraint. The STDP rule is obtained by the gradient of a vector quantization criterion that is converted to spike-based, spatio-temporally local update rules in a spiking network of leaky, integrate-and-fire (LIF) neurons. Independence and sparsity of the model are achieved by the threshold adjustment rule and by a softmax function implementing inhibition in the representation layer consisting of WTA-thresholded spiking neurons. Together, these mechanisms implement a form of spike-based, competitive learning. Two sets of experiments are performed on the MNIST and natural image datasets. The results demonstrate a sparse spiking visual representation model with low reconstruction loss comparable with state-of-the-art visual coding approaches, yet our rule is local in both time and space, thus biologically plausible and hardware friendly. △ Less

Submitted 9 March, 2018; v1 submitted 20 June, 2017; originally announced June 2017.

Journal ref: Neural Networks, vol. 105, pp. 294-303, 2018

arXiv:1705.09132 [pdf, other]

doi 10.1109/TNNLS.2018.2826721

First-spike based visual categorization using reward-modulated STDP

Authors: Milad Mozafari, Saeed Reza Kheradpisheh, Timothée Masquelier, Abbas Nowzari-Dalini, Mohammad Ganjtabesh

Abstract: Reinforcement learning (RL) has recently regained popularity, with major achievements such as beating the European game of Go champion. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme whe… ▽ More Reinforcement learning (RL) has recently regained popularity, with major achievements such as beating the European game of Go champion. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later, or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. If this assumption was correct, the neuron was rewarded, i.e. spike-timing-dependent plasticity (STDP) was applied, which reinforced the neuron's selectivity. Otherwise, anti-STDP was applied, which encouraged the neuron to learn something else. As demonstrated on various image datasets (Caltech, ETH-80, and NORB), this reward modulated STDP (R-STDP) approach extracted particularly discriminative visual features, whereas classic unsupervised STDP extracts any feature that consistently repeats. As a result, R-STDP outperformed STDP on these datasets. Furthermore, R-STDP is suitable for online learning, and can adapt to drastic changes such as label permutations. Finally, it is worth mentioning that both feature extraction and classification were done with spikes, using at most one spike per neuron. Thus the network is hardware friendly and energy efficient. △ Less

Submitted 10 July, 2018; v1 submitted 25 May, 2017; originally announced May 2017.

Comments: supplementary materials are added, Caltech face/motorbike demonstration figure is updated, some parts of the main manuscript are moved to the supplementary materials, additional network analysis and performance comparison with deep nets are added

Journal ref: Mozafari, Milad, et al. "First-Spike-Based Visual Categorization Using Reward-Modulated STDP". IEEE Transactions on Neural Networks and Learning Systems (2018). DOI: https://doi.org/10.1109/TNNLS.2018.2826721

arXiv:1703.09990 [pdf, other]

doi 10.3389/fpsyg.2017.01261

Object categorization in finer levels requires higher spatial frequencies, and therefore takes longer

Authors: Matin N. Ashtiani, Saeed Reza Kheradpisheh, Timothée Masquelier, Mohammad Ganjtabesh

Abstract: The human visual system contains a hierarchical sequence of modules that take part in visual perception at different levels of abstraction, i.e., superordinate, basic, and subordinate levels. One important question is to identify the "entry" level at which the visual representation is commenced in the process of object recognition. For a long time, it was believed that the basic level had advantag… ▽ More The human visual system contains a hierarchical sequence of modules that take part in visual perception at different levels of abstraction, i.e., superordinate, basic, and subordinate levels. One important question is to identify the "entry" level at which the visual representation is commenced in the process of object recognition. For a long time, it was believed that the basic level had advantage over two others; a claim that has been challenged recently. Here we used a series of psychophysics experiments, based on a rapid presentation paradigm, as well as two computational models, with bandpass filtered images to study the processing order of the categorization levels. In these experiments, we investigated the type of visual information required for categorizing objects in each level by varying the spatial frequency bands of the input image. The results of our psychophysics experiments and computational models are consistent. They indicate that the different spatial frequency information had different effects on object categorization in each level. In the absence of high frequency information, subordinate and basic level categorization are performed inaccurately, while superordinate level is performed well. This means that, low frequency information is sufficient for superordinate level, but not for the basic and subordinate levels. These finer levels require high frequency information, which appears to take longer to be processed, leading to longer reaction times. Finally, to avoid the ceiling effect, we evaluated the robustness of the results by adding different amounts of noise to the input images and repeating the experiments. As expected, the categorization accuracy decreased and the reaction time increased significantly, but the trends were the same.This shows that our results are not due to a ceiling effect. △ Less

Submitted 29 March, 2017; originally announced March 2017.

Journal ref: Frontiers in Psychology 2017

arXiv:1611.01421 [pdf, other]

doi 10.1016/j.neunet.2017.12.005

STDP-based spiking deep convolutional neural networks for object recognition

Authors: Saeed Reza Kheradpisheh, Mohammad Ganjtabesh, Simon J Thorpe, Timothée Masquelier

Abstract: Previous studies have shown that spike-timing-dependent plasticity (STDP) can be used in spiking neural networks (SNN) to extract visual features of low or intermediate complexity in an unsupervised manner. These studies, however, used relatively shallow architectures, and only one layer was trainable. Another line of research has demonstrated - using rate-based neural networks trained with back-p… ▽ More Previous studies have shown that spike-timing-dependent plasticity (STDP) can be used in spiking neural networks (SNN) to extract visual features of low or intermediate complexity in an unsupervised manner. These studies, however, used relatively shallow architectures, and only one layer was trainable. Another line of research has demonstrated - using rate-based neural networks trained with back-propagation - that having many layers increases the recognition robustness, an approach known as deep learning. We thus designed a deep SNN, comprising several convolutional (trainable with STDP) and pooling layers. We used a temporal coding scheme where the most strongly activated neurons fire first, and less activated neurons fire later or not at all. The network was exposed to natural images. Thanks to STDP, neurons progressively learned features corresponding to prototypical patterns that were both salient and frequent. Only a few tens of examples per category were required and no label was needed. After learning, the complexity of the extracted features increased along the hierarchy, from edge detectors in the first layer to object prototypes in the last layer. Coding was very sparse, with only a few thousands spikes per image, and in some cases the object category could be reasonably well inferred from the activity of a single higher-order neuron. More generally, the activity of a few hundreds of such neurons contained robust category information, as demonstrated using a classifier on Caltech 101, ETH-80, and MNIST databases. We also demonstrate the superiority of STDP over other unsupervised techniques such as random crops (HMAX) or auto-encoders. Taken together, our results suggest that the combination of STDP with latency coding may be a key to understanding the way that the primate visual system learns, its remarkable processing speed and its low energy consumption. △ Less

Submitted 25 December, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

Journal ref: Neural Networks 2018

arXiv:1610.07355 [pdf, other]

doi 10.1016/j.neuroscience.2017.06.032

STDP allows close-to-optimal spatiotemporal spike pattern detection by single coincidence detector neurons

Authors: Timothée Masquelier

Abstract: By recording multiple cells simultaneously, electrophysiologists have found evidence for repeating spatiotemporal spike patterns, which can carry information. How this information is extracted by downstream neurons is unclear. In this theoretical paper, we investigate to what extent a single cell could detect a given spike pattern and what the optimal parameters to do so are, in particular the mem… ▽ More By recording multiple cells simultaneously, electrophysiologists have found evidence for repeating spatiotemporal spike patterns, which can carry information. How this information is extracted by downstream neurons is unclear. In this theoretical paper, we investigate to what extent a single cell could detect a given spike pattern and what the optimal parameters to do so are, in particular the membrane time constant $τ$. Using a leaky integrate-and-fire (LIF) neuron with instantaneous synapses and homogeneous Poisson input, we were able to compute this optimum analytically. Our results indicate that a relatively small $τ$ (at most a few tens of ms) is usually optimal, even when the pattern is much longer. This is somewhat counter intuitive as the resulting detector ignores most of the pattern, due to its fast memory decay. Next, we wondered if spike-timing-dependent plasticity (STDP) could enable a neuron to reach the theoretical optimum. We simulated a LIF neuron equipped with additive spike-timing-dependent potentiation and homeostatic rate-based depression, and repeatedly exposed it to a given input spike pattern. As in previous studies, the LIF progressively became selective to the repeating pattern with no supervision, even when the pattern was embedded in Poisson activity. Here we show that, using certain STDP parameters, the resulting pattern detector can be optimal. Taken together, these results may explain how humans can learn repeating visual or auditory sequences. Long sequences could be recognized thanks to coincidence detectors working at a much shorter timescale. This is consistent with the fact that recognition is still possible if a sound sequence is compressed, played backward, or scrambled using 10ms bins. Coincidence detection is a simple yet powerful mechanism, which could be the main function of neurons in the brain. △ Less

Submitted 9 January, 2017; v1 submitted 24 October, 2016; originally announced October 2016.

Comments: 12 pages, 6 figures, 1 table

Journal ref: Neuroscience 2017

arXiv:1606.01102 [pdf, other]

doi 10.1109/IJCNN.2016.7727213

Acquisition of Visual Features Through Probabilistic Spike-Timing-Dependent Plasticity

Authors: Amirhossein Tavanaei, Timothee Masquelier, Anthony S Maida

Abstract: The final version of this paper has been published in IEEEXplore available at http://ieeexplore.ieee.org/document/7727213. Please cite this paper as: Amirhossein Tavanaei, Timothee Masquelier, and Anthony Maida, Acquisition of visual features through probabilistic spike-timing-dependent plasticity. IEEE International Joint Conference on Neural Networks. pp. 307-314, IJCNN 2016. This paper explor… ▽ More The final version of this paper has been published in IEEEXplore available at http://ieeexplore.ieee.org/document/7727213. Please cite this paper as: Amirhossein Tavanaei, Timothee Masquelier, and Anthony Maida, Acquisition of visual features through probabilistic spike-timing-dependent plasticity. IEEE International Joint Conference on Neural Networks. pp. 307-314, IJCNN 2016. This paper explores modifications to a feedforward five-layer spiking convolutional network (SCN) of the ventral visual stream [Masquelier, T., Thorpe, S., Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Computational Biology, 3(2), 247-257]. The original model showed that a spike-timing-dependent plasticity (STDP) learning algorithm embedded in an appropriately selected SCN could perform unsupervised feature discovery. The discovered features where interpretable and could effectively be used to perform rapid binary decisions in a classifier. In order to study the robustness of the previous results, the present research examines the effects of modifying some of the components of the original model. For improved biological realism, we replace the original non-leaky integrate-and-fire neurons with Izhikevich-like neurons. We also replace the original STDP rule with a novel rule that has a probabilistic interpretation. The probabilistic STDP slightly but significantly improves the performance for both types of model neurons. Use of the Izhikevich-like neuron was not found to improve performance although performance was still comparable to the IF neuron. This shows that the model is robust enough to handle more biologically realistic neurons. We also conclude that the underlying reasons for stable performance in the model are preserved despite the overt changes to the explicit components of the model. △ Less

Submitted 8 November, 2016; v1 submitted 3 June, 2016; originally announced June 2016.

Comments: IEEE-IJCNN 2016

Journal ref: 2016 International Joint Conference on Neural Networks

arXiv:1604.06486 [pdf, other]

doi 10.3389/fncom.2016.00092

Humans and deep networks largely agree on which kinds of variation make object recognition harder

Authors: Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, Timothée Masquelier

Abstract: View-invariant object recognition is a challenging problem, which has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which… ▽ More View-invariant object recognition is a challenging problem, which has attracted much attention among the psychology, neuroscience, and computer vision communities. Humans are notoriously good at it, even if some variations are presumably more difficult to handle than others (e.g. 3D rotations). Humans are thought to solve the problem through hierarchical processing along the ventral stream, which progressively extracts more and more invariant visual features. This feed-forward architecture has inspired a new generation of bio-inspired computer vision systems called deep convolutional neural networks (DCNN), which are currently the best algorithms for object recognition in natural images. Here, for the first time, we systematically compared human feed-forward vision and DCNNs at view-invariant object recognition using the same images and controlling for both the kinds of transformation as well as their magnitude. We used four object categories and images were rendered from 3D computer models. In total, 89 human subjects participated in 10 experiments in which they had to discriminate between two or four categories after rapid presentation with backward masking. We also tested two recent DCNNs on the same tasks. We found that humans and DCNNs largely agreed on the relative difficulties of each kind of variation: rotation in depth is by far the hardest transformation to handle, followed by scale, then rotation in plane, and finally position. This suggests that humans recognize objects mainly through 2D template matching, rather than by constructing 3D object models, and that DCNNs are not too unreasonable models of human feed-forward vision. Also, our results show that the variation levels in rotation in depth and scale strongly modulate both humans' and DCNNs' recognition performances. We thus argue that these variations should be controlled in the image datasets used in vision research. △ Less

Submitted 21 April, 2016; originally announced April 2016.

Journal ref: Frontiers in Computational Neuroscience (2016) 10:92

arXiv:1508.03929 [pdf, other]

doi 10.1038/srep32672

Deep Networks Can Resemble Human Feed-forward Vision in Invariant Object Recognition

Authors: Saeed Reza Kheradpisheh, Masoud Ghodrati, Mohammad Ganjtabesh, Timothée Masquelier

Abstract: Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have shown to be able to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields, and a hierarchy of layers which progressively extract more and more abstracted features. Yet it is unknown… ▽ More Deep convolutional neural networks (DCNNs) have attracted much attention recently, and have shown to be able to recognize thousands of object categories in natural image databases. Their architecture is somewhat similar to that of the human visual system: both use restricted receptive fields, and a hierarchy of layers which progressively extract more and more abstracted features. Yet it is unknown whether DCNNs match human performance at the task of view-invariant object recognition, whether they make similar errors and use similar representations for this task, and whether the answers depend on the magnitude of the viewpoint variations. To investigate these issues, we benchmarked eight state-of-the-art DCNNs, the HMAX model, and a baseline shallow model and compared their results to those of humans with backward masking. Unlike in all previous DCNN studies, we carefully controlled the magnitude of the viewpoint variations to demonstrate that shallow nets can outperform deep nets and humans when variations are weak. When facing larger variations, however, more layers were needed to match human performance and error distributions, and to have representations that are consistent with human behavior. A very deep net with 18 layers even outperformed humans at the highest variation level, using the most human-like representations. △ Less

Submitted 28 June, 2016; v1 submitted 17 August, 2015; originally announced August 2015.

Journal ref: Scientific Reports (2016) 6: 32672

arXiv:1504.03871 [pdf, other]

doi 10.1016/j.neucom.2016.04.029

Bio-inspired Unsupervised Learning of Visual Features Leads to Robust Invariant Object Recognition

Authors: Saeed Reza Kheradpisheh, Mohammad Ganjtabesh, Timothée Masquelier

Abstract: Retinal image of surrounding objects varies tremendously due to the changes in position, size, pose, illumination condition, background context, occlusion, noise, and nonrigid deformations. But despite these huge variations, our visual system is able to invariantly recognize any object in just a fraction of a second. To date, various computational models have been proposed to mimic the hierarchica… ▽ More Retinal image of surrounding objects varies tremendously due to the changes in position, size, pose, illumination condition, background context, occlusion, noise, and nonrigid deformations. But despite these huge variations, our visual system is able to invariantly recognize any object in just a fraction of a second. To date, various computational models have been proposed to mimic the hierarchical processing of the ventral visual pathway, with limited success. Here, we show that the association of both biologically inspired network architecture and learning rule significantly improves the models' performance when facing challenging invariant object recognition problems. Our model is an asynchronous feedforward spiking neural network. When the network is presented with natural images, the neurons in the entry layers detect edges, and the most activated ones fire first, while neurons in higher layers are equipped with spike timing-dependent plasticity. These neurons progressively become selective to intermediate complexity visual features appropriate for object categorization. The model is evaluated on 3D-Object and ETH-80 datasets which are two benchmarks for invariant object recognition, and is shown to outperform state-of-the-art models, including DeepConvNet and HMAX. This demonstrates its ability to accurately recognize different instances of multiple object classes even under various appearance conditions (different views, scales, tilts, and backgrounds). Several statistical analysis techniques are used to show that our model extracts class specific and highly informative features. △ Less

Submitted 28 June, 2016; v1 submitted 15 April, 2015; originally announced April 2015.

Journal ref: Neurocomputing 205 (2016) 382-392

Showing 1–34 of 34 results for author: Masquelier, T