Abstract
Spiking neural networks (SNNs) are considered biologically plausible and can achieve high energy efficiency when implemented on neuromorphic hardware, owing to their highly sparse, asynchronous, binary, event-driven nature. Recently, surrogate gradient (SG) approaches have enabled SNNs to be trained from scratch with backpropagation (BP) algorithms under a deep learning framework. However, a popular SG approach known as the straight-through estimator (STE), which simply passes the output gradient through unchanged, does not account for the activation difference between the membrane potentials and the output spikes. To address this issue, we propose surrogate gradient scaling (SGS), which scales the gradient of the membrane potential up or down according to the sign of the gradient of the spiking neuron output and the difference between the membrane potential and the output of the spiking neuron. The SGS approach can also be applied to unimodal surrogate functions that propagate different gradient information from the output spikes to the input membrane potential. In addition, SNNs trained directly from scratch often suffer from poor generalization, so we introduce a Lipschitz regularization (LR) term into the loss function; it not only improves the generalization of SNNs but also makes them more robust to noise. Extensive experimental results on several popular benchmark datasets (CIFAR10, CIFAR100 and CIFAR10-DVS) show that our approach not only outperforms state-of-the-art methods but also achieves lower inference latency. Remarkably, our SNNs yield 34\(\times \), 29\(\times \), and 17\(\times \) computational energy savings compared to standard artificial neural networks (ANNs) on the above three datasets.
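To make the mechanism concrete, the following is a minimal PyTorch sketch of a spike activation whose backward pass rescales the gradient using the sign of the output gradient and the gap between the membrane potential and the emitted spike. The specific scaling rule, the factor delta, and the firing threshold are illustrative assumptions, not the authors' exact formulation.

import torch

class SGSSpike(torch.autograd.Function):
    """Heaviside spike activation with a scaled straight-through gradient (illustrative sketch)."""

    @staticmethod
    def forward(ctx, u, threshold=1.0, delta=0.1):
        spike = (u >= threshold).float()   # binary spike output o
        ctx.save_for_backward(u, spike)
        ctx.delta = delta
        return spike

    @staticmethod
    def backward(ctx, grad_output):
        u, spike = ctx.saved_tensors
        # Scale the gradient passed back to the membrane potential up or down
        # depending on sign(grad_output) and the gap (u - o).
        scale = 1.0 + ctx.delta * torch.sign(grad_output) * (u - spike)
        return grad_output * scale, None, None

# Example usage: o = SGSSpike.apply(membrane_potential, 1.0, 0.1)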
Data Availability
Code for reproducing the ablation experiments is available at: https://github.com/CHNtao/SGS-SNN
References
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 770–778
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 3431–3440
Zhao ZQ, Zheng P, Xu ST et al (2019) Object detection with deep learning: A review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232
Roy K, Jaiswal A, Panda P (2019) Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784):607–617
Hua Y, Wan F, Gan H, et al (2022) Distributed estimation with cross-verification under false data-injection attacks. IEEE Trans Cybern
Tavanaei A, Ghodrati M, Kheradpisheh SR et al (2019) Deep learning in spiking neural networks. Neural Netw, 111:47–63
Rajendran B, Sebastian A, Schmuker M et al (2019) Low-power neuromorphic hardware for signal processing applications: A review of architectural and system-level design approaches. IEEE Signal Process Mag, 36(6):97–110
Maass W (1997) Networks of spiking neurons: the third generation of neural network models. Neural Netw, 10(9):1659–1671
Zambrano D, Nusselder R, Scholte HS et al (2019) Sparse computation in adaptive spiking neural networks. Front Neurosci, 12:987
Panda P, Aketi SA, Roy K (2020) Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Front Neurosci, 14:653
Davies M, Srinivasa N, Lin TH et al (2018) Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99
Pei J, Deng L, Song S et al (2019) Towards artificial general intelligence with hybrid tianjic chip architecture. Nature 572(7767):106–111
Ponulak F, Kasiński A (2010) Supervised learning in spiking neural networks with resume: sequence learning, classification, and spike shifting. Neural Comput, 22(2):467–510
Bohte SM, Kok JN, La Poutré JA (2000) Spikeprop: backpropagation for networks of spiking neurons. In: ESANN, Bruges, pp 419–424
Gütig R, Sompolinsky H (2006) The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci, 9(3):420–428
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Han B, Srinivasan G, Roy K (2020) Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 13558–13567
Sengupta A, Ye Y, Wang R et al (2019) Going deeper in spiking neural networks: Vgg and residual architectures. Front Neurosci, 13:95
Han B, Roy K (2020) Deep spiking neural network: Energy efficiency through time based coding. In: Proc Eur Conf Comput Vis, pp 388–404
Bu T, Ding J, Yu Z, Huang T (2022) Optimized potential initialization for low-latency spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 11–20
Rathi N, Srinivasan G, Panda P, Roy K (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc Int Conf Learn Represent
Han B, Roy K (2020) Deep spiking neural network: Energy efficiency through time based coding. In: Proc Eur Conf Comput Vis, pp 388–404
Bu T, Fang W, Ding J, et al (2021) Optimal ann-snn conversion for high-accuracy and ultra-low-latency spiking neural networks. In: Proc Int Conf Learn Represent
Neftci EO, Mostafa H, Zenke F (2019) Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Process Mag, 36(6):51–63
Lee JH, Delbruck T, Pfeiffer M (2016) Training deep spiking neural networks using backpropagation. Front Neurosci, 10:508
Wu Y, Deng L, Li G et al (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Front Neurosci, 12:331
Wu Y, Deng L, Li G, et al (2019) Direct training for spiking neural networks: Faster, larger, better. In: Proc AAAI Conf Artif Intell, pp 1311–1318
Neftci EO, Augustine C, Paul S et al (2017) Event-driven random back-propagation: Enabling neuromorphic deep learning machines. Front Neurosci, 11:324
Woźniak S, Pantazi A, Bohnstingl T et al (2020) Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nat Mach Intell, 2(6):325–336
Lee C, Sarwar SS, Panda P, et al (2020) Enabling spike-based backpropagation for training deep neural network architectures. Front Neurosci, p 119
Liu Z, Cheng KT, Huang D, Xing EP, Shen Z (2022) Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 4942–4952
Lee J, Kim D, Ham B (2021) Network quantization with element-wise gradient scaling. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 6448–6457
Bellec G, Salaj D, Subramoney A, et al (2018) Long short-term memory and learning-to-learn in networks of spiking neurons. Proc Adv Neural Inf Process Syst, 31
Shrestha SB, Orchard G (2018) Slayer: Spike layer error reassignment in time. Proc Adv Neural Inf Process Syst, 31
Zenke F, Ganguli S (2018) Superspike: Supervised learning in multilayer spiking neural networks. Neural Comput, 30(6):1514–1541
Chen Y, Zhang S, Ren S, et al (2022) Gradual surrogate gradient learning in deep spiking neural networks. In: Proc IEEE Int Conf Acoust Speech Signal Process, pp 8927–8931
Li Y, Guo Y, Zhang S et al (2021) Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Proc Adv Neural Inf Process Syst, 34:23426–23439
Yang Y, Zhang W, Li P (2021) Backpropagated neighborhood aggregation for accurate training of spiking neural networks. In: Proc Int Conf Mach Learn, PMLR, pp 11852–11862
Kim Y, Panda P (2020) Revisiting batch normalization for training low-latency deep spiking neural networks from scratch. Front Neurosci, p 1638
Zheng H, Wu Y, Deng L, et al (2021) Going deeper with directly-trained larger spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 11062–11070
Yan Y, Chu H, Jin Y et al (2022) Backpropagation with sparsity regularization for spiking neural network learning. Front Neurosci, 16:760298
Lin J, Gan C, Han S (2019) Defensive quantization: When efficiency meets robustness. In: Proc Int Conf Learn Represent
Li H, Liu H, Ji X et al (2017) Cifar10-dvs: an event-stream dataset for object classification. Front Neurosci, 11:309
Rathi N, Roy K (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc Int Conf Learn Represent
Rathi N, Roy K (2021) Diet-snn: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Trans Neural Netw Learn Syst, pp 3174–3182
Fang W, Yu Z, Chen Y, et al (2021) Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In: Proc IEEE Int Conf Comput Vis, pp 2661–2671
Sun H, Cai W, Yang B, Cui Y, Xia Y, Yao D, Guo D (2023) A synapse-threshold synergistic learning approach for spiking neural networks. IEEE Trans Cogn Dev Syst
Xiao M, Meng Q, Zhang Z et al (2021) Training feedback spiking neural networks by implicit differentiation on the equilibrium state. Proc Adv Neural Inf Process Syst, 34:14516–14528
Hao Z, Bu T, Ding J, Huang T, Yu Z (2023) Reducing ANN-SNN conversion error through residual membrane potential. In: Proc AAAI Conf Artif Intell
Deng S, Li Y, Zhang S, Gu S (2022) Temporal efficient training of spiking neural network via gradient re-weighting. In: Proc Int Conf Learn Represent
Yao X, Li F, Mo Z, Cheng J (2022) Glif: A unified gated leaky integrate-and-fire neuron for spiking neural networks. Proc Adv Neural Inf Process Syst, 35:32160–32171
Yan Z, Zhou J, Wong WF (2021) Near lossless transfer learning for spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 10577–10584
Wu J, Chua Y, Zhang M, et al (2021) A tandem learning rule for effective training and rapid inference of deep spiking neural networks. IEEE Trans Neural Netw Learn Syst
Wu Z, Zhang H, Lin Y, et al (2021) Liaf-net: Leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing. IEEE Trans Neural Netw Learn Syst
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1306600), the National Natural Science Foundation of China (Grant Nos. 62076207, 62076208, and U20A20227), and the Science and Technology Plan Program of Yubei District of Chongqing (Grant No. 2021-17).
Author information
Contributions
Tao Chen: Conceptualization, Methodology, Software, Writing – original draft. Shu Wang: Visualization, Validation, Revised paper. Yu Gong: Investigation, Validation, Revised paper. Lidan Wang: Supervision, Writing – review & editing, Project administration, Funding acquisition. Shukai Duan: Project administration, Funding acquisition.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Detailed network architecture
The residual block, the key component of ResNet-19, is shown in Fig. 9. Conv is the convolutional layer, BN denotes batch normalization, and LIF denotes the LIF neurons. I[t] is the input spike train and O[t] is the output spike train. The detailed network architecture of ResNet-19 is given in Table 7, where \(C_{in}\) denotes the number of input channels and \(C_{out}\) denotes the number of output channels. Block refers to the residual block shown in Fig. 9. In Block4 and Block7, the stride of the first convolutional layer is 2, while the stride of the second convolutional layer remains 1. A schematic implementation of the residual block is sketched below.
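As a companion to Fig. 9 and Table 7, the following is a schematic PyTorch sketch of such a Conv-BN-LIF residual block processing one time step. The single-step LIF dynamics, the hard-reset rule, and the placement of the shortcut are plausible assumptions rather than the exact published implementation; in training, the hard threshold would be replaced by a surrogate spike function such as the one sketched earlier.

import torch.nn as nn

class LIF(nn.Module):
    """Minimal single-step LIF neuron; tau and v_th are assumed values."""
    def __init__(self, tau=2.0, v_th=1.0):
        super().__init__()
        self.tau, self.v_th = tau, v_th
        self.v = 0.0  # membrane potential; reset between input samples

    def forward(self, x):
        self.v = self.v + (x - self.v) / self.tau      # leaky integration
        spike = (self.v >= self.v_th).float()          # fire (surrogate gradient in training)
        self.v = self.v * (1.0 - spike)                # hard reset
        return spike

class ResidualBlock(nn.Module):
    """Conv-BN-LIF residual block in the spirit of Fig. 9."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.lif1 = LIF()
        self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.lif2 = LIF()
        self.shortcut = nn.Sequential()
        if stride != 1 or c_in != c_out:
            self.shortcut = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False),
                nn.BatchNorm2d(c_out),
            )

    def forward(self, x):                               # x: input spikes I[t]
        out = self.lif1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.lif2(out + self.shortcut(x))        # output spikes O[t]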
1.2 Energy efficiency calculation
Here we follow [10] to estimate the power consumption of DNNs and SNNs. The energy consumption of DNNs is dominated by a large number of multiply-accumulate (MAC) operations, whereas SNNs operate on binary events and their energy consumption is dominated by accumulate (AC) operations. In this paper, we assume that each MAC and AC operation is implemented in register-transfer logic in 45 nm CMOS technology. For 32-bit weight values, a MAC operation and an AC operation on 32-bit integers consume 3.2 pJ and 0.1 pJ, respectively. The number of floating-point operations (FLOPs) of DNNs and SNNs can be obtained according to:
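A standard per-layer operation count, consistent with the notation defined next and with the convention of [10], is the following (the exact expression in the published version may differ in detail):

\[
\mathrm{FLOPs}_{conv} = k^{2} \times O^{2} \times N \times M, \qquad \mathrm{FLOPs}_{fc} = D_{in} \times D_{out}
\]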
For a convolutional layer, N denotes the number of input channels, M the number of output channels, \(I \times I\) the input size, \(k \times k\) the kernel size, and \(O \times O\) the output size. For a fully connected layer, \(D_{in}\) denotes the input size and \(D_{out}\) the output size. \(S_A\) denotes the spike rate of each layer. The inference energy of SNNs and DNNs can then be calculated from:
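A common formulation of these energy estimates, presumably corresponding to the equations referenced below as (19) and (20), is the following (the handling of the first direct-encoding layer, which typically still requires MAC operations, is omitted here):

\[
E_{DNN} = E_{MAC} \times \sum_{l} \mathrm{FLOPs}_{l}, \qquad
E_{SNN} = E_{AC} \times T \times \sum_{l} S_{A}^{l} \times \mathrm{FLOPs}_{l}
\]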
where \(E_{MAC}\) is 3.2 pJ, \(E_{AC}\) is 0.1 pJ, and T is the number of time steps. The inference energy aggregates the layer-wise FLOPs over all layers according to (19) and (20).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, T., Wang, S., Gong, Y. et al. Surrogate gradient scaling for directly training spiking neural networks. Appl Intell 53, 27966–27981 (2023). https://doi.org/10.1007/s10489-023-04966-x