Abstract
Spiking neural networks (SNNs) are considered biologically plausible and can achieve high energy efficiency when implemented on neuromorphic hardware, owing to their highly sparse, asynchronous, binary, event-driven nature. Recently, surrogate gradient (SG) approaches have enabled SNNs to be trained from scratch with backpropagation (BP) algorithms under a deep learning framework. However, a popular SG approach known as the straight-through estimator (STE), which simply passes the output gradient through unchanged, does not account for the activation difference between the membrane potentials and the output spikes. To address this issue, we propose surrogate gradient scaling (SGS), which scales the gradient of the membrane potential up or down according to the sign of the gradient of the spiking neuron output and the difference between the membrane potential and the output of the spiking neuron. The SGS approach can also be applied to unimodal surrogate functions that propagate different gradient information from the output spikes to the input membrane potential. In addition, SNNs trained directly from scratch often suffer from poor generalization, so we introduce a Lipschitz regularization (LR) term into the loss function; it not only improves the generalization of SNNs but also makes them more robust to noise. Extensive experimental results on several popular benchmark datasets (CIFAR10, CIFAR100 and CIFAR10-DVS) show that our approach not only outperforms state-of-the-art methods but also achieves lower inference latency. Remarkably, our SNNs yield 34\(\times \), 29\(\times \), and 17\(\times \) computational energy savings compared to standard artificial neural networks (ANNs) on the above three datasets.
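To make the mechanism concrete, the following is a minimal PyTorch sketch of a spike activation whose backward pass rescales the gradient using the sign of the output gradient and the gap between the membrane potential and the emitted spike. The specific scaling rule, the factor delta, and the firing threshold are illustrative assumptions, not the authors' exact formulation.

import torch

class SGSSpike(torch.autograd.Function):
    """Heaviside spike activation with a scaled straight-through gradient (illustrative sketch)."""

    @staticmethod
    def forward(ctx, u, threshold=1.0, delta=0.1):
        spike = (u >= threshold).float()   # binary spike output o
        ctx.save_for_backward(u, spike)
        ctx.delta = delta
        return spike

    @staticmethod
    def backward(ctx, grad_output):
        u, spike = ctx.saved_tensors
        # Scale the gradient passed back to the membrane potential up or down
        # depending on sign(grad_output) and the gap (u - o).
        scale = 1.0 + ctx.delta * torch.sign(grad_output) * (u - spike)
        return grad_output * scale, None, None

# Example usage: o = SGSSpike.apply(membrane_potential, 1.0, 0.1)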
Data Availability
Code for reproducing the ablation experiments is available at: https://github.com/CHNtao/SGS-SNN
References
He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 770–778
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 3431–3440
Zhao ZQ, Zheng P, Xu ST et al (2019) Object detection with deep learning: A review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232
Roy K, Jaiswal A, Panda P (2019) Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784):607–617
Hua Y, Wan F, Gan H, et al (2022) Distributed estimation with cross-verification under false data-injection attacks. IEEE Trans Cybern
Tavanaei A, Ghodrati M, Kheradpisheh SR et al (2019) Deep learning in spiking neural networks. Neural Netw, 111:47–63
Rajendran B, Sebastian A, Schmuker M et al (2019) Low-power neuromorphic hardware for signal processing applications: A review of architectural and system-level design approaches. IEEE Signal Process Mag, 36(6):97–110
Maass W (1997) Networks of spiking neurons: the third generation of neural network models. Neural Netw, 10(9):1659–1671
Zambrano D, Nusselder R, Scholte HS et al (2019) Sparse computation in adaptive spiking neural networks. Front Neurosci, 12:987
Panda P, Aketi SA, Roy K (2020) Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Front Neurosci, 14:653
Davies M, Srinivasa N, Lin TH et al (2018) Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99
Pei J, Deng L, Song S et al (2019) Towards artificial general intelligence with hybrid tianjic chip architecture. Nature 572(7767):106–111
Ponulak F, Kasiński A (2010) Supervised learning in spiking neural networks with resume: sequence learning, classification, and spike shifting. Neural Comput, 22(2):467–510
Bohte SM, Kok JN, La Poutré JA (2000) Spikeprop: backpropagation for networks of spiking neurons. In: ESANN, Bruges, pp 419–424
Gütig R, Sompolinsky H (2006) The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci, 9(3):420–428
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Han B, Srinivasan G, Roy K (2020) Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 13558–13567
Sengupta A, Ye Y, Wang R et al (2019) Going deeper in spiking neural networks: Vgg and residual architectures. Front Neurosci, 13:95
Han B, Roy K (2020) Deep spiking neural network: Energy efficiency through time based coding. In: Proc Eur Conf Comput Vis, pp 388–404
Bu T, Ding J, Yu Z, Huang T (2022) Optimized potential initialization for low-latency spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 11–20
Rathi N, Srinivasan G, Panda P, Roy K (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc Int Conf Learn Represent
Han B, Roy K (2020) Deep spiking neural network: Energy efficiency through time based coding. In: Proc Eur Conf Comput Vis, pp 388–404
Bu T, Fang W, Ding J, et al (2021) Optimal ann-snn conversion for high-accuracy and ultra-low-latency spiking neural networks. In: Proc Int Conf Learn Represent
Neftci EO, Mostafa H, Zenke F (2019) Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Process Mag, 36(6):51–63
Lee JH, Delbruck T, Pfeiffer M (2016) Training deep spiking neural networks using backpropagation. Front Neurosci, 10:508
Wu Y, Deng L, Li G et al (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Front Neurosci, 12:331
Wu Y, Deng L, Li G, et al (2019) Direct training for spiking neural networks: Faster, larger, better. In: Proc AAAI Conf Artif Intell, pp 1311–1318
Neftci EO, Augustine C, Paul S et al (2017) Event-driven random back-propagation: Enabling neuromorphic deep learning machines. Front Neurosci, 11:324
Woźniak S, Pantazi A, Bohnstingl T et al (2020) Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nat Mach Intell, 2(6):325–336
Lee C, Sarwar SS, Panda P, et al (2020) Enabling spike-based backpropagation for training deep neural network architectures. Front Neurosci, p 119
Liu Z, Cheng KT, Huang D, Xing EP, Shen Z (2022) Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 4942–4952
Lee J, Kim D, Ham B (2021) Network quantization with element-wise gradient scaling. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 6448–6457
Bellec G, Salaj D, Subramoney A, et al (2018) Long short-term memory and learning-to-learn in networks of spiking neurons. Proc Adv Neural Inf Process Syst, 31
Shrestha SB, Orchard G (2018) Slayer: Spike layer error reassignment in time. Proc Adv Neural Inf Process Syst, 31
Zenke F, Ganguli S (2018) Superspike: Supervised learning in multilayer spiking neural networks. Neural Comput, 30(6):1514–1541
Chen Y, Zhang S, Ren S, et al (2022) Gradual surrogate gradient learning in deep spiking neural networks. In: Proc IEEE Int Conf Acoust Speech Signal Process, pp 8927–8931
Li Y, Guo Y, Zhang S et al (2021) Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Proc Adv Neural Inf Process Syst, 34:23426–23439
Yang Y, Zhang W, Li P (2021) Backpropagated neighborhood aggregation for accurate training of spiking neural networks. In: Proc Int Conf Mach Learn, PMLR, pp 11852–11862
Kim Y, Panda P (2020) Revisiting batch normalization for training low-latency deep spiking neural networks from scratch. Front Neurosci, p 1638
Zheng H, Wu Y, Deng L, et al (2021) Going deeper with directly-trained larger spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 11062–11070
Yan Y, Chu H, Jin Y et al (2022) Backpropagation with sparsity regularization for spiking neural network learning. Front Neurosci, 16:760298
Lin J, Gan C, Han S (2019) Defensive quantization: When efficiency meets robustness. In: Proc Int Conf Learn Represent
Li H, Liu H, Ji X et al (2017) Cifar10-dvs: an event-stream dataset for object classification. Front Neurosci, 11:309
Rathi N, Roy K (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc Int Conf Learn Represent
Rathi N, Roy K (2021) Diet-snn: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Trans Neural Netw Learn Syst, pp 3174–3182
Fang W, Yu Z, Chen Y, et al (2021) Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In: Proc IEEE Int Conf Comput Vis, pp 2661–2671
Sun H, Cai W, Yang B, Cui Y, Xia Y, Yao D, Guo D (2023) A synapse-threshold synergistic learning approach for spiking neural networks. IEEE Trans Cogn Dev Syst
Xiao M, Meng Q, Zhang Z et al (2021) Training feedback spiking neural networks by implicit differentiation on the equilibrium state. Proc Adv Neural Inf Process Syst, 34:14516–14528
Hao Z, Bu T, Ding J, Huang T, Yu Z (2023) Reducing ANN-SNN conversion error through residual membrane potential. In: Proc AAAI Conf Artif Intell
Deng S, Li Y, Zhang S, Gu S (2022) Temporal efficient training of spiking neural network via gradient re-weighting. In: Proc Int Conf Learn Represent
Yao X, Li F, Mo Z, Cheng J (2022) Glif: A unified gated leaky integrate-and-fire neuron for spiking neural networks. Proc Adv Neural Inf Process Syst, 35:32160–32171
Yan Z, Zhou J, Wong WF (2021) Near lossless transfer learning for spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 10577–10584
Wu J, Chua Y, Zhang M, et al (2021) A tandem learning rule for effective training and rapid inference of deep spiking neural networks. IEEE Trans Neural Netw Learn Syst
Wu Z, Zhang H, Lin Y, et al (2021) Liaf-net: Leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing. IEEE Trans Neural Netw Learn Syst
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1306600), the National Natural Science Foundation of China (Grant Nos. 62076207, 62076208, and U20A20227), and the Science and Technology Plan Program of Yubei District of Chongqing (Grant No. 2021-17).
Author information
Contributions
Tao Chen: Conceptualization, Methodology, Software, Writing – original draft. Shu Wang: Visualization, Validation, Revised paper. Yu Gong: Investigation, Validation, Revised paper. Lidan Wang: Supervision, Writing – review & editing, Project administration, Funding acquisition. Shukai Duan: Project administration, Funding acquisition.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Detailed network architecture
The residual block, the key component of ResNet-19, is shown in Fig. 9. Conv is the convolutional layer, BN denotes batch normalization, and LIF denotes the LIF neurons. I[t] is the input spike train and O[t] is the output spike train. The detailed network architecture of ResNet-19 is given in Table 7, where \(C_{in}\) denotes the number of input channels and \(C_{out}\) denotes the number of output channels. Block refers to the residual block shown in Fig. 9. In Block4 and Block7, the stride of the first convolutional layer is 2, while the stride of the second convolutional layer remains 1. A schematic implementation of the residual block is sketched below.
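As a companion to Fig. 9 and Table 7, the following is a schematic PyTorch sketch of such a Conv-BN-LIF residual block processing one time step. The single-step LIF dynamics, the hard-reset rule, and the placement of the shortcut are plausible assumptions rather than the exact published implementation; in training, the hard threshold would be replaced by a surrogate spike function such as the one sketched earlier.

import torch.nn as nn

class LIF(nn.Module):
    """Minimal single-step LIF neuron; tau and v_th are assumed values."""
    def __init__(self, tau=2.0, v_th=1.0):
        super().__init__()
        self.tau, self.v_th = tau, v_th
        self.v = 0.0  # membrane potential; reset between input samples

    def forward(self, x):
        self.v = self.v + (x - self.v) / self.tau      # leaky integration
        spike = (self.v >= self.v_th).float()          # fire (surrogate gradient in training)
        self.v = self.v * (1.0 - spike)                # hard reset
        return spike

class ResidualBlock(nn.Module):
    """Conv-BN-LIF residual block in the spirit of Fig. 9."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.lif1 = LIF()
        self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.lif2 = LIF()
        self.shortcut = nn.Sequential()
        if stride != 1 or c_in != c_out:
            self.shortcut = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False),
                nn.BatchNorm2d(c_out),
            )

    def forward(self, x):                               # x: input spikes I[t]
        out = self.lif1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.lif2(out + self.shortcut(x))        # output spikes O[t]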
1.2 Energy efficiency calculation
Here we follow [10] to estimate the power consumption of DNNs and SNNs. The energy consumption of DNNs is dominated by a large number of multiply-accumulate (MAC) operations, whereas SNNs operate on binary events and their energy consumption is dominated by accumulate (AC) operations. In this paper, we assume that each MAC and AC operation is implemented in register-transfer logic in 45 nm CMOS technology. For 32-bit weight values, a MAC operation and an AC operation on 32-bit integers consume 3.2 pJ and 0.1 pJ, respectively. The number of floating-point operations (FLOPs) of DNNs and SNNs can be obtained according to:
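A standard per-layer operation count, consistent with the notation defined next and with the convention of [10], is the following (the exact expression in the published version may differ in detail):

\[
\mathrm{FLOPs}_{conv} = k^{2} \times O^{2} \times N \times M, \qquad \mathrm{FLOPs}_{fc} = D_{in} \times D_{out}
\]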
For a convolutional layer, N denotes the number of input channels, M the number of output channels, \(I \times I\) the input size, \(k \times k\) the kernel size, and \(O \times O\) the output size. For a fully connected layer, \(D_{in}\) denotes the input size and \(D_{out}\) the output size. \(S_A\) denotes the spike rate of each layer. The inference energy of SNNs and DNNs can then be calculated from:
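A common formulation of these energy estimates, presumably corresponding to the equations referenced below as (19) and (20), is the following (the handling of the first direct-encoding layer, which typically still requires MAC operations, is omitted here):

\[
E_{DNN} = E_{MAC} \times \sum_{l} \mathrm{FLOPs}_{l}, \qquad
E_{SNN} = E_{AC} \times T \times \sum_{l} S_{A}^{l} \times \mathrm{FLOPs}_{l}
\]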
where \(E_{MAC}\) is 3.2 pJ, \(E_{AC}\) is 0.1 pJ, and T is the number of time steps. The inference energy aggregates the layer-wise FLOPs over all layers according to (19) and (20).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, T., Wang, S., Gong, Y. et al. Surrogate gradient scaling for directly training spiking neural networks. Appl Intell 53, 27966–27981 (2023). https://doi.org/10.1007/s10489-023-04966-x