
Surrogate gradient scaling for directly training spiking neural networks


Abstract

Spiking neural networks (SNNs) are considered biologically plausible and can achieve high energy efficiency when implemented on neuromorphic hardware, owing to their highly sparse, asynchronous, binary, event-driven nature. Recently, surrogate gradient (SG) approaches have enabled SNNs to be trained from scratch with backpropagation (BP) algorithms under a deep learning framework. However, a popular SG approach known as the straight-through estimator (STE) propagates the gradient unchanged and therefore ignores the activation difference between membrane potentials and output spikes. To address this issue, we propose surrogate gradient scaling (SGS), which scales the gradient of the membrane potential up or down according to the sign of the output-spike gradient and the difference between the membrane potential and the spike output. SGS can also be applied to unimodal surrogate functions that propagate different gradient information from the output spikes to the input membrane potentials. In addition, because SNNs trained directly from scratch suffer from poor generalization, we introduce a Lipschitz regularization (LR) term into the loss function; it not only improves the generalization performance of SNNs but also makes them more robust to noise. Extensive experimental results on several popular benchmark datasets (CIFAR10, CIFAR100, and CIFAR10-DVS) show that our approach not only outperforms state-of-the-art (SOTA) methods but also yields lower inference latency. Remarkably, our SNNs achieve 34\(\times \), 29\(\times \), and 17\(\times \) computational energy savings compared with standard artificial neural networks (ANNs) on these three datasets.
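
To make the backward pass concrete, below is a minimal PyTorch sketch of a spike activation whose gradient is scaled in the spirit of SGS. The rectangular surrogate window, the threshold value, and the scale constant `alpha` are illustrative assumptions for exposition, not the paper's exact formulation; see the released code at https://github.com/CHNtao/SGS-SNN for the authors' implementation.

```python
import torch

class SpikeSGS(torch.autograd.Function):
    """Heaviside spike activation with a scaled surrogate gradient (sketch).

    The backward pass scales the incoming gradient g_o by the sign of g_o
    and the mismatch between membrane potential u and spike output o, so
    gradient information grows or shrinks instead of passing through
    unchanged as in the plain straight-through estimator (STE).
    """

    @staticmethod
    def forward(ctx, u, v_th=1.0, alpha=0.1):  # v_th, alpha: assumed values
        o = (u >= v_th).float()                # fire a binary spike
        ctx.save_for_backward(u, o)
        ctx.v_th, ctx.alpha = v_th, alpha
        return o

    @staticmethod
    def backward(ctx, g_o):
        u, o = ctx.saved_tensors
        # Rectangular surrogate window around the threshold (STE-like).
        window = ((u - ctx.v_th).abs() < 0.5).float()
        # Scale up or down by the sign of g_o and the u - o difference.
        scale = 1.0 + ctx.alpha * torch.sign(g_o) * (u - o)
        return g_o * scale * window, None, None


# Usage: spikes = SpikeSGS.apply(membrane_potential)
```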

Data Availability

Code for reproducing the ablation experiments is available at: https://github.com/CHNtao/SGS-SNN

References

  1. He K, Zhang X, Ren S, et al (2016) Deep residual learning for image recognition. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 770–778

  2. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 3431–3440

  3. Zhao ZQ, Zheng P, Xu ST et al (2019) Object detection with deep learning: A review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232

  4. Roy K, Jaiswal A, Panda P (2019) Towards spike-based machine intelligence with neuromorphic computing. Nature 575(7784):607–617

  5. Hua Y, Wan F, Gan H, et al (2022) Distributed estimation with cross-verification under false data-injection attacks. IEEE Trans Cybern

  6. Tavanaei A, Ghodrati M, Kheradpisheh SR et al (2019) Deep learning in spiking neural networks. Neural Netw, 111:47–63

  7. Rajendran B, Sebastian A, Schmuker M et al (2019) Low-power neuromorphic hardware for signal processing applications: A review of architectural and system-level design approaches. IEEE Signal Process Mag, 36(6):97–110

  8. Maass W (1997) Networks of spiking neurons: the third generation of neural network models. Neural Netw, 10(9):1659–1671

  9. Zambrano D, Nusselder R, Scholte HS et al (2019) Sparse computation in adaptive spiking neural networks. Front Neurosci, 12:987

  10. Panda P, Aketi SA, Roy K (2020) Toward scalable, efficient, and accurate deep spiking neural networks with backward residual connections, stochastic softmax, and hybridization. Front Neurosci, 14:653

  11. Davies M, Srinivasa N, Lin TH et al (2018) Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99

  12. Pei J, Deng L, Song S et al (2019) Towards artificial general intelligence with hybrid tianjic chip architecture. Nature 572(7767):106–111

  13. Ponulak F, Kasiński A (2010) Supervised learning in spiking neural networks with ReSuMe: sequence learning, classification, and spike shifting. Neural Comput, 22(2):467–510

  14. Bohte SM, Kok JN, La Poutré JA (2000) Spikeprop: backpropagation for networks of spiking neurons. In: ESANN, Bruges, pp 419–424

  15. Gütig R, Sompolinsky H (2006) The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci, 9(3):420–428

  16. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  17. Han B, Srinivasan G, Roy K (2020) Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 13558–13567

  18. Sengupta A, Ye Y, Wang R et al (2019) Going deeper in spiking neural networks: Vgg and residual architectures. Front Neurosci, 13:95

  19. Han B, Roy K (2020) Deep spiking neural network: Energy efficiency through time based coding. In: Proc Eur Conf Comput Vis, pp 388–404

  20. Bu T, Ding J, Yu Z, Huang T (2022) Optimized potential initialization for low-latency spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 11–20

  21. Rathi N, Srinivasan G, Panda P, Roy K (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc Int Conf Learn Represent

  22. Han B, Roy K (2020) Deep spiking neural network: Energy efficiency through time based coding. In: Proc Eur Conf Comput Vis, pp 388–404

  23. Bu T, Fang W, Ding J, et al (2021) Optimal ann-snn conversion for high-accuracy and ultra-low-latency spiking neural networks. In: Proc Int Conf Learn Represent

  24. Neftci EO, Mostafa H, Zenke F (2019) Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Process Mag, 36(6):51–63

  25. Lee JH, Delbruck T, Pfeiffer M (2016) Training deep spiking neural networks using backpropagation. Front Neurosci, 10:508

  26. Wu Y, Deng L, Li G et al (2018) Spatio-temporal backpropagation for training high-performance spiking neural networks. Front Neurosci, 12:331

  27. Wu Y, Deng L, Li G, et al (2019) Direct training for spiking neural networks: Faster, larger, better. In: Proc AAAI Conf Artif Intell, pp 1311–1318

  28. Neftci EO, Augustine C, Paul S et al (2017) Event-driven random back-propagation: Enabling neuromorphic deep learning machines. Front Neurosci, 11:324

  29. Woźniak S, Pantazi A, Bohnstingl T et al (2020) Deep learning incorporating biologically inspired neural dynamics and in-memory computing. Nat Mach Intell, 2(6):325–336

  30. Lee C, Sarwar SS, Panda P, et al (2020) Enabling spike-based backpropagation for training deep neural network architectures. Front Neurosci, p 119

  31. Liu Z, Cheng KT, Huang D, Xing EP, Shen Z (2022) Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 4942–4952

  32. Lee J, Kim D, Ham B (2021) Network quantization with element-wise gradient scaling. In: Proc IEEE Conf Comput Vis Pattern Recognit, pp 6448–6457

  33. Bellec G, Salaj D, Subramoney A, et al (2018) Long short-term memory and learning-to-learn in networks of spiking neurons. Proc Adv Neural Inf Process Syst, 31

  34. Shrestha SB, Orchard G (2018) Slayer: Spike layer error reassignment in time. Proc Adv Neural Inf Process Syst, 31

  35. Zenke F, Ganguli S (2018) Superspike: Supervised learning in multilayer spiking neural networks. Neural Comput, 30(6):1514–1541

  36. Chen Y, Zhang S, Ren S, et al (2022) Gradual surrogate gradient learning in deep spiking neural networks. In: Proc IEEE Int Conf Acoust Speech Signal Process, pp 8927–8931

  37. Li Y, Guo Y, Zhang S et al (2021) Differentiable spike: Rethinking gradient-descent for training spiking neural networks. Proc Adv Neural Inf Process Syst, 34:23426–23439

  38. Yang Y, Zhang W, Li P (2021) Backpropagated neighborhood aggregation for accurate training of spiking neural networks. In: Proc Int Conf Mach Learn, PMLR, pp 11852–11862

  39. Kim Y, Panda P (2020) Revisiting batch normalization for training low-latency deep spiking neural networks from scratch. Front Neurosci, p 1638

  40. Zheng H, Wu Y, Deng L, et al (2021) Going deeper with directly-trained larger spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 11062–11070

  41. Yan Y, Chu H, Jin Y et al (2022) Backpropagation with sparsity regularization for spiking neural network learning. Front Neurosci, 16:760298

  42. Lin J, Gan C, Han S (2019) Defensive quantization: When efficiency meets robustness. In: Proc Int Conf Learn Represent

  43. Li H, Liu H, Ji X et al (2017) Cifar10-dvs: an event-stream dataset for object classification. Front Neurosci, 11:309

  44. Rathi N, Roy K (2020) Enabling deep spiking neural networks with hybrid conversion and spike timing dependent backpropagation. In: Proc Int Conf Learn Represent

  45. Rathi N, Roy K (2021) Diet-snn: A low-latency spiking neural network with direct input encoding and leakage and threshold optimization. IEEE Trans Neural Netw Learn Syst, pp 3174–3182

  46. Fang W, Yu Z, Chen Y, et al (2021) Incorporating learnable membrane time constant to enhance learning of spiking neural networks. In: Proc IEEE Int Conf Comput Vis, pp 2661–2671

  47. Sun H, Cai W, Yang B, Cui Y, Xia Y, Yao D, Guo D (2023) A synapse-threshold synergistic learning approach for spiking neural networks. IEEE Trans Cogn Dev Syst

  48. Xiao M, Meng Q, Zhang Z et al (2021) Training feedback spiking neural networks by implicit differentiation on the equilibrium state. Proc Adv Neural Inf Process Syst, 34:14516–14528

  49. Hao Z, Bu T, Ding J, Huang T, Yu Z (2023) Reducing ANN-SNN conversion error through residual membrane potential. In: Proc AAAI Conf Artif Intell

  50. Deng S, Li Y, Zhang S, Gu S (2022) Temporal efficient training of spiking neural network via gradient re-weighting. In: Proc Int Conf Learn Represent

  51. Yao X, Li F, Mo Z, Cheng J (2022) Glif: A unified gated leaky integrate-and-fire neuron for spiking neural networks. Proc Adv Neural Inf Process Syst, 35:32160–32171

  52. Yan Z, Zhou J, Wong WF (2021) Near lossless transfer learning for spiking neural networks. In: Proc AAAI Conf Artif Intell, pp 10577–10584

  53. Wu J, Chua Y, Zhang M, et al (2021) A tandem learning rule for effective training and rapid inference of deep spiking neural networks. IEEE Trans Neural Netw Learn Syst

  54. Wu Z, Zhang H, Lin Y, et al (2021) Liaf-net: Leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing. IEEE Trans Neural Netw Learn Syst

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Grant No. 2018YFB1306600), the National Natural Science Foundation of China (Grant Nos. 62076207, 62076208, and U20A20227), and the Science and Technology Plan Program of Yubei District of Chongqing (Grant No. 2021-17).

Author information

Contributions

Tao Chen: Conceptualization, Methodology, Software, Writing – original draft. Shu Wang: Visualization, Validation, Revised paper. Yu Gong: Investigation, Validation, Revised paper. Lidan Wang: Supervision, Writing – review & editing, Project administration, Funding acquisition. Shukai Duan: Project administration, Funding acquisition.

Corresponding author

Correspondence to Lidan Wang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical and informed consent for data used

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Detailed network architecture

The residual block, the key component of ResNet-19, is shown in Fig. 9. Conv is the convolutional layer, BN denotes batch normalization, and LIF denotes the LIF neurons. I[t] is the input spike train and O[t] is the output spike train. The detailed network architecture of ResNet-19 is given in Table 7, where \(C_{in}\) denotes the input channels and \(C_{out}\) denotes the output channels. Block indicates the residual block shown in Fig. 9; in Block4 and Block7, the first convolution layer of the residual block has stride 2 while the second convolution layer keeps stride 1. A code sketch of this block follows Table 7.

Fig. 9: Residual block of spiking ResNet-19

Table 7 The detailed architecture of ResNet-19
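
For concreteness, here is a minimal PyTorch sketch of this Conv-BN-LIF residual block. The hard-reset LIF dynamics, the decay factor `tau`, and the threshold are illustrative assumptions; during training, the hard threshold would be replaced by a surrogate-gradient spike function such as the one sketched after the abstract.

```python
import torch
import torch.nn as nn

class LIF(nn.Module):
    """Minimal hard-reset LIF neuron for one time step (illustrative).
    Call reset() between input sequences; for training, replace the hard
    threshold with a surrogate-gradient function (e.g. SpikeSGS.apply)."""
    def __init__(self, tau=0.5, v_th=1.0):
        super().__init__()
        self.tau, self.v_th = tau, v_th
        self.v = 0.0
    def reset(self):
        self.v = 0.0
    def forward(self, x):
        self.v = self.tau * self.v + x          # leaky integration
        o = (self.v >= self.v_th).float()       # fire a binary spike
        self.v = self.v * (1.0 - o)             # hard reset after spiking
        return o

class ResidualBlock(nn.Module):
    """Conv-BN-LIF residual block of spiking ResNet-19 (sketch).
    In Block4 and Block7 the first conv uses stride 2; the shortcut then
    downsamples with a 1x1 convolution so the shapes match."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.lif1 = LIF()
        self.conv2 = nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.lif2 = LIF()
        self.shortcut = nn.Sequential()
        if stride != 1 or c_in != c_out:
            self.shortcut = nn.Sequential(
                nn.Conv2d(c_in, c_out, 1, stride, bias=False),
                nn.BatchNorm2d(c_out))
    def forward(self, x):  # x: input spike train I[t] at one time step
        out = self.lif1(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.lif2(out + self.shortcut(x))  # output spikes O[t]
```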

1.2 Energy efficiency calculation

Following [10], we estimate the energy consumption of DNNs and SNNs. The energy of DNNs is dominated by a large number of multiply-accumulate (MAC) operations, whereas SNNs compute on binary events and their energy is dominated by accumulate (AC) operations. In this paper, we assume that each MAC and AC operation is implemented in register transfer logic in 45 nm CMOS technology. For 32-bit weight values, a 32-bit MAC operation consumes 3.2 pJ and a 32-bit AC operation consumes 0.1 pJ. The floating point operations (FLOPs) of DNNs and SNNs are given by:

$$F_{DNNs} = \begin{cases} O^2 \cdot k^2 \cdot N \cdot M, & \text{Conv} \\ D_{in} \cdot D_{out}, & \text{FC} \end{cases}$$
(17)
$$F_{SNNs} = \begin{cases} O^2 \cdot k^2 \cdot N \cdot M \cdot S_A, & \text{Conv} \\ D_{in} \cdot D_{out} \cdot S_A, & \text{FC} \end{cases}$$
(18)

For a convolutional layer, N denotes the number of input channels, M the number of output channels, the input size is \(I \times I\), the kernel size is \(k \times k\), and the output size is \(O \times O\). For a fully connected layer, \(D_{in}\) and \(D_{out}\) denote the input and output sizes. \(S_A\) denotes the spike rate of each layer. The inference energy of DNNs and SNNs is then calculated as:

$$E_{DNNs} = \Big( \sum \limits _{i = 1}^{N} F_{DNNs}^{(i)} \Big) \cdot E_{MAC}$$
(19)
$$E_{SNNs} = \Big( \sum \limits _{i = 1}^{N} F_{SNNs}^{(i)} \Big) \cdot E_{AC} \cdot T$$
(20)

where \(E_{MAC} = 3.2\) pJ, \(E_{AC} = 0.1\) pJ, and T is the number of time steps. The inference energy in (19) and (20) accumulates the per-layer FLOPs \(F^{(i)}\) over all N layers (here N denotes the number of layers rather than channels).
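
As a worked sketch, the following Python translates (17)-(20) directly; the layer sizes, spike rate, and time steps in the usage lines are hypothetical and only illustrate how the energy-saving ratios are obtained.

```python
def conv_flops(o, k, n, m):
    """Eq. (17), Conv case: o x o output, k x k kernel, n in / m out channels."""
    return o * o * k * k * n * m

def fc_flops(d_in, d_out):
    """Eq. (17), FC case."""
    return d_in * d_out

def energy_dnn(layer_flops, e_mac=3.2e-12):
    """Eq. (19): total MAC energy in joules (3.2 pJ per 32-bit MAC)."""
    return sum(layer_flops) * e_mac

def energy_snn(layer_flops, spike_rates, t_steps, e_ac=0.1e-12):
    """Eqs. (18) and (20): AC energy with per-layer spike rates S_A over T steps."""
    return sum(f * s for f, s in zip(layer_flops, spike_rates)) * e_ac * t_steps

# Hypothetical single conv layer: 32x32 output, 3x3 kernel, 64 -> 128 channels.
flops = [conv_flops(32, 3, 64, 128)]
saving = energy_dnn(flops) / energy_snn(flops, spike_rates=[0.2], t_steps=4)
print(f"energy saving: {saving:.0f}x")  # 3.2 / (0.1 * 0.2 * 4) = 40x
```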

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, T., Wang, S., Gong, Y. et al. Surrogate gradient scaling for directly training spiking neural networks. Appl Intell 53, 27966–27981 (2023). https://doi.org/10.1007/s10489-023-04966-x

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-023-04966-x

Keywords

Navigation