
Dual experience replay-based TD3 for single intersection signal control

Published in The Journal of Supercomputing (2024)

Abstract

Compared with traditional traffic signal control methods, methods driven by Deep Reinforcement Learning (DRL) have shown better performance, but reinforcement learning also suffers from low sample utilization. To address this problem, this paper presents a novel Twin Delayed Deep Deterministic Policy Gradient with Dual Buffer (TD3_DB) for traffic signal control. In the proposed framework, two experience buffers store important samples and normal samples separately, and the sampling proportion between the two buffers is adjusted adaptively. In addition, lane pressure, which describes the dynamic features of lane traffic flow, is used in the state design of the TD3 agent, enhancing the agent's perception of the intersection. Comprehensive experiments on different traffic flow modes show that the dual experience replay scheme improves sample utilization, and that the proposed TD3_DB outperforms methods such as the original TD3 and Proximal Policy Optimization (PPO), effectively reducing vehicle queue length and waiting time.
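The paper's implementation is not reproduced on this page, but the abstract's two ingredients, a dual experience buffer with an adaptive sampling proportion and a lane-pressure state feature, can be sketched briefly. The Python sketch below is illustrative only: the class and function names, the TD-error importance test, and the ratio-adaptation rule are hypothetical choices, not the authors' code, and lane pressure is written in the PressLight style (incoming minus outgoing vehicle counts), which the abstract's wording suggests.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Dual experience replay: 'important' and 'normal' transitions are kept
    in separate buffers, and each training batch mixes the two according to
    a proportion that is adapted during training."""

    def __init__(self, capacity=100_000, init_ratio=0.5):
        self.important = deque(maxlen=capacity)  # high-value transitions
        self.normal = deque(maxlen=capacity)     # everything else
        self.ratio = init_ratio                  # share of each batch drawn from `important`

    def add(self, transition, td_error, threshold=1.0):
        # Hypothetical importance test: a large TD error marks a sample as important.
        if abs(td_error) > threshold:
            self.important.append(transition)
        else:
            self.normal.append(transition)

    def adapt_ratio(self, reward_improving, step=0.05, lo=0.1, hi=0.9):
        # Hypothetical adaptation rule: rely more on important samples while
        # the return is still improving, otherwise drift back toward uniform.
        if reward_improving:
            self.ratio = min(hi, self.ratio + step)
        else:
            self.ratio = max(lo, self.ratio - step)

    def sample(self, batch_size):
        n_imp = min(int(batch_size * self.ratio), len(self.important))
        n_nor = min(batch_size - n_imp, len(self.normal))
        # random.sample accepts deques, since they support len() and indexing.
        return random.sample(self.important, n_imp) + random.sample(self.normal, n_nor)


def lane_pressure(incoming_counts, outgoing_counts):
    """PressLight-style pressure of a movement: queued vehicles on incoming
    lanes minus vehicles on the corresponding outgoing lanes."""
    return sum(incoming_counts) - sum(outgoing_counts)
```

In a TD3-style training loop, `sample` would feed the critic and actor updates while `adapt_ratio` would be called periodically from evaluation statistics; both hooks are assumptions, since the paper's adaptive rule is not given here.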


Data availability

The datasets and code are available from the corresponding author on reasonable request.


Acknowledgements

Not applicable

Funding

Not applicable

Author information


Contributions

Yichao G wrote the main manuscript text. Yaqi S helped prepare the figures and analyzed the data. Dake Z and Xin Y provided guidance and improved the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Dake Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gao, Y., Zhou, D., Shen, Y. et al. Dual experience replay-based TD3 for single intersection signal control. J Supercomput 80, 15161–15182 (2024). https://doi.org/10.1007/s11227-024-06047-3

