Abstract
Compared with traditional traffic signal control methods, methods driven by deep reinforcement learning (DRL) have shown better performance, but reinforcement learning also suffers from low sample utilization. To address this problem, this paper presents a novel Twin Delayed Deep Deterministic Policy Gradient with Dual Buffer (TD3_DB) for traffic signal control. In the proposed framework, two experience buffers store important samples and normal samples separately, and the sampling proportion between the two buffers is adjusted adaptively. In addition, lane pressure, which describes the dynamic features of lane traffic flow, is used in the state design of the TD3 agent, enhancing the agent's perception of the intersection. Comprehensive experiments under different traffic flow patterns show that the dual experience replay scheme improves sample utilization and that the proposed TD3_DB outperforms other methods such as the original TD3 and Proximal Policy Optimization (PPO), effectively reducing vehicle queue length and waiting time.
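Since the code is available only on request (see Data availability), the sketch below is a minimal Python illustration, not the authors' implementation, of the two ideas named in the abstract: a dual experience buffer whose sampling proportion adapts during training, and a max-pressure-style lane pressure feature for the state. The names (`DualReplayBuffer`, `adapt_ratio`, `lane_pressure`), the importance criterion, and the specific adaptation rule are illustrative assumptions.

```python
import random
from collections import deque


def lane_pressure(n_incoming: int, n_outgoing: int) -> int:
    """Max-pressure-style lane feature: the difference between the vehicle
    counts on an incoming lane and its downstream (outgoing) lane.
    Hypothetical helper; the paper's exact state encoding may differ."""
    return n_incoming - n_outgoing


class DualReplayBuffer:
    """Two FIFO buffers holding 'important' and 'normal' transitions
    separately; mini-batches mix the two according to an adaptive ratio."""

    def __init__(self, capacity: int = 100_000, init_ratio: float = 0.5):
        self.important = deque(maxlen=capacity)
        self.normal = deque(maxlen=capacity)
        self.ratio = init_ratio  # fraction of each batch from `important`

    def add(self, transition, is_important: bool) -> None:
        # The importance test (e.g., TD error above a threshold) is an
        # assumption; the paper defines its own notion of importance.
        (self.important if is_important else self.normal).append(transition)

    def adapt_ratio(self, recent_return: float, baseline_return: float,
                    step: float = 0.05, low: float = 0.1,
                    high: float = 0.9) -> None:
        # One plausible adaptive rule: sample important transitions more
        # often while performance lags a running baseline, relax otherwise.
        if recent_return < baseline_return:
            self.ratio = min(high, self.ratio + step)
        else:
            self.ratio = max(low, self.ratio - step)

    def sample(self, batch_size: int):
        # Draw from each buffer in proportion to the current ratio,
        # capped by how many transitions each buffer actually holds.
        n_imp = min(int(batch_size * self.ratio), len(self.important))
        n_norm = min(batch_size - n_imp, len(self.normal))
        return (random.sample(self.important, n_imp)
                + random.sample(self.normal, n_norm))
```

Drawing a fixed fraction of each mini-batch from the important buffer acts as a coarse form of prioritized replay without per-sample priorities, which is consistent with the abstract's claim of improved sample utilization.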
Data availability
The datasets and code are available from the corresponding author on reasonable request.
Acknowledgements
Not applicable
Funding
Not applicable
Author information
Contributions
Yichao G wrote the main manuscript text. Yaqi S helped prepare figures and analyzed data. Dake Z and Xin Y provided guidance and improved the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, Y., Zhou, D., Shen, Y. et al. Dual experience replay-based TD3 for single intersection signal control. J Supercomput 80, 15161–15182 (2024). https://doi.org/10.1007/s11227-024-06047-3