Abstract
Compared with traditional traffic signal control methods, methods driven by deep reinforcement learning (DRL) have shown better performance, but reinforcement learning also suffers from low sample utilization. To address this problem, this paper presents a novel Twin Delayed Deep Deterministic Policy Gradient with Dual Buffer (TD3_DB) for traffic signal control. In the proposed framework, two experience buffers store important samples and normal samples separately, and the sampling proportion between the two buffers is adjusted adaptively. In addition, lane pressure, which describes the dynamic features of lane traffic flow, is used in the state design of the TD3 agent, enhancing the agent's perception of the intersection. Comprehensive experiments under different traffic flow patterns show that the dual experience replay scheme improves sample utilization and that the proposed TD3_DB outperforms other methods such as the original TD3 and Proximal Policy Optimization (PPO), effectively reducing vehicle queue length and waiting time.
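Since the code is available only on request (see Data availability), the sketch below is a minimal Python illustration, not the authors' implementation, of the two ideas named in the abstract: a dual experience buffer whose sampling proportion adapts during training, and a max-pressure-style lane pressure feature for the state. The names (`DualReplayBuffer`, `adapt_ratio`, `lane_pressure`), the importance criterion, and the specific adaptation rule are illustrative assumptions.

```python
import random
from collections import deque


def lane_pressure(n_incoming: int, n_outgoing: int) -> int:
    """Max-pressure-style lane feature: the difference between the vehicle
    counts on an incoming lane and its downstream (outgoing) lane.
    Hypothetical helper; the paper's exact state encoding may differ."""
    return n_incoming - n_outgoing


class DualReplayBuffer:
    """Two FIFO buffers holding 'important' and 'normal' transitions
    separately; mini-batches mix the two according to an adaptive ratio."""

    def __init__(self, capacity: int = 100_000, init_ratio: float = 0.5):
        self.important = deque(maxlen=capacity)
        self.normal = deque(maxlen=capacity)
        self.ratio = init_ratio  # fraction of each batch from `important`

    def add(self, transition, is_important: bool) -> None:
        # The importance test (e.g., TD error above a threshold) is an
        # assumption; the paper defines its own notion of importance.
        (self.important if is_important else self.normal).append(transition)

    def adapt_ratio(self, recent_return: float, baseline_return: float,
                    step: float = 0.05, low: float = 0.1,
                    high: float = 0.9) -> None:
        # One plausible adaptive rule: sample important transitions more
        # often while performance lags a running baseline, relax otherwise.
        if recent_return < baseline_return:
            self.ratio = min(high, self.ratio + step)
        else:
            self.ratio = max(low, self.ratio - step)

    def sample(self, batch_size: int):
        # Draw from each buffer in proportion to the current ratio,
        # capped by how many transitions each buffer actually holds.
        n_imp = min(int(batch_size * self.ratio), len(self.important))
        n_norm = min(batch_size - n_imp, len(self.normal))
        return (random.sample(self.important, n_imp)
                + random.sample(self.normal, n_norm))
```

Drawing a fixed fraction of each mini-batch from the important buffer acts as a coarse form of prioritized replay without per-sample priorities, which is consistent with the abstract's claim of improved sample utilization.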
Data availability
The datasets and code are available from the corresponding author on reasonable request.
Acknowledgements
Not applicable
Funding
Not applicable
Author information
Contributions
Yichao G wrote the main manuscript text. Yaqi S helped prepare figures and analyzed data. Dake Z and Xin Y provided guidance and improved the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, Y., Zhou, D., Shen, Y. et al. Dual experience replay-based TD3 for single intersection signal control. J Supercomput 80, 15161–15182 (2024). https://doi.org/10.1007/s11227-024-06047-3