Abstract
We tackle the problem of cooperative visual exploration, where multiple agents must jointly explore unseen regions as quickly as possible based on visual signals. Classical planning-based methods often suffer from high computational overhead at each step and limited expressiveness for complex cooperation strategies. By contrast, reinforcement learning (RL) has recently become a popular paradigm for this challenge owing to its ability to model arbitrarily complex strategies with minimal inference overhead. In this paper, we propose a novel RL-based multi-agent planning module, Multi-agent Spatial Planner (MSP). MSP leverages a transformer-based architecture, Spatial-TeamFormer, which effectively captures spatial relations and intra-agent interactions via hierarchical spatial self-attention. In addition, we implement several multi-agent enhancements that process each agent's local information into an aligned spatial representation for more precise planning. Finally, we perform policy distillation to extract a meta policy, significantly improving the generalization capability of the final policy. We call this overall solution Multi-Agent Active Neural SLAM (MAANS). MAANS substantially outperforms classical planning-based baselines for the first time in a photo-realistic 3D simulator, Habitat. Code and videos can be found at https://sites.google.com/view/maans.
C. Yu, X. Yang, and J. Gao—Equal contribution.
This research is supported by NSFC (U20A20334, U19B2019 and M-0248), Tsinghua-Meituan Joint Institute for Digital Life, Tsinghua EE Independent Research Project, Beijing National Research Center for Information Science and Technology (BNRist), and Beijing Innovation Center for Future Chips.
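The hierarchical spatial self-attention described in the abstract can be illustrated with a minimal sketch: first attend over spatial cells within each agent's top-down feature map, then attend across agents at each spatial location. This is an illustrative toy, not the authors' implementation; the tensor shapes, single attention head, and identity query/key/value projections are all simplifying assumptions made here for clarity.

```python
import numpy as np

def self_attention(x):
    # Single-head scaled dot-product self-attention over the second-to-last
    # axis, with identity Q/K/V projections (an illustrative simplification).
    d = x.shape[-1]
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def hierarchical_spatial_team_attention(features):
    # features: (n_agents, H, W, d) -- one top-down feature map per agent.
    n, H, W, d = features.shape
    # Level 1: spatial attention within each agent's own map (over H*W cells).
    per_agent = self_attention(features.reshape(n, H * W, d))
    # Level 2: team attention across the n agents at each spatial cell.
    by_cell = per_agent.reshape(n, H, W, d).transpose(1, 2, 0, 3).reshape(H * W, n, d)
    fused = self_attention(by_cell)
    # Restore the (n_agents, H, W, d) layout.
    return fused.reshape(H, W, n, d).transpose(2, 0, 1, 3)

rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 4, 4, 8))  # 2 agents, 4x4 map, 8-dim features
out = hierarchical_spatial_team_attention(feats)
print(out.shape)  # → (2, 4, 4, 8)
```

Decomposing attention into a spatial pass followed by a cross-agent pass keeps the cost far below joint attention over all `n * H * W` tokens while still letting each cell aggregate information from every teammate's map.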
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yu, C., Yang, X., Gao, J., Yang, H., Wang, Y., Wu, Y. (2022). Learning Efficient Multi-agent Cooperative Visual Exploration. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13699. Springer, Cham. https://doi.org/10.1007/978-3-031-19842-7_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19841-0
Online ISBN: 978-3-031-19842-7