Abstract
Accurately predicting the trajectories of dynamic agents is crucial for the safe navigation of autonomous robots. However, achieving precise predictions based solely on past and current observations is challenging due to the inherent uncertainty in each agent's intentions, which greatly influences its future trajectory. Furthermore, the lack of precise information about agents' future poses leads to ambiguity about which agents should be focused on when predicting the target agent's future. To address this problem, we propose a teacher-student learning approach. The teacher model utilizes the actual future poses of other agents to determine which agents should be focused on for the final prediction. This attentional knowledge guides the student model in deciding which agents to focus on and how much attention to allocate when predicting future trajectories. Additionally, we introduce a Lane-guided Attention Module (LAM) that considers interactions with local lanes near the predicted trajectories to enhance prediction performance. This module is integrated into the student model to refine agent features, thereby facilitating a more accurate emulation of the teacher model. We demonstrate the effectiveness of our proposed model on the large-scale Argoverse motion forecasting dataset, improving overall prediction performance. Our model can be used in a plug-and-play manner, showing consistent performance gains. Moreover, it generates more human-intuitive trajectories, e.g., avoiding collisions with other agents, keeping to its lane, and considering relations with other agents.
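To make the attention-transfer idea above concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): a teacher scores neighbor agents with access to ground-truth future poses, and a student is trained to mimic the teacher's agent-attention distribution from past observations alone. All module names, tensor shapes, and the choice of KL divergence as the transfer loss are illustrative assumptions.

# Illustrative sketch of attention-based knowledge transfer between a teacher
# (which sees future poses) and a student (which sees only past observations).
# Names and shapes are hypothetical; this is not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AgentAttention(nn.Module):
    """Scores neighbor agents for a target agent via scaled dot-product attention."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, target_feat, neighbor_feats):
        # target_feat: [B, D], neighbor_feats: [B, N, D]
        q = self.q(target_feat).unsqueeze(1)              # [B, 1, D]
        k = self.k(neighbor_feats)                        # [B, N, D]
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5     # [B, N]
        return F.softmax(scores, dim=-1)                  # attention over neighbor agents

def attention_transfer_loss(student_attn, teacher_attn, eps=1e-8):
    # KL divergence pushes the student's agent-attention distribution toward
    # the teacher's; the teacher is treated as a fixed target, hence detach().
    teacher_attn = teacher_attn.detach()
    return F.kl_div((student_attn + eps).log(), teacher_attn, reduction="batchmean")

if __name__ == "__main__":
    # Random features stand in for encoded agent states.
    B, N, D = 4, 6, 64
    teacher_attn_mod = AgentAttention(D)   # teacher encodes past + ground-truth future
    student_attn_mod = AgentAttention(D)   # student encodes past observations only
    teacher_neighbors = torch.randn(B, N, D)
    student_neighbors = torch.randn(B, N, D)
    target = torch.randn(B, D)

    t_attn = teacher_attn_mod(target, teacher_neighbors)
    s_attn = student_attn_mod(target, student_neighbors)
    print(attention_transfer_loss(s_attn, t_attn).item())

In the setting described above, the teacher would first be trained with access to future observations and then held fixed while the student learns to match its attention alongside the usual trajectory regression objective; the KL term here merely stands in for whichever matching loss the method actually uses.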
Acknowledgement
This work was supported by 42dot. It was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1A6A1A13044830, 15%), and by Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (RS-2022-II220043, Adaptive Personality for Intelligent Agents, 15%; IITP-2024-RS-2024-00397085, Leading Generative AI Human Resources Development, 15%).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Moon, S., Yeon, K., Kim, H., Jeong, SG., Kim, J. (2025). Who Should Have Been Focused: Transferring Attention-Based Knowledge from Future Observations for Trajectory Prediction. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15317. Springer, Cham. https://doi.org/10.1007/978-3-031-78447-7_23
DOI: https://doi.org/10.1007/978-3-031-78447-7_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78446-0
Online ISBN: 978-3-031-78447-7