Abstract
Current unsupervised deformable medical image registration methods rely on image similarity measures. However, these methods are inherently limited by the difficulty of incorporating anatomical knowledge into the registration process. Vision foundation models such as the Segment Anything Model (SAM) have attracted attention for their excellent image segmentation capabilities, and medical adaptations of SAM align medical text knowledge with visual knowledge, enabling precise segmentation of organs. In this study, we propose a novel approach that enhances medical image registration by integrating the anatomical understanding of a vision foundation model into the registration model. Specifically, we propose a novel unsupervised deformable medical image registration framework, called SAT-Morph, which consists of a Segment Anything with Text prompt (SAT) module and a mask registration module. In the SAT module, the medical vision foundation model segments anatomical regions within both the moving and fixed images according to our designed text prompts. In the mask registration module, these segmentation masks, rather than the traditionally used image pairs, serve as the input to the registration model. Compared with image pairs, segmentation mask pairs incorporate anatomical knowledge and improve registration performance. Experiments demonstrate that SAT-Morph significantly outperforms existing state-of-the-art methods on both the Abdomen CT and ACDC cardiac MRI datasets. These results illustrate the effectiveness of integrating vision foundation models into medical image registration and point toward more accurate, anatomically aware registration. Our code is available at https://github.com/HaoXu0507/SAT-Morph/.
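To make the two-module design concrete, below is a minimal sketch of the pipeline described in the abstract. It is not the authors' released implementation: sat_segment, MaskRegNet, warp, and the loss weights are hypothetical placeholders, with the SAT module assumed to return one-hot anatomy masks for the prompted organs and the mask registration module represented by a generic deformation network trained with a Dice overlap term plus a displacement-smoothness penalty.

```python
# Minimal sketch of the SAT-Morph pipeline described in the abstract.
# Assumptions (not from the paper): `sat_segment` stands in for the SAT
# text-prompted segmentation model, and `MaskRegNet` for the mask
# registration network; both are hypothetical placeholders.
import torch
import torch.nn.functional as F


def sat_segment(image, text_prompts):
    """Placeholder: return one-hot anatomy masks (B, C, D, H, W) for the
    organs named in `text_prompts`, using a text-promptable segmenter."""
    raise NotImplementedError


class MaskRegNet(torch.nn.Module):
    """Placeholder deformation network: maps concatenated moving/fixed
    masks to a dense displacement field (B, 3, D, H, W).
    forward(...) would be implemented here, e.g. as a U-Net."""


def warp(moving, flow):
    """Warp `moving` (B, C, D, H, W) with displacement `flow` via grid_sample."""
    B, _, D, H, W = moving.shape
    # identity sampling grid in voxel coordinates, shape (3, D, H, W)
    grids = torch.meshgrid(
        torch.arange(D), torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack(grids).float().to(moving.device)
    new_locs = grid.unsqueeze(0) + flow  # (B, 3, D, H, W)
    # normalize each coordinate to [-1, 1] and reorder to (x, y, z)
    for i, s in enumerate((D, H, W)):
        new_locs[:, i] = 2 * new_locs[:, i] / (s - 1) - 1
    new_locs = new_locs.permute(0, 2, 3, 4, 1)[..., [2, 1, 0]]
    return F.grid_sample(moving, new_locs, align_corners=True)


def training_step(reg_net, moving_img, fixed_img, prompts):
    # 1) SAT module: text-prompted anatomy masks for both images
    m_mask = sat_segment(moving_img, prompts)
    f_mask = sat_segment(fixed_img, prompts)
    # 2) Mask registration module: predict a deformation from the mask pair
    flow = reg_net(torch.cat([m_mask, f_mask], dim=1))
    warped = warp(m_mask, flow)
    # 3) Unsupervised losses: mask overlap (Dice) plus flow smoothness
    inter = (warped * f_mask).sum(dim=(2, 3, 4))
    dice = (2 * inter / (warped.sum((2, 3, 4)) + f_mask.sum((2, 3, 4)) + 1e-6)).mean()
    smooth = sum(torch.mean(torch.abs(torch.diff(flow, dim=d))) for d in (2, 3, 4))
    return (1 - dice) + 0.1 * smooth  # 0.1 is an assumed regularization weight
```

The point mirrored from the abstract is that the registration network only ever sees the text-prompted masks, so the correspondence signal comes from the foundation model's anatomical segmentation rather than from raw intensity similarity.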
About this paper
Cite this paper
Xu, H. et al. (2025). SAT-Morph: Unsupervised Deformable Medical Image Registration Using Vision Foundation Models with Anatomically Aware Text Prompt. In: Deng, Z., et al. Foundation Models for General Medical AI. MedAGI 2024. Lecture Notes in Computer Science, vol 15184. Springer, Cham. https://doi.org/10.1007/978-3-031-73471-7_8