WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Chen, Pingyi; Li, Honglin; Zhu, Chenglu; Zheng, Sunyi; Shui, Zhongyi; Yang, Lin

doi:10.1007/978-3-031-72083-3_51

Pingyi Chen^14,15,16,
Honglin Li^14,15,16,
Chenglu Zhu^15,16,
Sunyi Zheng^15,16,
Zhongyi Shui^14,15,16 &
…
Lin Yang^15,16

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15004))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

744 Accesses

Abstract

Whole slide images are the foundation of digital pathology for the diagnosis and treatment of carcinomas. Writing pathology reports is laborious and error-prone for inexperienced pathologists. To reduce the workload and improve clinical automation, we investigate how to generate pathology reports given whole slide images. On the data end, we curated the largest WSI-text dataset (PathText). In specific, we collected nearly 10000 high-quality WSI-text pairs for visual-language models by recognizing and cleaning pathology reports which narrate diagnostic slides in TCGA. On the model end, we propose the multiple instance generative model (MI-Gen) which can produce pathology reports for gigapixel WSIs. We benchmark our model on the largest subset of PathText. Experimental results show our model can generate pathology reports which contain multiple clinical clues and achieve competitive performance on certain slide-level tasks. We observe that simple semantic extraction from the pathology reports can achieve the best performance (0.838 of F1 score) on BRCA subtyping surpassing previous state-of-the-art approaches. Our collected dataset and related code are available at https://github.com/cpystan/Wsi-Caption.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

A whole-slide foundation model for digital pathology from real-world data

Article Open access 22 May 2024

Pathology-Knowledge Enhanced Multi-instance Prompt Learning for Few-Shot Whole Slide Image Classification

RepsNet: Combining Vision with Language for Automated Medical Reports

Notes

1.
https://portal.gdc.cancer.gov/.

References

Banerjee, S., Lavie, A.: Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. pp. 65–72 (2005)
Google Scholar
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in neural information processing systems 33, 1877–1901 (2020)
Google Scholar
Buckley, J.M., Coopey, S.B., Sharko, J., Polubriaginof, F., Drohan, B., Belli, A.K., Kim, E.M., Garber, J.E., Smith, B.L., Gadd, M.A., et al.: The feasibility of using natural language processing to extract clinical information from breast pathology reports. Journal of pathology informatics 3(1), 23 (2012)
Article Google Scholar
Chan, L., Hosseini, M.S., Rowsell, C., Plataniotis, K.N., Damaskinos, S.: Histosegnet: Semantic segmentation of histological tissue type in whole slide images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10662–10671 (2019)
Google Scholar
Chen, P., Zhu, C., Shui, Z., Cai, J., Zheng, S., Zhang, S., Yang, L.: Exploring unsupervised cell recognition with prior self-activation maps. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. pp. 559–568. Springer Nature Switzerland, Cham (2023)
Google Scholar
Chen, R.J., Chen, C., Li, Y., Chen, T.Y., Trister, A.D., Krishnan, R.G., Mahmood, F.: Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 16144–16155 (2022)
Google Scholar
Chen, Z., Song, Y., Chang, T.H., Wan, X.: Generating radiology reports via memory-driven transformer. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (Nov 2020)
Google Scholar
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Farahani, N., Parwani, A.V., Pantanowitz, L.: Whole slide imaging in pathology: advantages, limitations, and emerging perspectives. Pathology and Laboratory Medicine International pp. 23–33 (2015)
Google Scholar
Gamper, J., Rajpoot, N.: Multiple instance captioning: Learning representations from histopathology textbooks and articles. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16549–16559 (2021)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
Google Scholar
Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T.J., Zou, J.: A visual–language foundation model for pathology image analysis using medical twitter. Nature medicine 29(9), 2307–2316 (2023)
Google Scholar
Ilse, M., Tomczak, J., Welling, M.: Attention-based deep multiple instance learning. In: International conference on machine learning. pp. 2127–2136. PMLR (2018)
Google Scholar
Jing, B., Xie, P., Xing, E.: On the automatic generation of medical imaging reports. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 2577–2586. Association for Computational Linguistics, Melbourne, Australia (Jul 2018). https://doi.org/10.18653/v1/P18-1240, https://aclanthology.org/P18-1240
Kang, M., Song, H., Park, S., Yoo, D., Pereira, S.: Benchmarking self-supervised learning on diverse pathology datasets. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3344–3354 (2023)
Google Scholar
Li, B., Li, Y., Eliceiri, K.W.: Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14318–14328 (2021)
Google Scholar
Lin, C.Y.: Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out. pp. 74–81 (2004)
Google Scholar
Liu, G., Hsu, T.M.H., McDermott, M., Boag, W., Weng, W.H., Szolovits, P., Ghassemi, M.: Clinically accurate chest x-ray report generation. In: Machine Learning for Healthcare Conference. pp. 249–269. PMLR (2019)
Google Scholar
Lu, M.Y., Chen, B., Zhang, A., Williamson, D.F., Chen, R.J., Ding, T., Le, L.P., Chuang, Y.S., Mahmood, F.: Visual language pretrained multiple instance zero-shot transfer for histopathology images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19764–19775 (2023)
Google Scholar
Lu, M.Y., Williamson, D.F., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature biomedical engineering 5(6), 555–570 (2021)
Article Google Scholar
Miura, Y., Zhang, Y., Tsai, E.B., Langlotz, C.P., Jurafsky, D.: Improving factual completeness and consistency of image-to-text radiology report generation. arXiv preprint arXiv:2010.10042 (2020)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics. pp. 311–318 (2002)
Google Scholar
Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-critical sequence training for image captioning. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7008–7024 (2017)
Google Scholar
Shao, Z., Bian, H., Chen, Y., Wang, Y., Zhang, J., Ji, X., et al.: Transmil: Transformer based correlated multiple instance learning for whole slide image classification. Advances in Neural Information Processing Systems 34, 2136–2147 (2021)
Google Scholar
Smith, R.: An overview of the tesseract ocr engine. In: Ninth international conference on document analysis and recognition (ICDAR 2007). vol. 2, pp. 629–633. IEEE (2007)
Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
Google Scholar
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: A neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3156–3164 (2015)
Google Scholar
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning. pp. 2048–2057. PMLR (2015)
Google Scholar
Zhang, H., Meng, Y., Zhao, Y., Qiao, Y., Yang, X., Coupland, S.E., Zheng, Y.: Dtfd-mil: Double-tier feature distillation multiple instance learning for histopathology whole slide image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18802–18812 (2022)
Google Scholar
Zhang, Z., Chen, P., McGough, M., Xing, F., Wang, C., Bui, M., Xie, Y., Sapkota, M., Cui, L., Dhillon, J., et al.: Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nature Machine Intelligence 1(5), 236–245 (2019)
Article Google Scholar

Download references

Acknowledgement

This study was partially supported by the National Natural Science Foundation of China (Grant no. 92270108), Zhejiang Provincial Natural Science Foundation of China (Grant no. XHD23F0201), and the Research Center for Industries of the Future (RCIF) at Westlake University.

Author information

Authors and Affiliations

Zhejiang University, Hangzhou, China
Pingyi Chen, Honglin Li & Zhongyi Shui
Research Center for Industries of the Future, Westlake University, Hangzhou, China
Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui & Lin Yang
School of Engineering, Westlake University, Hangzhou, China
Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui & Lin Yang

Authors

Pingyi Chen
View author publications
You can also search for this author in PubMed Google Scholar
Honglin Li
View author publications
You can also search for this author in PubMed Google Scholar
Chenglu Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Sunyi Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Zhongyi Shui
View author publications
You can also search for this author in PubMed Google Scholar
Lin Yang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lin Yang .

Editor information

Editors and Affiliations

Children’s National Hospital/George Washington University, Washington, DC, USA
Marius George Linguraru
The Chinese University of Hong Kong, Hong Kong, China
Qi Dou
Technical University of Denmark, Kgs Lyngby, Denmark
Aasa Feragen
Imperial College London, London, UK
Stamatia Giannarou
Imperial College London, London, UK
Ben Glocker
Universitat de Barcelona, Barcelona, Spain
Karim Lekadir
Helmholtz Munich, Technical University of Munich and King’s College London, Munich, Germany
Julia A. Schnabel

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 336 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Chen, P., Li, H., Zhu, C., Zheng, S., Shui, Z., Yang, L. (2024). WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15004. Springer, Cham. https://doi.org/10.1007/978-3-031-72083-3_51

Download citation

DOI: https://doi.org/10.1007/978-3-031-72083-3_51
Published: 14 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72082-6
Online ISBN: 978-3-031-72083-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)

WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A whole-slide foundation model for digital pathology from real-world data

Pathology-Knowledge Enhanced Multi-instance Prompt Learning for Few-Shot Whole Slide Image Classification

RepsNet: Combining Vision with Language for Automated Medical Reports

Notes

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Ethics declarations

Disclosure of Interests

1 Electronic supplementary material

Supplementary material 1 (pdf 336 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Societies and partnerships

Subscribe and save

Buy Now

Navigation

WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A whole-slide foundation model for digital pathology from real-world data

Pathology-Knowledge Enhanced Multi-instance Prompt Learning for Few-Shot Whole Slide Image Classification

RepsNet: Combining Vision with Language for Automated Medical Reports

Notes

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Ethics declarations

Disclosure of Interests

1 Electronic supplementary material

Supplementary material 1 (pdf 336 KB)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Societies and partnerships

Search

Navigation