Abstract
In this work, we present RadioTransformer, a novel student-teacher transformer framework that leverages radiologists’ gaze patterns and models their visuo-cognitive behavior for disease diagnosis on chest radiographs. Domain experts, such as radiologists, rely on visual information for medical image interpretation. Deep neural networks, meanwhile, have demonstrated significant promise in similar tasks, even where visual interpretation is challenging. Eye-gaze tracking has been used to capture the viewing behavior of domain experts, lending insights into the complexity of visual search. However, deep learning frameworks, even those that rely on attention mechanisms, do not leverage this rich domain information for diagnostic purposes. RadioTransformer fills this critical gap by learning from radiologists’ visual search patterns, encoded as ‘human visual attention regions’, in a cascaded global-focal transformer framework. The overall ‘global’ image characteristics and the more detailed ‘local’ features are captured by the proposed global and focal modules, respectively. We experimentally validate the efficacy of RadioTransformer on 8 datasets involving different disease classification tasks where eye-gaze data is not available during the inference phase. Code: https://github.com/bmi-imaginelab/radiotransformer
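To make the cascaded student-teacher idea concrete, below is a minimal, hypothetical sketch in PyTorch-style Python. All names, shapes, and the attention-consistency loss here are illustrative assumptions, not the authors’ released implementation (see the repository linked above for that):

```python
# Hypothetical sketch of the cascaded global-focal, student-teacher idea
# described in the abstract. Module names, token shapes, and the
# attention-consistency loss are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalFocalNet(nn.Module):
    """One global-focal branch: a coarse ('global') transformer encoder
    cascaded with a fine-grained ('focal') encoder."""
    def __init__(self, dim=256, num_classes=2):
        super().__init__()
        # Coarse patch tokens from a single-channel radiograph.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=16, stride=16)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.focal_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        g = self.global_enc(tokens)   # global image context
        f = self.focal_enc(g)         # refined, locally focused features
        attn = f.mean(-1)             # per-token attention proxy, (B, N)
        return self.head(f.mean(1)), attn

def distillation_step(student, teacher, x, y):
    """Teacher is assumed pretrained on gaze-derived 'human visual attention
    regions'; the student matches its attention map while learning labels."""
    logits_s, attn_s = student(x)
    with torch.no_grad():
        _, attn_t = teacher(x)
    loss_cls = F.cross_entropy(logits_s, y)
    loss_attn = F.mse_loss(attn_s, attn_t)  # visual-attention consistency
    return loss_cls + loss_attn

# Usage (shapes only):
# teacher, student = GlobalFocalNet(), GlobalFocalNet()
# x = torch.randn(4, 1, 224, 224); y = torch.randint(0, 2, (4,))
# loss = distillation_step(student, teacher, x, y)
```

The design point mirrored here is that gaze supervision is only needed to train the teacher; at inference the student runs on the image alone, matching the paper’s setting where eye-gaze data is unavailable at test time.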
Acknowledgement
The reported research was partly supported by NIH 1R21CA258493-01A1, NIH 75N92020D00021 (subcontract), and the OVPR and IEDM seed grants at Stony Brook University. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhattacharya, M., Jain, S., Prasanna, P. (2022). RadioTransformer: A Cascaded Global-Focal Transformer for Visual Attention–Guided Disease Classification. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13681. Springer, Cham. https://doi.org/10.1007/978-3-031-19803-8_40
DOI: https://doi.org/10.1007/978-3-031-19803-8_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19802-1
Online ISBN: 978-3-031-19803-8
eBook Packages: Computer Science, Computer Science (R0)