Classification of Shoulder X-ray Images with Deep Learning Ensemble Models
Abstract
1. Introduction
- The most suitable model for the classification of shoulder bone X-ray images as a fracture or non-fracture is determined.
- An approach that can be used in similar studies is developed via new ensemble learning models.
- The study can assist physicians who are not experts in the field in the classification stage, especially in cases of shoulder fractures, which are frequently encountered in the emergency departments of hospitals.
- The method suggested in the study contributes to the literature with two different ensemble approaches.
- With the first model proposed in the study, a transfer-learning performance study was conducted on the MURA dataset, which is widely used in X-ray studies. The three models giving the best classification results were then combined into a single model, increasing the classification performance. The developed model can be used to classify many kinds of medical X-ray images.
- With the second model, the per-class success of each model on the dataset is examined to determine which model detects which class best, and that model is assigned to decide the prediction for its class. Thus, regardless of the dataset studied, a similar decision system can be designed from the models that best detect each class in a given dataset, achieving higher performance than a single model.
- The proposed ensemble model approaches can be applied and generalized by performing similar preprocessing steps in other X-ray biomedical datasets. In addition, the proposed method can be easily used in studies with transfer learning.
2. Related Works
- what the studies conducted using the MURA dataset are, and why this dataset is preferred in this study;
- what kinds of studies have been conducted on the classification of shoulder bone images, and what kinds of innovations may be set forth based on the efficiency of deep learning models used in the classification procedures carried out in this study.
2.1. Studies Conducted Using the MURA Dataset
2.2. Classification Studies Carried Out on the Shoulder Bone
3. Methods
3.1. Classification Models Based on CNNs for Shoulder Bone X-ray Images
3.1.1. ResNet
3.1.2. DenseNet
3.1.3. VGG
3.1.4. InceptionV3
3.1.5. MobileNetV2
3.2. SpinalNet
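A short sketch can illustrate how a Spinal fully connected (FC) head replaces a Standard FC layer. The following is a minimal PyTorch sketch, assuming the gradual-input scheme of SpinalNet [34] with four hidden splits and an even backbone feature width; the per-model layer widths are listed in the Spinal FC layer-width table (e.g., 20 for ResNeXt50, giving 4 × 20 = 80 output features, as used in EL1 below).

```python
import torch
import torch.nn as nn

# Minimal sketch of a Spinal FC head, assuming SpinalNet's gradual-input
# scheme [34]: backbone features are split into two halves, each hidden
# layer receives one half plus the previous layer's output, and the four
# hidden outputs are concatenated before the classification layer.
class SpinalFC(nn.Module):
    def __init__(self, in_features, layer_width, num_classes=2):
        super().__init__()
        assert in_features % 2 == 0, "sketch assumes an even feature width"
        self.half = in_features // 2
        def block(extra):
            return nn.Sequential(nn.Linear(self.half + extra, layer_width),
                                 nn.ReLU())
        self.l1, self.l2 = block(0), block(layer_width)
        self.l3, self.l4 = block(layer_width), block(layer_width)
        self.out = nn.Linear(4 * layer_width, num_classes)

    def forward(self, x):
        a, b = x[:, :self.half], x[:, self.half:]
        h1 = self.l1(a)                           # first half only
        h2 = self.l2(torch.cat([b, h1], dim=1))   # second half + h1
        h3 = self.l3(torch.cat([a, h2], dim=1))   # first half + h2
        h4 = self.l4(torch.cat([b, h3], dim=1))   # second half + h3
        return self.out(torch.cat([h1, h2, h3, h4], dim=1))
```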
3.3. Proposed Classification Models Based on EL for Shoulder Bone X-ray Images
3.3.1. EL1 (ResNeXt50 with Spinal FC, DenseNet169 with Standard FC, and DenseNet201 with Spinal FC)
- Step 1: The last layers of the three pre-trained sub-models in the EL1 ensemble model are replaced with Identity layers.
- Step 2: This yields final feature outputs of 80 for ResNeXt50 with Spinal FC, 1664 for DenseNet169 with Standard FC, and 960 for DenseNet201 with Spinal FC.
- Step 3: These outputs are combined into a single linear hidden layer with 2704 outputs.
- Step 4: This hidden layer is connected to a two-output classification layer followed by a sigmoid activation function.
- Step 5: The network established as a result of these procedures is re-trained to produce the final results (a sketch follows this list).
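To make Steps 1 to 5 concrete, below is a minimal PyTorch sketch of this feature-level ensemble. The plain torchvision backbones are stand-ins for the paper's fine-tuned sub-models: with the Spinal FC heads of the paper the feature widths are 80 + 1664 + 960 = 2704, whereas the raw torchvision widths used here are 2048 + 1664 + 1920; everything else (Identity replacement, concatenation, hidden layer, two-output sigmoid classifier) follows the steps above.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal sketch of the EL1 feature-level ensemble (Steps 1-5); plain
# torchvision backbones stand in for the fine-tuned sub-models, so the
# concatenated feature width differs from the paper's 2704.
class EL1(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.resnext50 = models.resnext50_32x4d()   # pre-trained weights loaded in practice
        self.densenet169 = models.densenet169()
        self.densenet201 = models.densenet201()
        # Step 1: replace each backbone's final layer with an Identity layer.
        self.resnext50.fc = nn.Identity()
        self.densenet169.classifier = nn.Identity()
        self.densenet201.classifier = nn.Identity()
        # Steps 3-4: concatenated features -> hidden linear layer with 2704
        # outputs -> two-output classification layer with sigmoid activation.
        self.hidden = nn.Linear(2048 + 1664 + 1920, 2704)
        self.classifier = nn.Linear(2704, num_classes)

    def forward(self, x):
        f = torch.cat([self.resnext50(x),
                       self.densenet169(x),
                       self.densenet201(x)], dim=1)
        return torch.sigmoid(self.classifier(self.hidden(f)))
```

Step 5 then re-trains this combined network end-to-end on the shoulder X-ray dataset.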
3.3.2. EL2 (ResNet34 with Spinal FC, DenseNet169 with Standard FC, DenseNet201 with Spinal FC, and ResNeXt50 with Spinal FC)
- Part 1: In the “Input” section, there is a shoulder X-ray image dataset that has been subjected to certain image processing techniques.
- Part 2: In the “Classification Network” section, the models that establish our ensemble model are defined. Our ensemble model consists of three single sub-models and a sub-ensemble model. The sub-ensemble model is an architecture trained by connecting the predicted outputs of ResNet34 with Spinal FC, DenseNet201, ResNeXt50, and DenseNet169 with Standard FC to a linear layer with eight inputs and two outputs. These four models were selected based on the individual evaluation of the 26 models.
- Part 3: In the “Prediction” sections, there are normal/abnormal (fracture) class type outputs achieved as a result of the classification performed in the previous section.
- Part 4: In the “Main Check” section, there is a main check mechanism that determines the class of the input image. Models I and II are referred to for suggesting the abnormal class in the final classification, while Models III and IV are referred to for suggesting the normal class. When neither suggestion applies, classification is carried out by the sub-ensemble model. The referred models for each class were selected by considering the confusion matrix and recall values previously obtained for the 26 CNN models.
- Part 5: In the “Sub Check” section, there is a supplementary check mechanism under the main check mechanism. The aim here is to use the classification result of a model (Model IV for Class 1 (abnormal) and Model II for Class 0 (normal)) other than the two models referred to in the main check section as a supplement for the final classification process.
- Part 6: In the “Final Prediction” section, the final output determined as a result of the check mechanisms is achieved.
Algorithm 1 EL2
Input: Shoulder bone X-ray images Dataset = test_dataset
Process:
for image in test_dataset:
    pred_1 = Model_I(image)      # referred for the abnormal class
    pred_2 = Model_II(image)     # referred for the abnormal class
    pred_3 = Model_III(image)    # referred for the normal class
    pred_4 = Model_IV(image)     # referred for the normal class
    if pred_1 == 1 and pred_2 == 1:     # main check: abnormal suggested
        if pred_4 == 1:                 # sub check with Model IV
            final_pred = 1
        else:
            final_pred = 0
    elif pred_3 == 0 and pred_4 == 0:   # main check: normal suggested
        if pred_2 == 1:                 # sub check with Model II
            final_pred = 1
        else:
            final_pred = 0
    else:
        final_pred = pred_3             # per Part 4, the sub-ensemble model decides
Output: final_pred
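The sub-ensemble used as the fallback in Algorithm 1 can be sketched as below, assuming each of the four sub-models outputs two class scores, so that their concatenated predictions give the eight inputs of the single linear layer described in Part 2; this is an illustration under those assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of the sub-ensemble (Part 2): the predicted outputs of
# ResNet34 with Spinal FC, DenseNet201, ResNeXt50, and DenseNet169 with
# Standard FC (four (N, 2) tensors) feed a linear layer with eight inputs
# and two outputs.
class SubEnsemble(nn.Module):
    def __init__(self, sub_models):
        super().__init__()
        self.sub_models = nn.ModuleList(sub_models)
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        preds = [m(x) for m in self.sub_models]   # four (N, 2) outputs
        return self.fc(torch.cat(preds, dim=1))   # (N, 8) -> (N, 2)
```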
4. Experiments
4.1. Dataset of Shoulder Bone X-ray Images
4.2. Data Augmentation for Shoulder Bone X-ray Images
4.3. Data Pre-Processing for Shoulder Bone X-ray Images
- Detection of the Corresponding Area: Most of the X-ray images in the dataset carried little semantic information relative to their size. To eliminate this insufficiency, the images were first converted to gray-scale and then thresholded using an adaptive threshold value determined by Otsu’s method [35]: for the two color classes assumed as background and foreground, the within-class variance was calculated for every possible threshold value, and the threshold minimizing this variance was taken as optimal. The edges in the thresholded image were then located with edge detection methods, and the original image was cropped according to the computed boundaries.
- CLAHE Transformation: In the next step, the contrast-limited adaptive histogram equalization (CLAHE) transformation from the OpenCV library was applied. The input image is divided into user-defined tiles, each with its own histogram; each tile’s histogram is adjusted according to a user-specified clipping limit, and the tiles are finally merged to obtain the CLAHE-transformed version of the input image [36,37]. Contrast equalization of the cropped images with the CLAHE method produced the new outputs.
- Normalization and Standardization: In the last step, the images were normalized and standardized using the ImageNet mean and standard deviation values (a sketch of all three steps follows this list).
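The three steps above can be sketched with OpenCV as follows. This is a minimal sketch, assuming illustrative parameter values: a bounding-box crop of the Otsu foreground mask stands in for the paper's edge-detection-based crop, and the CLAHE clip limit and tile grid are placeholders, not the paper's exact settings.

```python
import cv2
import numpy as np

def preprocess(path, size=320):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Step 1: Otsu's adaptive threshold, then crop to the foreground region
    # (bounding box used here in place of the paper's edge-detection step).
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(mask)
    cropped = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Step 2: CLAHE contrast equalization (clip limit and tile grid illustrative).
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(cropped)
    # Step 3: resize to 320 x 320, replicate to 3 channels, and normalize and
    # standardize with the ImageNet mean and standard deviation values.
    img = cv2.resize(equalized, (size, size)).astype(np.float32) / 255.0
    img = np.stack([img] * 3, axis=-1)
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (img - mean) / std
```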
4.4. Classification Results
4.4.1. Evaluation Metrics
4.4.2. Classification Results of 13 CNN-Based Deep Learning Models with Standard FC/Spinal FC
4.4.3. Classification Results of Our Ensemble Models
- ResNeXt50 with Spinal FC was selected because its AUC score for detecting fracture images, 0.8783, is the second highest after DenseNet169 with Standard FC.
- DenseNet169 with Standard FC was selected because its classification had the highest test accuracy, Cohen’s kappa score, and AUC score among all models used in this study.
- DenseNet201 with Spinal FC was selected because the test accuracy and Cohen’s kappa score were the second highest after DenseNet169 with Standard FC among all models and the highest among models with Spinal FC.
5. Conclusions and Future Work
- In similar studies, binary classification is mostly performed. Although this study likewise has two main classes (normal/abnormal), differently from the literature, the class-wise outputs of the 26 classification models initially used were evaluated in order to determine the most compatible models for the ensemble models. This allowed the best results in this study to be achieved with the ensemble models.
- For the first time, Spinal FC, which has fewer weights in the hidden layer than the Standard FC, was used in several model families (Inception, ResNeXt, and MobileNet). Moreover, SpinalNet was applied to medical images for the first time, with a positive effect on more than half of the classification results.
- A unique structure is introduced: the reliability of each model’s class detection was used as the basis for designing the EL2 model, which further improves the classification results.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Shoulder Fracture 2014. Available online: https://www.assh.org/handcare/condition/shoulder-fracture (accessed on 1 September 2020).
2. Rajpurkar, P.; Irvin, J.; Bagul, A.; Ding, D.; Duan, T.; Mehta, H.; Yang, B.; Zhu, K.; Laird, D.; Ball, R.L.; et al. MURA Dataset: Towards Radiologist-Level Abnormality Detection in Musculoskeletal Radiographs. In Proceedings of the 1st Conference on Medical Imaging with Deep Learning, Amsterdam, The Netherlands, 4–6 June 2018.
3. Guan, B.; Zhang, G.; Yao, J.; Wang, X.; Wang, M. Arm fracture detection in X-rays based on improved deep convolutional neural network. Comput. Electr. Eng. 2020, 81, 1–11.
4. Galal, A.; Hisham, F.; Mohamed, M.; Hassan, S.; Ghanim, T.; Nabil, A. Automatic Recognition of Elbow Musculoskeletal Disorders using Cloud Application. In Proceedings of the 2019 8th International Conference on Software and Information Engineering, Cairo, Egypt, 9–12 April 2019.
5. Liang, S.; Gu, Y. Towards Robust and Accurate Detection of Abnormalities in Musculoskeletal Radiographs with a Multi-Network Model. Sensors 2020, 20, 3153.
6. Saif, A.F.M.; Shahnaz, C.; Zhu, W.P.; Ahmad, M.O. Abnormality Detection in Musculoskeletal Radiographs Using Capsule Network. IEEE Access 2019, 7, 81494–81503.
7. Cheng, K.; Iriondo, C.; Calivá, F.; Krogue, J.; Majumdar, S.; Pedoia, V. Adversarial Policy Gradient for Deep Learning Image Augmentation. In Proceedings of the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention, Shenzhen, China, 13–17 October 2019.
8. Pelka, O.; Nensa, F.; Friedrich, C.M. Branding-Fusion of Meta Data and Musculoskeletal Radiographs for Multi-modal Diagnostic Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop, Seoul, Korea, 27–28 October 2019.
9. Varma, M.; Lu, M.; Gardner, R.; Dunnmon, J.; Khandwala, N.; Rajpurkar, P.; Long, J.; Beaulieu, C.; Shpanskaya, K.; Fei-Fei, L.; et al. Automated abnormality detection in lower extremity radiographs using deep learning. Nat. Mach. Intell. 2019, 1, 578–583.
10. Harini, N.; Ramji, B.; Sriram, S.; Sowmya, V.; Soman, K.P. Musculoskeletal radiographs classification using deep learning. In Deep Learning for Data Analytics: Foundations, Biomedical Applications and Challenges, 1st ed.; Das, H., Pradhan, C., Dey, N., Eds.; Academic Press: London, UK, 2020; pp. 79–98.
11. Fang, L.; Jin, Y.; Huang, L.; Guo, S.; Zhao, G.; Chen, X. Iterative fusion convolutional neural networks for classification of optical coherence tomography images. J. Vis. Commun. Image Represent. 2019, 59, 327–333.
12. Mondol, T.C.; Iqbal, H.; Hashem, M. Deep CNN-Based Ensemble CADx Model for Musculoskeletal Abnormality Detection from Radiographs. In Proceedings of the 2019 5th International Conference on Advances in Electrical Engineering, Dhaka, Bangladesh, 26–28 September 2019.
13. Pradhan, N.; Dhaka, V.S.; Chaudhary, H. Classification of Human Bones Using Deep Convolutional Neural Network. In Proceedings of the IOP Conference Series: Materials Science and Engineering, International Conference on Startup Ventures: Technology Developments and Future Strategies, Rajasthan, India, 8–9 October 2019.
14. Shao, Y.; Wang, X. A Two Stage Method for Abnormality Diagnosis of Musculoskeletal Radiographs. In Proceedings of the International Conference on Pattern Recognition and Artificial Intelligence, Zhongshan, China, 19–23 October 2020.
15. Chung, S.W.; Han, S.S.; Lee, J.W.; Oh, K.S.; Kim, N.R.; Yoon, J.P.; Kim, J.Y.; Moon, S.H.; Kwon, J.; Lee, H.J.; et al. Automated detection and classification of the proximal humerus fracture by using deep learning algorithm. Acta Orthop. 2018, 89, 468–473.
16. Sezer, A.; Sezer, H.B. Convolutional neural network based diagnosis of bone pathologies of proximal humerus. Neurocomputing 2020, 392, 124–131.
17. Urban, G.; Porhemmat, S.; Stark, M.; Feeley, B.; Okada, K.; Baldi, P. Classifying shoulder implants in X-ray images using deep learning. Comput. Struct. Biotechnol. J. 2020, 18, 967–972.
18. Sezer, A.; Sigirci, I.O.; Sezer, H.B. Shoulder lesion classification using shape and texture features via composite kernel. In Proceedings of the 25th Signal Processing and Communications Applications Conference (SIU), Antalya, Turkey, 15–18 May 2017.
19. Sezer, A.; Sezer, H.B. Capsule network-based classification of rotator cuff pathologies from MRI. Comput. Electr. Eng. 2019, 80, 106480.
20. Khan, M.A.; Kim, Y. Cardiac Arrhythmia Disease Classification Using LSTM Deep Learning Approach. Comput. Mater. Contin. 2021, 67, 427–443.
21. Storey, O.; Wei, B.; Zhang, L.; Mtope, F.R.F. Adaptive bone abnormality detection in medical imagery using deep neural networks. In Proceedings of the 14th International FLINS Conference, Cologne, Germany, 18–21 August 2020.
22. Yin, S.; Peng, Q.; Li, H.; Zhang, Z.; You, X.; Liu, H.; Fischer, K.; Furth, S.L.; Tasian, G.E.; Fan, Y. Multi-instance Deep Learning with Graph Convolutional Neural Networks for Diagnosis of Kidney Diseases Using Ultrasound Imaging. In Proceedings of the International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, Lima, Peru, 17 October 2019.
23. Dias, D.D.A. Musculoskeletal Abnormality Detection on X-ray Using Transfer Learning. Master’s Thesis, Pompeu Fabra University, Barcelona, Spain, July 2019.
24. Khan, M.A.; Kim, J. Toward Developing Efficient Conv-AE-Based Intrusion Detection System Using Heterogeneous Dataset. Electronics 2020, 9, 1771.
25. Kegelman, C.D.; Nijsure, M.P.; Moharrer, Y.; Pearson, H.B.; Dawahare, J.H.; Jordan, K.M.; Qin, L.; Boerckel, J.D. YAP and TAZ Promote Periosteal Osteoblast Precursor Expansion and Differentiation for Fracture Repair. J. Bone Miner. Res. 2020, 36, 143–157.
26. Sharma, A.; Mishra, A.; Bansal, A.; Bansal, A. Bone Fractured Detection Using Machine Learning and Digital Geometry. In Mobile Radio Communications and 5G Networks, Lecture Notes in Networks and Systems, 1st ed.; Marriwala, N., Tripathi, C.C., Kumar, D., Jain, S., Eds.; Springer: Singapore, 2021; Volume 140, pp. 369–376.
27. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
29. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
30. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
31. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
32. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
33. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
34. Kabir, H.M.D.; Abdar, M.; Jalali, S.M.J.; Khosravi, A.; Atiya, A.F.; Nahavandi, S.; Srinivasan, D. SpinalNet: Deep neural network with gradual input. arXiv 2020, arXiv:2007.03347.
35. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
36. Pizer, S.M.; Johnston, R.E.; Ericksen, J.P.; Yankaskas, B.C.; Muller, K.E. Contrast-limited adaptive histogram equalization: Speed and effectiveness. In Proceedings of the First Conference on Visualization in Biomedical Computing, Atlanta, GA, USA, 22–25 May 1990.
37. Zuiderveld, K. Contrast limited adaptive histogram equalization. In Graphics Gems IV; Heckbert, P.S., Ed.; Academic Press Professional: San Diego, CA, USA, 1994; pp. 474–485.
Spinal FC layer widths used for each model:
Models | Spinal FC Layer Width | Models | Spinal FC Layer Width
---|---|---|---
ResNet34 | 256 | ResNeXt50 | 20
ResNet50 | 128 | ResNeXt101 | 128
ResNet101, 152 | 1024 | DenseNet169, 201 | 240
VGG13 | 256 | MobileNetV2 | 320
VGG16, 19 | 512 | InceptionV3 | 20
Shoulder Bone X-ray Images | Image Types | Train Dataset | Test Dataset | Org. Image Size | New Image Size |
---|---|---|---|---|---|
Class 0: Normal (Negative) Bone X-ray Images | png, 3-ch. | 4211 | 285 | various | 320 × 320 × 3 |
Class 1: Abnormal (Positive) Bone X-ray Images | png, 3-ch. | 4168 | 278 | various | 320 × 320 × 3 |
Total | png, 3-ch. | 8379 | 563 | various | 320 × 320 × 3 |
 | Label Value | Prediction Value
---|---|---
TP | positive | positive
TN | negative | negative
FP | negative | positive
FN | positive | negative
Predicted Class Negative | Predicted Class Positive | |
---|---|---|
Actual Class: Negative | TN | FP |
Actual Class: Positive | FN | TP |
P0 | (TP + TN)/(TP + TN + FP + FN) |
Ppositive | (TP + FP)(TP + FN)/(TP + TN + FP + FN)²
Pnegative | (FN + TN)(FP + TN)/(TP + TN + FP + FN)²
Pe | Ppositive + Pnegative |
Confusion Matrix | TP, TN, FP, FN |
Training/Testing Accuracy | (TP + TN)/(TP + TN + FP + FN) |
Precision | TP/(TP + FP) |
Recall | TP/(TP + FN) |
F1-score | 2TP/(2TP + FP + FN) |
Cohen’s kappa score | (P0 − Pe)/(1 − Pe)
ROC curve | TP rate vs. FP rate change
AUC scores | Area under the ROC curve |
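As a worked example of these formulas, the following sketch computes the listed metrics from confusion matrix counts; plugging in the test counts of DenseNet169 with Standard FC from the results table (TP = 222, TN = 252, FP = 33, FN = 56) reproduces the reported test accuracy of 0.8419 and Cohen's kappa score of 0.6834.

```python
def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    p0 = (tp + tn) / total                       # observed agreement (accuracy)
    p_pos = (tp + fp) * (tp + fn) / total ** 2   # chance agreement, positive class
    p_neg = (fn + tn) * (fp + tn) / total ** 2   # chance agreement, negative class
    pe = p_pos + p_neg                           # total chance agreement
    return {
        "accuracy": p0,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1_score": 2 * tp / (2 * tp + fp + fn),
        "cohens_kappa": (p0 - pe) / (1 - pe),
    }

print(metrics(tp=222, tn=252, fp=33, fn=56))  # accuracy ~0.8419, kappa ~0.6834
```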
Training accuracy of the 13 CNN-based models with Standard FC and Spinal FC:
Models | Standard FC | Spinal FC | Models | Standard FC | Spinal FC
---|---|---|---|---|---|
DenseNet169 | 0.8883 | 0.8666 | ResNet101 | 0.8419 | 0.8225 |
DenseNet201 | 0.8934 | 0.8694 | ResNet152 | 0.845 | 0.8346 |
InceptionV3 | 0.8882 | 0.8914 | ResNeXt50 | 0.8707 | 0.8654 |
MobileNetV2 | 0.8647 | 0.8845 | ResNeXt101 | 0.8539 | 0.8303 |
ResNet34 | 0.8673 | 0.8517 | VGG13 | 0.9389 | 0.8507 |
ResNet50 | 0.8489 | 0.8395 | VGG16 | 0.8698 | 0.812 |
VGG19 | 0.9055 | 0.8011 |
Test accuracy of the 13 CNN-based models with Standard FC and Spinal FC:
Models | Standard FC | Spinal FC | Models | Standard FC | Spinal FC
---|---|---|---|---|---|
DenseNet169 | 0.8419 | 0.8152 | ResNet101 | 0.817 | 0.8188 |
DenseNet201 | 0.8206 | 0.8294 | ResNet152 | 0.8117 | 0.817 |
InceptionV3 | 0.8259 | 0.817 | ResNeXt50 | 0.817 | 0.8241 |
MobileNetV2 | 0.8241 | 0.8099 | ResNeXt101 | 0.8206 | 0.8082 |
ResNet34 | 0.8188 | 0.8206 | VGG13 | 0.7797 | 0.8223 |
ResNet50 | 0.8188 | 0.8081 | VGG16 | 0.785 | 0.801 |
VGG19 | 0.785 | 0.8046 |
Precision of the 13 CNN-based models with Standard FC and Spinal FC:
Models | Standard FC | Spinal FC | Models | Standard FC | Spinal FC
---|---|---|---|---|---|
DenseNet169 | 0.845 | 0.815 | ResNet101 | 0.815 | 0.82 |
DenseNet201 | 0.82 | 0.83 | ResNet152 | 0.815 | 0.82 |
InceptionV3 | 0.825 | 0.82 | ResNeXt50 | 0.82 | 0.825 |
MobileNetV2 | 0.83 | 0.81 | ResNeXt101 | 0.82 | 0.815 |
ResNet34 | 0.825 | 0.82 | VGG13 | 0.8 | 0.825 |
ResNet50 | 0.815 | 0.81 | VGG16 | 0.79 | 0.805 |
VGG19 | 0.785 | 0.81 |
Recall of the 13 CNN-based models with Standard FC and Spinal FC:
Models | Standard FC | Spinal FC | Models | Standard FC | Spinal FC
---|---|---|---|---|---|
DenseNet169 | 0.84 | 0.815 | ResNet101 | 0.815 | 0.82 |
DenseNet201 | 0.815 | 0.83 | ResNet152 | 0.81 | 0.815 |
InceptionV3 | 0.825 | 0.815 | ResNeXt50 | 0.815 | 0.825 |
MobileNetV2 | 0.82 | 0.81 | ResNeXt101 | 0.815 | 0.815 |
ResNet34 | 0.815 | 0.82 | VGG13 | 0.78 | 0.825 |
ResNet50 | 0.82 | 0.81 | VGG16 | 0.785 | 0.8 |
VGG19 | 0.785 | 0.805 |
F1-score of the 13 CNN-based models with Standard FC and Spinal FC:
Models | Standard FC | Spinal FC | Models | Standard FC | Spinal FC
---|---|---|---|---|---|
DenseNet169 | 0.84 | 0.815 | ResNet101 | 0.82 | 0.82 |
DenseNet201 | 0.82 | 0.83 | ResNet152 | 0.81 | 0.815 |
InceptionV3 | 0.825 | 0.815 | ResNeXt50 | 0.815 | 0.825 |
MobileNetV2 | 0.82 | 0.81 | ResNeXt101 | 0.82 | 0.815 |
ResNet34 | 0.82 | 0.82 | VGG13 | 0.775 | 0.82 |
ResNet50 | 0.815 | 0.805 | VGG16 | 0.785 | 0.8 |
VGG19 | 0.785 | 0.805 |
Cohen’s kappa score of the 13 CNN-based models with Standard FC and Spinal FC:
Models | Standard FC | Spinal FC | Models | Standard FC | Spinal FC
---|---|---|---|---|---|
DenseNet169 | 0.6834 | 0.6302 | ResNet101 | 0.634 | 0.6372 |
DenseNet201 | 0.641 | 0.6588 | ResNet152 | 0.6231 | 0.6332 |
InceptionV3 | 0.6514 | 0.6338 | ResNeXt50 | 0.6338 | 0.648 |
MobileNetV2 | 0.6478 | 0.6195 | ResNeXt101 | 0.641 | 0.6267 |
ResNet34 | 0.6372 | 0.6411 | VGG13 | 0.558 | 0.6442 |
ResNet50 | 0.6375 | 0.6161 | VGG16 | 0.5695 | 0.6014 |
VGG19 | 0.5698 | 0.6085 |
Confusion matrix values and per-class AUC scores of the 26 models on the test dataset (suffix a: Standard FC, suffix b: Spinal FC):
Models | TP | FP | FN | TN | Class 0: AUC | Class 1: AUC
---|---|---|---|---|---|---|
DenseNet169a | 222 | 33 | 56 | 252 | 0.8809 | 0.8797 |
DenseNet169b | 218 | 44 | 60 | 241 | 0.8602 | 0.8598 |
DenseNet201a | 225 | 48 | 53 | 237 | 0.8584 | 0.8653 |
DenseNet201b | 228 | 46 | 50 | 239 | 0.8727 | 0.8724 |
InceptionV3a | 218 | 38 | 60 | 247 | 0.8754 | 0.8707 |
InceptionV3b | 220 | 45 | 58 | 240 | 0.8582 | 0.8585 |
MobileNetV2a | 215 | 36 | 63 | 249 | 0.8777 | 0.8378 |
MobileNetV2b | 216 | 45 | 62 | 240 | 0.8633 | 0.861 |
ResNet34a | 215 | 39 | 63 | 246 | 0.8705 | 0.8767 |
ResNet34b | 228 | 51 | 50 | 234 | 0.8617 | 0.8619 |
ResNet50a | 224 | 48 | 54 | 237 | 0.8715 | 0.8662 |
ResNet50b | 219 | 49 | 59 | 236 | 0.8588 | 0.8584 |
ResNet101a | 228 | 53 | 50 | 232 | 0.8683 | 0.8703 |
ResNet101b | 216 | 40 | 62 | 245 | 0.8609 | 0.861 |
ResNet152a | 217 | 45 | 61 | 240 | 0.8648 | 0.8701 |
ResNet152b | 220 | 45 | 58 | 240 | 0.8597 | 0.8606 |
ResNeXt50a | 221 | 46 | 57 | 239 | 0.8644 | 0.8699 |
ResNeXt50b | 223 | 44 | 55 | 241 | 0.8789 | 0.8783 |
ResNeXt101a | 225 | 48 | 53 | 237 | 0.8772 | 0.8765 |
ResNeXt101b | 219 | 46 | 59 | 239 | 0.8652 | 0.8561 |
VGG13a | 181 | 27 | 97 | 258 | 0.8406 | 0.8415 |
VGG13b | 231 | 35 | 65 | 250 | 0.8705 | 0.8737 |
VGG16a | 203 | 46 | 75 | 239 | 0.8517 | 0.8523 |
VGG16b | 204 | 38 | 74 | 247 | 0.8542 | 0.857 |
VGG19a | 212 | 55 | 66 | 230 | 0.8374 | 0.8502 |
VGG19b | 205 | 37 | 73 | 248 | 0.858 | 0.8539 |
Classification results of the EL1 ensemble model and its three sub-models:
Models | Test Acc. | Pre. | Recall | F1-Score | Cohen’s Kappa
---|---|---|---|---|---|
ResNeXt50b | 0.8241 | 0.825 | 0.825 | 0.825 | 0.648 |
DenseNet169a | 0.8419 | 0.845 | 0.84 | 0.84 | 0.6834 |
DenseNet201b | 0.8294 | 0.83 | 0.83 | 0.83 | 0.6588 |
EL1 | 0.8455 | 0.8631 | 0.8165 | 0.8455 | 0.6907 |
Classification results of the EL2 ensemble model, its sub-models, and the sub-ensemble:
Models | Test Acc. | Pre. | Recall | F1-Score | Cohen’s Kappa
---|---|---|---|---|---|
ResNet34b | 0.8206 | 0.82 | 0.82 | 0.82 | 0.6411 |
DenseNet169a | 0.8419 | 0.845 | 0.84 | 0.84 | 0.6834
DenseNet201b | 0.8294 | 0.83 | 0.83 | 0.83 | 0.6588 |
SubEnsemble | 0.8401 | 0.84 | 0.84 | 0.84 | 0.6799 |
EL2 | 0.8472 | 0.85 | 0.845 | 0.845 | 0.6942 |
Comparison of this study with related classification studies:
Studies | Dataset | Best Method | Classification Type and Test Accuracy
---|---|---|---|
Our study | MURA Shoulder, 563 validation/test images, open data | EL2 | Fracture/normal: 0.8472 |
Liang and Gu [5] | MURA dataset, 194 validation images | GCN | Fracture/normal: 0.9112 |
Saif et al. [6] | MURA dataset, various test images (50% test data, quantity not specified) | Capsule Network | Fracture/normal: 0.9208
Chung et al. [15] | 1376 CT images | ResNet152 | Normal/four different types: 0.96
Urban et al. [17] | 597 X-ray images | NASNet | Implant: 0.804
Sezer and Sezer [16] | 219 MR images | CNN | Normal/edematous/Hill–Sachs lesions: 0.9843
Sezer et al. [18] | 219 MR images | Extreme learning machines | Normal/edematous/Hill–Sachs lesions: 0.94
Sezer and Sezer [19] | 1006 MR images | CapsNet | Normal/degenerated/torn: 0.9474