COMPARATIVE ANALYSIS OF VISION TRANSFORMERS AND CONVOLUTIONAL NEURAL NETWORKS IN DIABETIC RETINOPATHY DIAGNOSIS

Esra Yüzgeç Özdemir; Canan Koç; Fatih Özyurt

doi:10.17780/ksujes.1521858

Research Article

Year 2025, Volume: 28 Issue: 2, 592 - 600, 03.06.2025

Abstract

Diabetic retinopathy can cause serious visual complications and substantially affects individuals' quality of life. This study compares the performance of Vision Transformer (ViT) models and convolutional neural networks (CNNs) in diabetic retinopathy diagnosis and evaluates their potential as an alternative to traditional diagnostic methods. Four ViT architectures and four CNN architectures were comparatively analyzed in the training and testing phases. The ViT "tiny," "base," "small," and "large" models achieved accuracy rates of 97.83%, 98.41%, 95.2%, and 98.26%, respectively. The CNN models trained with the VGG13, ResNet18, ResNet50, and SqueezeNet architectures achieved accuracy rates of 96.1%, 97.83%, 90.9%, and 93.93%, respectively. Overall, the ViT architectures achieved higher accuracy than the CNN architectures: the best ViT model (base, 98.41%) exceeded the best CNN model (ResNet18, 97.83%). Based on these results, it was concluded that ViT methods were more successful in the diagnosis of diabetic retinopathy.
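
The comparison reported in the abstract can be reproduced arithmetically from the eight accuracy figures alone. The following is a minimal sketch in Python, not the authors' evaluation code: the names `vit_acc`, `cnn_acc`, and `summarize` are illustrative assumptions, and the only data used are the percentages stated above.

```python
from statistics import mean

# Accuracy rates (%) exactly as reported in the abstract.
vit_acc = {"tiny": 97.83, "base": 98.41, "small": 95.2, "large": 98.26}
cnn_acc = {"VGG13": 96.1, "ResNet18": 97.83, "ResNet50": 90.9, "SqueezeNet": 93.93}


def summarize(acc):
    """Return the mean accuracy and the best-scoring model of one family.

    Hypothetical helper for illustration only.
    """
    best = max(acc, key=acc.get)
    return {"mean": mean(acc.values()), "best_model": best, "best_acc": acc[best]}


vit_summary = summarize(vit_acc)
cnn_summary = summarize(cnn_acc)

# Count how many of the 16 ViT-vs-CNN model pairs the ViT model wins outright.
vit_wins = sum(v > c for v in vit_acc.values() for c in cnn_acc.values())

print(f"ViT mean: {vit_summary['mean']:.2f}%  best: {vit_summary['best_model']}")
print(f"CNN mean: {cnn_summary['mean']:.2f}%  best: {cnn_summary['best_model']}")
print(f"ViT wins {vit_wins} of {len(vit_acc) * len(cnn_acc)} pairwise comparisons")
```

A pairwise count is included because a family-level average can hide individual exceptions: ViT "tiny" and ResNet18 report the same accuracy, and ViT "small" scores below both VGG13 and ResNet18.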

Keywords

Diabetic Retinopathy, Vision Transformers, Deep Learning, Convolutional Neural Networks

GÖRÜNTÜ TRANSFORMATÖRLERİ VE EVRİŞİMLİ SİNİR AĞLARININ DİYABETİK RETİNOPATİ TEŞHİSİNDE KARŞILAŞTIRMALI ANALİZİ

Abstract

Diabetic retinopathy is a disease that can lead to serious visual complications and substantially affects individuals' quality of life. This study emphasizes the importance of diagnosing diabetic retinopathy at an early stage, draws attention to the limitations of current diagnostic methods, and examines the potential of Vision Transformer (ViT) models as an alternative to traditional methods. The performance of four different ViT model architectures and four different convolutional neural network (CNN) models in the training and testing phases was comparatively analyzed. The ViT 'tiny', 'base', 'small', and 'large' models achieved accuracy rates of 97.83%, 98.41%, 95.2%, and 98.26%, respectively. In addition, models trained with the VGG13, ResNet18, ResNet50, and SqueezeNet architectures achieved accuracy rates of 96.1%, 97.83%, 90.9%, and 93.93%, respectively. In this study, ViT architectures achieved higher accuracy rates than CNN architectures. When the results were evaluated, it was concluded that ViT methods were more successful in the diagnosis of diabetic retinopathy.

Keywords

Diabetic Retinopathy, Vision Transformers, Deep Learning, Convolutional Neural Networks

There are 23 citations in total.

Details

Primary Language	English
Subjects	Software Engineering (Other)
Journal Section	Research Article
Authors	Esra Yüzgeç Özdemir (ORCID 0000-0003-2914-2603); Canan Koç (ORCID 0000-0002-2651-9471); Fatih Özyurt (ORCID 0000-0002-8154-6691)
Publication Date	June 3, 2025
Submission Date	July 24, 2024
Acceptance Date	January 23, 2025
Published in Issue	Year 2025 Volume: 28 Issue: 2

Cite

APA	Yüzgeç Özdemir, E., Koç, C., & Özyurt, F. (2025). COMPARATIVE ANALYSIS OF VISION TRANSFORMERS AND CONVOLUTIONAL NEURAL NETWORKS IN DIABETIC RETINOPATHY DIAGNOSIS. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 28(2), 592-600. https://doi.org/10.17780/ksujes.1521858
