EN
TR
COMPARATIVE ANALYSIS OF VISION TRANSFORMERS AND CONVOLUTIONAL NEURAL NETWORKS IN DIABETIC RETINOPATHY DIAGNOSIS
Abstract
Diabetic retinopathy can lead to significant visual complications and significantly affects individuals' quality of life. This study focuses on comparing the performance of Vision Transformer (ViT) models and Convolutional Neural Networks (CNN) methods in diabetic retinopathy diagnosis and aims to evaluate their potential as an alternative to traditional diagnostic methods. In this study, the performance of four different ViT model architectures and four different convolutional neural network (CNN) models in training and testing phases were comparatively analyzed. ViT models achieved accuracy rates of 97.83%, 98.41%, 95.2%, and 98.26% for "tiny," "base," "small," and "large," respectively. Additionally, models trained with VGG13, ResNet18, ResNet50, and SqueezeNet architectures from CNN techniques achieved accuracy rates of 96.1%, 97.83%, 90.9%, and 93.93%, respectively. ViT architectures achieved higher accuracy rates than CNN architectures. When the results were evaluated, it was concluded that ViT methods were more successful in the diagnosis of diabetic retinopathy.
Keywords
References
- Alhawas, N., & Tüfekçi, Z. (2022). The Identification of Red-Meat Types using The Fine-Tuned Vision Transformer and MobileNet Models. European Journal of Science and Technology. https://doi.org/10.31590/ejosat.1112892
- Beyer, L., Zhai, X., & Kolesnikov, A. I. (2022). Better plain ViT baselines for ImageNet-1k. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2205.01580
- Chen, J., He, Y., Frey, E. C., Li, Y., & Du, Y. (2021). VIT-V-Net: Vision Transformer for unsupervised Volumetric Medical Image Registration. arXiv.org. https://arxiv.org/abs/2104.06468
- Chintamreddy, D., & Seshasayee, U. R. (2024, June). Detection of Diabetic Retinopathy (DR) Severity from Fundus Photographs using Conv-ViT. In 2024 International Conference on Advancements in Power, Communication and Intelligent Systems (APCI) (pp. 1-6). IEEE.
- Darabi, P. K. (n.d.). Competitions Contributor. Kaggle. https://www.kaggle.com/pkdarabi/competitions. Accessed [24.07.2024].
- Dosovitskiy, A., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv.org. https://arxiv.org/abs/2010.11929
- Fang, L., & Qiao, H. (2022). Diabetic retinopathy classification using a novel DAG network based on multi-feature of fundus images. Biomedical Signal Processing and Control, 77, 103810. https://doi.org/10.1016/j.bspc.2022.103810
- Huang, Y.-H., et al. (2023). Model long-range dependencies for multi-modality and multi-view retinopathy diagnosis through transformers. Knowledge-Based Systems, 271, 110544. https://doi.org/10.1016/j.knosys.2023.110544
Details
Primary Language
English
Subjects
Software Engineering (Other)
Journal Section
Research Article
Publication Date
June 3, 2025
Submission Date
July 24, 2024
Acceptance Date
January 23, 2025
Published in Issue
Year 2025 Volume: 28 Number: 2
APA
Yüzgeç Özdemir, E., Koç, C., & Özyurt, F. (2025). COMPARATIVE ANALYSIS OF VISION TRANSFORMERS AND CONVOLUTIONAL NEURAL NETWORKS IN DIABETIC RETINOPATHY DIAGNOSIS. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 28(2), 592-600. https://doi.org/10.17780/ksujes.1521858