A COMPREHENSIVE COMPARISON OF THE PERFORMANCE OF CLASSIFICATION ALGORITHMS IN DETERMINING DIABETES RISK STATUS
Abstract
Diabetes is a metabolic disease and a public health problem whose prevalence is increasing worldwide. Left untreated, it can cause irreversible damage to many tissues and organs. Early diagnosis and effective management of diabetes are therefore critical to improving patients' quality of life and reducing potential health risks. In healthcare, machine learning (ML) based decision support systems (DSS) are widely used for disease diagnosis. This study proposes an ML-based DSS for diabetes diagnosis. The dataset is randomly split five times in an 80:20 ratio, and the performance of five ML algorithms (k-nearest neighbor, ridge, extreme gradient boosting, extra trees and gradient boosting) is evaluated. The features in the dataset are assessed with the random forest (RF) algorithm, and the most significant features are selected with the SelectKBest method based on the Chi-square test. In addition, the effects of resampling techniques (synthetic minority oversampling technique, Near Miss) on the performance of the proposed system are analyzed. The gradient boosting algorithm performed best when the Near Miss resampling technique was applied to the dataset. In this case, the F-score, precision, accuracy and sensitivity on the test data were 99.44%, 98.89%, 99.45% and 100%, respectively.
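The four reported metrics are tied together by the standard confusion-matrix definitions, which the following minimal Python sketch illustrates. The function name `classification_metrics` and the counts passed to it are assumptions made for illustration only: the abstract does not report a confusion matrix, and the counts below are simply one hypothetical set that reproduces the reported percentages after rounding.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, sensitivity (recall) and F-score as fractions."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    # F-score is the harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "f_score": f_score,
        "precision": precision,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
    }


# Hypothetical counts (NOT taken from the paper): no false negatives and a
# single false positive, chosen only because they round to the reported values.
metrics = classification_metrics(tp=89, fp=1, tn=92, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

With a sensitivity of 100% and a precision of 98.89%, the harmonic mean is about 99.44%, so the reported F-score is consistent with the reported precision and sensitivity.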
Details
Primary Language
Turkish
Subjects
Artificial Intelligence (Other)
Journal Section
Research Article
Authors
Okan Erkaymaz
0000-0002-1996-8623
Türkiye
Publication Date
December 3, 2024
Submission Date
April 4, 2024
Acceptance Date
July 19, 2024
Published in Issue
Year 2024 Volume: 27 Number: 4
APA
Uzun Arslan, R., Şenyer Yapıcı, İ., & Erkaymaz, O. (2024). DİYABET RİSK DURUMUNUN BELİRLENMESİNDE SINIFLANDIRMA ALGORİTMALARININ PERFORMANSLARININ KAPSAMLI BİR ŞEKİLDE KARŞILAŞTIRILMASI. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 27(4), 1320-1333. https://doi.org/10.17780/ksujes.1465177
Cited By
Predictive analytics for thyroid cancer recurrence: a feature selection and data balancing approach
The European Physical Journal Special Topics
https://doi.org/10.1140/epjs/s11734-025-01720-x
MULTILAYER ANALYSIS OF NICOTINE-INDUCED GENE EXPRESSION ALTERATIONS IN BREAST CANCER CELLS USING CLUSTERING AND SUPERVISED LEARNING METHODS
Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi
https://doi.org/10.17780/ksujes.1730962