PREDICTING LUNG CANCER USING EXPLAINABLE ARTIFICIAL INTELLIGENCE AND BORUTA-SHAP METHODS
Abstract
Machine learning algorithms have become a popular approach to disease prediction in recent years, and they can also be applied to lung cancer, a disease that is often fatal. This study proposes a machine learning-based model for predicting lung cancer. Five decision tree-based algorithms were used as classifiers. The experiments were conducted on a publicly available dataset containing risk factors. The Boruta-SHAP approach was employed to identify the most salient features in the dataset. Experiments were run separately with all features and with the reduced feature set, and feature selection improved the classifiers' predictive performance. Among all classifiers compared, the XGBoost algorithm achieved the best results, with an accuracy of 97.22% and an AUROC of 0.972. The proposed model compares well with the classification rates reported by similar studies in the literature. The SHAP (SHapley Additive exPlanations) approach was then used to investigate how the risk factors in the dataset affect the model output. In this analysis, allergy emerged as the most significant risk factor for the disease.
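To illustrate the idea behind the SHAP analysis mentioned in the abstract, the following is a minimal sketch that computes exact Shapley values for a small hand-written risk score. It is not the study's model or data: the function `toy_risk_model`, its coefficients, and the feature names `smoking` and `fatigue` are hypothetical, chosen only for illustration, and the all-zero baseline is an assumption. The sketch uses only the Python standard library rather than the SHAP package.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; only "allergy" is mentioned in the abstract.
FEATURES = ["allergy", "smoking", "fatigue"]


def toy_risk_model(x):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    return (0.1
            + 0.4 * x["allergy"]
            + 0.2 * x["smoking"]
            + 0.1 * x["fatigue"]
            + 0.1 * x["allergy"] * x["smoking"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values: features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f])
                 for f in FEATURES}
        return model(mixed)

    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                # Marginal contribution of adding the feature to the coalition.
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi


instance = {"allergy": 1, "smoking": 1, "fatigue": 1}   # assumed example patient
baseline = {"allergy": 0, "smoking": 0, "fatigue": 0}   # assumed reference point
phi = shapley_values(toy_risk_model, instance, baseline)
for name in FEATURES:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy setting the per-feature contributions sum to the difference between the model output for the instance and for the baseline, which is the additivity property that makes Shapley-based attributions convenient for ranking risk factors. Exhaustive enumeration is only practical for a handful of features; for tree ensembles such as XGBoost, faster algorithms designed for trees are normally used instead.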
Keywords
Ethical Statement
A publicly accessible dataset was used in this study. Therefore, ethics committee approval is not required.
Details
Primary Language: English
Subjects: Machine Learning (Other)
Journal Section: Research Article
Publication Date: September 3, 2024
Submission Date: January 25, 2024
Acceptance Date: March 4, 2024
Published in Issue: Year 2024, Volume 27, Number 3