ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION
Abstract
Phishing attacks are a major cybersecurity threat: they aim to steal sensitive information by redirecting users to fraudulent websites. Traditional blacklist- and rule-based methods are often insufficient, especially against zero-day attacks. This study presents a systematic benchmark framework for phishing URL detection that integrates hybrid feature selection, Bayesian hyperparameter optimization, model comparison, ensemble learning, explainability analysis, and additional validation strategies. Experiments were conducted on a dataset of 88,647 URLs with 111 features, evaluating 7 hybrid feature selection methods, 3 Bayesian optimization techniques, and 252 optimized model configurations, complemented by deep learning baselines. The findings show that hybrid feature selection and optimization make a clear contribution, particularly for tree-based models. The LightGBM model optimized with Scikit-Optimize on the L1+Boruta feature set produced the strongest single-model performance (Test Accuracy: 97.28%, F1: 95.99%, AUC: 99.54%), while the best ensemble (CAT+XGB+LGBM, Soft Voting) reached a comparable 97.34% accuracy, 96.17% F1, and 99.60% AUC. Deep learning models performed worse, and additional validation experiments showed that the ensemble consistently improved over the baseline across different data splits.
Keywords
- Phishing detection
- hybrid feature selection
- ensemble learning
- Bayesian optimization
- machine learning
Supporting Institution
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethical Statement
This study does not require ethical approval as it uses a publicly available dataset and does not involve human subjects, animals, or personal data collection.
Acknowledgments
The authors would like to thank Vrbančič et al. for making the phishing URL dataset publicly available, which made this research possible.
Details
Primary Language
English
Subjects
System and Network Security
Section
Research Article
Publication Date
June 3, 2026
Submission Date
November 28, 2025
Acceptance Date
April 24, 2026
Published in Issue
Year 2026 Volume: 29 Issue: 2
APA
Berkil, H., & Batur Dinler, Ö. (2026). ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 29(2), 678-698. https://doi.org/10.17780/ksujes.1831905