ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION
Abstract
Phishing attacks are a major cybersecurity threat; they aim to steal sensitive information by redirecting users to fraudulent websites. Traditional blacklist- and rule-based methods are often insufficient, especially against zero-day attacks. This study presents a systematic benchmark framework for phishing URL detection that integrates hybrid feature selection, Bayesian hyperparameter optimization, model comparison, ensemble learning, explainability analysis, and additional validation strategies. Experiments were conducted on a dataset of 88,647 URLs with 111 features, evaluating 7 hybrid feature selection methods, 3 Bayesian optimization techniques, and 252 optimized model configurations, complemented by deep learning baselines. The findings show that hybrid feature selection and optimization make a clear contribution to performance, particularly for tree-based models. The LightGBM model optimized with Scikit-Optimize on the L1+Boruta feature set produced the strongest single-model performance (test accuracy: 97.28%, F1: 95.99%, AUC: 99.54%), while the best ensemble (CAT+XGB+LGBM, soft voting) reached a comparable level: 97.34% accuracy, 96.17% F1, and 99.60% AUC. Deep learning models yielded lower performance, and additional validation experiments showed that the ensemble provided a consistent improvement over the baseline across different data splits.
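As a rough illustration of the soft-voting idea named in the abstract, the sketch below averages per-class probabilities from three models and picks the class with the highest mean. It is not the authors' implementation: the function name `soft_vote`, the model labels, and all probability values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def soft_vote(probas: Dict[str, List[List[float]]]) -> List[int]:
    """Average class probabilities across models, then take the argmax."""
    models = list(probas.values())
    n_samples = len(models[0])
    labels = []
    for i in range(n_samples):
        n_classes = len(models[0][i])
        # Mean probability of each class over all models for sample i.
        avg = [sum(m[i][c] for m in models) / len(models)
               for c in range(n_classes)]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


# Hypothetical [P(legitimate), P(phishing)] outputs for three URLs.
example = {
    "catboost": [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]],
    "xgboost":  [[0.80, 0.20], [0.30, 0.70], [0.35, 0.65]],
    "lightgbm": [[0.85, 0.15], [0.45, 0.55], [0.40, 0.60]],
}
predictions = soft_vote(example)
print(predictions)
```

For the third URL, one hypothetical model leans toward "legitimate" while the other two lean toward "phishing"; averaging the probabilities lets the more confident models decide the label, which is the usual motivation for soft over hard voting.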
Keywords
Supporting Institution
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethical Statement
This study does not require ethical approval as it uses a publicly available dataset and does not involve human subjects, animals, or personal data collection.
Thanks
The authors would like to thank Vrbančič et al. for making the phishing URL dataset publicly available, which made this research possible.
References
- Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–2631. https://doi.org/10.1145/3292500.3330701
- Aydemir, M. (2024). Siberuzamda suç tipolojileri ve siber iletişim tabanlı çözümleme modelinin analizi. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 27(4), 1375–1400. https://doi.org/10.17780/ksujes.1477116
- Batur Dinler, Ö., & Batur Şahin, C. (2021). Prediction of phishing web sites with deep learning using WEKA environment. European Journal of Science and Technology, 24, 35–41. https://doi.org/10.31590/ejosat.901465
- Batur Dinler, Ö., Batur Şahin, C., & Abualigah, L. (2021). Comparison of performance of phishing web sites with different DeepLearning4J models. European Journal of Science and Technology, 28, 425–431. https://doi.org/10.31590/ejosat.1004778
- Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281–305.
- Bergstra, J., Yamins, D., & Cox, D. D. (2013). Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on Machine Learning (ICML), PMLR 28(1), 115–123.
- Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
- Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD, 785–794. https://doi.org/10.1145/2939672.2939785
Details
Primary Language
English
Subjects
System and Network Security
Journal Section
Research Article
Publication Date
June 3, 2026
Submission Date
November 28, 2025
Acceptance Date
April 24, 2026
Published in Issue
Year 2026 Volume: 29 Number: 2
APA
Berkil, H., & Batur Dinler, Ö. (2026). ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 29(2), 678-698. https://izlik.org/JA44UW96BD
AMA
1. Berkil H, Batur Dinler Ö. ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION. KSU J. Eng. Sci. 2026;29(2):678-698. https://izlik.org/JA44UW96BD
Chicago
Berkil, Hacer, and Özlem Batur Dinler. 2026. “ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION”. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi 29 (2): 678-98. https://izlik.org/JA44UW96BD.
EndNote
Berkil H, Batur Dinler Ö (June 1, 2026) ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi 29 2 678–698.
IEEE
[1] H. Berkil and Ö. Batur Dinler, “ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION”, KSU J. Eng. Sci., vol. 29, no. 2, pp. 678–698, June 2026, [Online]. Available: https://izlik.org/JA44UW96BD
ISNAD
Berkil, Hacer - Batur Dinler, Özlem. “ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION”. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi 29/2 (June 1, 2026): 678-698. https://izlik.org/JA44UW96BD.
JAMA
1. Berkil H, Batur Dinler Ö. ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION. KSU J. Eng. Sci. 2026;29:678–698.
MLA
Berkil, Hacer, and Özlem Batur Dinler. “ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION”. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, vol. 29, no. 2, June 2026, pp. 678-98, https://izlik.org/JA44UW96BD.
Vancouver
1. Hacer Berkil, Özlem Batur Dinler. ADVANCED HYBRID FEATURE SELECTION AND BAYESIAN OPTIMIZATION FOR PHISHING URL DETECTION. KSU J. Eng. Sci. [Internet]. 2026 Jun. 1;29(2):678-98. Available from: https://izlik.org/JA44UW96BD