EXPLORING THE EFFECTIVENESS OF PRE-TRAINED TRANSFORMER MODELS FOR TURKISH QUESTION ANSWERING
Abstract
Recent advancements in Natural Language Processing (NLP) and Artificial Intelligence (AI) have been propelled by the emergence of Transformer-based Large Language Models (LLMs), which have demonstrated outstanding performance across various tasks, including Question Answering (QA). However, the adoption and performance of these models in low-resource and morphologically rich languages like Turkish remain underexplored. This study addresses this gap by systematically evaluating several state-of-the-art Transformer-based LLMs on a curated, gold-standard Turkish QA dataset. The models evaluated include BERTurk, XLM-RoBERTa, ELECTRA-Turkish, DistilBERT, and T5-Small, with a focus on their ability to handle the unique linguistic challenges posed by Turkish. The experimental results indicate that the BERTurk model outperforms other models, achieving an F1-score of 0.8144, an Exact Match of 0.6351, and a BLEU score of 0.4035. The study highlights the importance of language-specific pre-training and the need for further research to improve the performance of LLMs in low-resource languages. The findings provide valuable insights for future efforts in enhancing Turkish NLP resources and advancing QA systems in underrepresented linguistic contexts.
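The abstract reports three evaluation metrics: token-level F1, Exact Match (EM), and BLEU. As a point of reference, below is a minimal Python sketch of SQuAD-style EM and F1 scoring as commonly used for extractive QA; the paper does not specify its exact preprocessing, so the lowercasing and whitespace tokenization here are assumptions, and the example answer pair is hypothetical.

```python
from collections import Counter

def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace.
    # The paper's actual preprocessing is not specified.
    return " ".join(text.lower().split())

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    # Token-level F1 over the multiset of shared tokens,
    # as in SQuAD-style extractive QA evaluation.
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical Turkish QA pair for illustration:
pred = "Mustafa Kemal Atatürk"
gold = "Gazi Mustafa Kemal Atatürk"
print(exact_match(pred, gold))  # 0.0 (strings differ after normalization)
print(f1_score(pred, gold))     # ≈ 0.857 (3 shared tokens; P=1.0, R=0.75)
```

Note that EM is all-or-nothing while F1 gives partial credit for overlapping tokens, which is why the reported F1 (0.8144) exceeds the EM (0.6351); this gap is typical for morphologically rich languages like Turkish, where predicted spans often differ from gold answers by an affix or a modifier.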
Keywords
Details
Primary Language
English
Subjects
Deep Learning
Section
Research Article
Authors
Publication Date
June 3, 2025
Submission Date
March 2, 2025
Acceptance Date
April 12, 2025
Published Issue
Year 2025, Volume: 28, Issue: 2