Research Article

EXPLORING THE EFFECTIVENESS OF PRE-TRAINED TRANSFORMER MODELS FOR TURKISH QUESTION ANSWERING

Year 2025, Volume: 28 Issue: 2, 975 - 993, 03.06.2025

Abstract

Recent advancements in Natural Language Processing (NLP) and Artificial Intelligence (AI) have been propelled by the emergence of Transformer-based Large Language Models (LLMs), which have demonstrated outstanding performance across various tasks, including Question Answering (QA). However, the adoption and performance of these models in low-resource and morphologically rich languages like Turkish remain underexplored. This study addresses this gap by systematically evaluating several state-of-the-art Transformer-based LLMs on a curated, gold-standard Turkish QA dataset. The models evaluated include BERTurk, XLM-RoBERTa, ELECTRA-Turkish, DistilBERT, and T5-Small, with a focus on their ability to handle the unique linguistic challenges posed by Turkish. The experimental results indicate that the BERTurk model outperforms other models, achieving an F1-score of 0.8144, an Exact Match of 0.6351, and a BLEU score of 0.4035. The study highlights the importance of language-specific pre-training and the need for further research to improve the performance of LLMs in low-resource languages. The findings provide valuable insights for future efforts in enhancing Turkish NLP resources and advancing QA systems in underrepresented linguistic contexts.

References

  • Bilgin, M., Bozdemir, M., and Demir, E. (2024). Performance Analysis of Large Language Models on Turkish Question-Answer Texts. Proceedings of the 2024 Electrical-Electronics and Biomedical Engineering Conference (ELECO 2024), 1–5. Bursa, Türkiye: IEEE. https://doi.org/10.1109/ELECO64362.2024.10847201
  • Bonov, P. (2025). DeepSeek climbs to top spot of the App Store, beats ChatGPT in the process. Retrieved February 6, 2025, from GSMArena website: https://www.gsmarena.com/deepseek_climbs_to_top_spot_of_the_app_store_beats_chatgpt_in_the_process-news-66286.php
  • Celebi, E., Gunel, B., and Sen, B. (2011). Automatic Question Answering for Turkish with Pattern Parsing. Proceedings of the 2011 International Symposium on INnovations in Intelligent SysTems and Applications, 389–393. Istanbul, Türkiye: IEEE. https://doi.org/10.1109/INISTA.2011.5946098
  • Clark, K., Luong, M. T., Le, Q. V., and Manning, C. D. (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. Proceedings of the 8th International Conference on Learning Representations (ICLR 2020), 1–18. Online.
  • Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., … Stoyanov, V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 8440–8451. Online: ACL. https://doi.org/10.18653/v1/2020.acl-main.747
  • Datasets. (2025). Retrieved April 7, 2025, from Hugging Face website: https://huggingface.co/docs/datasets/en/index
  • Derici, C., Çelik, K., Kutbay, E., Aydın, Y., Güngör, T., Özgür, A., and Kartal, G. (2015). Question Analysis for a Closed Domain Question Answering System. Proceedings of the 16th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2015), Lecture Notes in Computer Science, 9042. Cairo, Egypt: Springer. https://doi.org/10.1007/978-3-319-18117-2_35
  • Devlin, J., Chang, M. W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), 1, 4171–4186. Minneapolis, Minnesota, USA: ACL.
  • Evaluate. (2025). Retrieved April 9, 2025, from Hugging Face website: https://huggingface.co/docs/evaluate/index
  • Hu, K. (2023). ChatGPT sets record for fastest-growing user base - analyst note. Retrieved February 13, 2025, from Reuters website: https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/
  • Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science and Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55
  • Kotstein, S., and Decker, C. (2024). RESTBERTa: a Transformer-based question answering approach for semantic search in Web API documentation. Cluster Computing, 27(4). https://doi.org/10.1007/s10586-023-04237-x
  • Kuligowska, K., and Kowalczuk, B. (2021). Pseudo-labeling with transformers for improving Question Answering systems. Procedia Computer Science, 192, 1162–1169. https://doi.org/10.1016/j.procs.2021.08.119
  • Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, 1907.11692, 1–13.
  • Loshchilov, I., and Hutter, F. (2019). Decoupled Weight Decay Regularization. Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), 1–19. New Orleans, LA, USA.
  • Luo, K., Lin, F., Luo, X., and Zhu, K. Q. (2018). Knowledge base question answering via encoding of complex query graphs. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), 2185–2194. Brussels, Belgium: ACL. https://doi.org/10.18653/v1/d18-1242
  • MDZ Digital Library team. (2024a). dbmdz/bert-base-turkish-cased. Retrieved January 21, 2025, from Hugging Face website: https://huggingface.co/dbmdz/bert-base-turkish-cased
  • MDZ Digital Library team. (2024b). dbmdz/electra-base-turkish-cased-discriminator. Retrieved January 17, 2025, from Hugging Face website: https://huggingface.co/dbmdz/electra-base-turkish-cased-discriminator
  • Mehta, I. (2025). DeepSeek reaches No. 1 on US Play Store | TechCrunch. Retrieved February 6, 2025, from TechCrunch website: https://techcrunch.com/2025/01/28/deepseek-reaches-no-1-on-us-play-store/
  • Papineni, K., Roukos, S., Ward, T., and Zhu, W. (2002). BLEU: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 311–318. Philadelphia, Pennsylvania, USA: ACL.
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., … Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the Thirty-Third Conference on Neural Information Processing Systems (NeurIPS 2019), 8026–8037. Vancouver, BC, Canada.
  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
  • Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., … Liu, P. J. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research, 21, 1–67.
  • Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing (NeurIPS 2019), 1–5. Vancouver, BC, Canada.
  • Soygazi, F., Ciftci, O., Kok, U., and Cengiz, S. (2021). THQuAD: Turkish Historic Question Answering Dataset for Reading Comprehension. Proceedings of the 6th International Conference on Computer Science and Engineering (UBMK 2021). Ankara, Türkiye: IEEE. https://doi.org/10.1109/UBMK52708.2021.9559013
  • Streamlit: A faster way to build and share data apps. (2025). Retrieved March 2, 2025, from Snowflake Inc. website: https://streamlit.io
  • The pandas development team. (2020). pandas: Python Data Analysis Library. Retrieved January 7, 2024, from https://pandas.pydata.org
  • Vazrala, S., and Khatoon Mohammed, T. (2025). RBTM: A Hybrid gradient Regression-Based transformer model for biomedical question answering. Biomedical Signal Processing and Control, 102, 107325. https://doi.org/10.1016/j.bspc.2024.107325
  • Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 1–4. https://doi.org/10.21105/joss.03021
  • Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., … Rush, A. M. (2020). Transformers: State-of-the-art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45. Online: ACL.
  • Xu, K., Reddy, S., Feng, Y., Huang, S., and Zhao, D. (2016). Question answering on freebase via relation extraction and textual evidence. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), 4, 2326–2336. Berlin, Germany: ACL. https://doi.org/10.18653/v1/p16-1220
  • Xue, X., Zhang, J., and Chen, Y. (2024). Question-answering framework for building codes using fine-tuned and distilled pre-trained transformer models. Automation in Construction, 168, 105730. https://doi.org/10.1016/j.autcon.2024.105730
  • Yu, M., Yin, W., Hasan, K. S., dos Santos, C., Xiang, B., and Zhou, B. (2017). Improved neural relation detection for knowledge base question answering. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 1, 571–581. Vancouver, Canada: ACL. https://doi.org/10.18653/v1/P17-1053
  • Zhu, S., Cheng, X., and Su, S. (2020). Knowledge-based question answering by tree-to-sequence learning. Neurocomputing, 372, 64–72. https://doi.org/10.1016/j.neucom.2019.09.003
There are 34 citations in total.

Details

Primary Language English
Subjects Deep Learning
Journal Section Computer Engineering
Authors

Abdullah Talha Kabakuş 0000-0003-2181-4292

Publication Date June 3, 2025
Submission Date March 2, 2025
Acceptance Date April 12, 2025
Published in Issue Year 2025, Volume: 28, Issue: 2

Cite

APA Kabakuş, A. T. (2025). EXPLORING THE EFFECTIVENESS OF PRE-TRAINED TRANSFORMER MODELS FOR TURKISH QUESTION ANSWERING. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 28(2), 975-993.