MULTILAYER ANALYSIS OF NICOTINE-INDUCED GENE EXPRESSION ALTERATIONS IN BREAST CANCER CELLS USING CLUSTERING AND SUPERVISED LEARNING METHODS
Year 2025,
Volume: 28 Issue: 3, 1558 - 1573, 03.09.2025
Taybe Alabed, Sema Servi
Abstract
Nicotine is known not only for its addictive properties but also for its potential to alter the genetic structure of cancer cells. This study investigates the genetic alterations caused by chronic nicotine exposure in HCC38 breast cancer cells through a multi-layered computational analysis. K-means clustering grouped the differentially expressed genes into three clusters, and the resulting cluster assignments served as class labels for seven supervised machine learning classifiers. Logistic Regression achieved the highest performance, with 98.76% accuracy and an F1 score of 0.9869. The Friedman test was applied to evaluate the statistical significance of performance differences among the classifiers, and multi-class ROC curves were used to demonstrate their discriminative power. The findings indicate that nicotine exposure leads to genetic reprogramming and activates inflammation-related gene pathways in specific cellular subpopulations. These results highlight the utility of machine learning methods in uncovering biologically meaningful gene expression patterns in cancer research.
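The two-stage idea described in the abstract, clustering first and then training a classifier on the cluster assignments, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic two-feature points rather than the study's expression values, the farthest-point seeding is chosen only to keep the run deterministic, and a 1-nearest-neighbour rule stands in for the seven classifiers so that the example needs nothing beyond the Python standard library. It is not the authors' implementation.

```python
import random
from math import dist


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


def predict_1nn(train_x, train_y, x):
    """Return the label of the single closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[nearest]


# Hypothetical stand-in for gene-level features: three bounded groups of 30.
rng = random.Random(0)
centers = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0)]
points = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in centers
    for _ in range(30)
]

# Stage 1: unsupervised clustering produces the class labels.
labels = kmeans(points, k=3)

# Stage 2: hold out every fifth point and classify it from the rest.
test_idx = set(range(0, len(points), 5))
train_x = [p for i, p in enumerate(points) if i not in test_idx]
train_y = [lab for i, lab in enumerate(labels) if i not in test_idx]
hits = sum(predict_1nn(train_x, train_y, points[i]) == labels[i] for i in test_idx)
accuracy = hits / len(test_idx)
print(f"clusters found: {len(set(labels))}, held-out accuracy: {accuracy:.2f}")
```

Because the class labels come from the clustering itself, a classifier trained this way is measured on how well it reproduces the cluster boundaries, which is worth keeping in mind when reading accuracy figures from such a pipeline.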