Research Article

DEEP LEARNING-BASED AUDIO SPLICING FORGERY DETECTION WITH COCHLEAGRAM FEATURES

Year 2024, Volume 27, Issue 4, 1477 - 1489, 03.12.2024
https://doi.org/10.17780/ksujes.1508050

Abstract

Among the manipulations applied to audio recordings today, audio splicing is an effective, easy-to-perform, and widespread forgery that violates data integrity. The forgery is carried out by joining two different audio recordings, and it becomes very difficult to detect once attackers apply post-processing operations to hide its traces. To address this, a new CNN-based method that uses cochleagram images is proposed for detecting audio splicing forgery. The cochleagram image of the audio is given as input to the proposed CNN architecture. Once trained on cochleagram images, the architecture labels a suspicious test audio file as forged or original. In addition, because no general-purpose database exists in the literature, two separate audio splicing forgery databases, SET2 (2 s recordings) and SET3 (3 s recordings), were created from the TIMIT database to test the performance of the proposed method. The proposed method achieved 0.95 accuracy, 0.97 precision, 0.93 sensitivity, and 0.95 F1-score on SET2, and 0.98 accuracy, 0.98 precision, 0.97 sensitivity, and 0.97 F1-score on SET3. The method was also tested on the NOIZEUS-4 dataset, where it obtained very high results. These results show that the proposed method is robust to noise and detects audio splicing forgery more effectively than other studies in the literature.

References

  • Chuchra, A., Kaur, M., & Gupta, S. (2022, July). A deep learning approach for splicing detection in digital audios. In Congress on Intelligent Systems: Proceedings of CIS 2021, Volume 1 (pp. 543-558). Singapore: Springer Nature Singapore.
  • Cooper, A. J. (2010, June). Detecting butt-spliced edits in forensic digital audio recordings. In Audio Engineering Society Conference: 39th International Conference: Audio Forensics: Practices and Challenges. Audio Engineering Society.
  • Cuccovillo, L., Mann, S., Tagliasacchi, M., & Aichroth, P. (2013, September). Audio tampering detection via microphone classification. In 2013 IEEE 15th International Workshop on Multimedia Signal Processing (MMSP) (pp. 177-182). IEEE.
  • Esquef, P. A., Apolinário, J. A., & Biscainho, L. W. (2015, November). Improved edit detection in speech via ENF patterns. In 2015 IEEE International Workshop on Information Forensics and Security (WIFS) (pp. 1-6). IEEE.
  • Garofolo, J. S. (1993). TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1, [online] Available: https://catalog.ldc.upenn.edu/LDC93S1.
  • Greenwood, D. D. (1990). A cochlear frequency‐position function for several species—29 years later. The Journal of the Acoustical Society of America, 87(6), 2592-2605.
  • Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech communication, 49(7-8), 588-601.
  • Jadhav, S., Patole, R., & Rege, P. (2019, July). Audio splicing detection using convolutional neural network. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-5). IEEE.
  • Lin, X., & Kang, X. (2017a). Exposing speech tampering via spectral phase analysis. Digital Signal Processing, 60, 63-74.
  • Lin, X., & Kang, X. (2017b). Supervised audio tampering detection using an autoregressive model. In 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2142-2146). IEEE.
  • Mang, L. D., Cañadas-Quesada, F. J., Carabias-Orti, J. J., Combarro, E. F., & Ranilla, J. (2023). Cochleogram-based adventitious sounds classification using convolutional neural networks. Biomedical Signal Processing and Control, 82, 104555.
  • Mang, L. D., González Martínez, F. D., Martinez Muñoz, D., García Galán, S., & Cortina, R. (2024). Classification of Adventitious Sounds Combining Cochleogram and Vision Transformers. Sensors, 24(2), 682.
  • Mao, M., Xiao, Z., Kang, X., Li, X., & Xiao, L. (2020). Electric network frequency based audio forensics using convolutional neural networks. In Advances in Digital Forensics XVI: 16th IFIP WG 11.9 International Conference, New Delhi, India, January 6–8, 2020, Revised Selected Papers 16 (pp. 253-270). Springer International Publishing.
  • Meng, X., Li, C., & Tian, L. (2018, November). Detecting audio splicing forgery algorithm based on local noise level estimation. In 2018 5th international conference on systems and informatics (ICSAI) (pp. 861-865). IEEE.
  • Pan, X., Zhang, X., & Lyu, S. (2012, March). Detecting splicing in digital audios using local noise level estimation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1841-1844). IEEE.
  • Patterson, R. D., Robinson, K. E. N., Holdsworth, J., McKeown, D., Zhang, C., & Allerhand, M. (1992). Complex sounds and auditory images. In Auditory physiology and perception (pp. 429-446). Pergamon.
  • Russo, M., Kraljević, L., Stella, M., & Sikora, M. (2020). Cochleogram-based approach for detecting perceived emotions in music. Information Processing & Management, 57(5), 102270.
  • Rouniyar, S. K., Yingjuan, Y., & Hu, Y. (2018, April). Channel response based multi-feature audio splicing forgery detection and localization. In Proceedings of the 2018 International Conference on E-Business, Information Management and Computer Science (pp. 46-53).
  • Sharan, R. V., & Moir, T. J. (2015, July). Cochleagram image feature for improved robustness in sound recognition. In 2015 IEEE international conference on digital signal processing (DSP) (pp. 441-444). IEEE.
  • Slaney, M. (1998). Auditory toolbox. Interval Research Corporation, Tech. Rep, 10(1998), 1194.
  • Su, Z., Fang, Z., Lian, C., Zhang, G., & Li, M. (2024). Audio splicing detection and localization using multistage filterbank spectral sketches and decision fusion. Multimedia Systems, 30(2), 92.
  • Ustubioglu, B., Dincer, S., Ustubioglu, A., & Ulutas, G. (2024, July). ArCapsNet for Audio Splicing Forgery Detection. In 2024 47th International Conference on Telecommunications and Signal Processing (TSP) (pp. 298-301). IEEE.
  • Yang, R., Qu, Z., & Huang, J. (2008, September). Detecting digital audio forgeries by checking frame offsets. In Proceedings of the 10th ACM Workshop on Multimedia and Security (pp. 21-26).
  • Zeng, Z., & Wu, Z. (2022, December). Audio Splicing Localization: Can We Accurately Locate the Splicing Tampering?. In 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 120-124). IEEE.
  • Zhang, Z., Zhao, X., & Yi, X. (2022). Aslnet: An encoder-decoder architecture for audio splicing detection and localization. Security and Communication Networks, 2022.
  • Zhao, H., Chen, Y., Wang, R., & Malik, H. (2017). Audio splicing detection and localization using environmental signature. Multimedia Tools and Applications, 76, 13897-13927.
  • Zhao, H., Chen, Y., Wang, R., & Malik, H. (2014, June). Audio source authentication and splicing detection using acoustic environmental signature. In Proceedings of the 2nd ACM workshop on Information hiding and multimedia security (pp. 159-164).

DETECTION OF AUDIO SPLICING ON THE BASIS OF DEEP LEARNING WITH COCHLEOGRAM FEATURES

Abstract

Audio splicing is an effective, easy-to-perform, and widespread forgery that violates data integrity. This forgery, which is performed by joining two different audio recordings, becomes very difficult to detect once attackers apply post-processing operations to hide its traces. To address this, a new CNN-based method using cochleagram images is proposed to detect audio splicing forgery. The cochleagram image of the audio is given as input to the proposed CNN architecture. Once trained on cochleagram images, the architecture labels a suspicious test audio file as forged or original. In addition, since there is no general-purpose database in the literature, two separate audio splicing forgery databases, SET2 (2 s recordings) and SET3 (3 s recordings), were created from the TIMIT database to test the performance of the proposed method. With the proposed method, 0.95 accuracy, 0.97 precision, 0.93 sensitivity, and 0.95 F1-score were obtained on the SET2 dataset, while 0.98 accuracy, 0.98 precision, 0.97 sensitivity, and 0.97 F1-score were obtained on the SET3 dataset. The proposed method was also tested on the NOIZEUS-4 dataset, where very high results were obtained. These results show that the proposed method is robust to noise and detects audio splicing forgery more effectively than other studies in the literature.
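
The four scores quoted in the abstract follow the standard definitions for a binary forged/original classifier. As a minimal sketch of how such scores are typically derived from a confusion matrix, the Python below computes them from raw counts. The function name `classification_metrics` and the example counts are illustrative assumptions, not taken from the paper; the counts were chosen only because they round to the values reported for SET2.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, sensitivity (recall) and F1 from confusion counts.

    "Positive" is taken here to mean "forged"; that labeling is an assumption.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical counts (NOT the paper's confusion matrix): 100 forged and
# 100 original test clips, with 7 forged clips missed and 3 false alarms.
scores = classification_metrics(tp=93, fp=3, tn=97, fn=7)
print({name: round(value, 2) for name, value in scores.items()})
# {'accuracy': 0.95, 'precision': 0.97, 'sensitivity': 0.93, 'f1': 0.95}
```

Because F1 depends only on precision and sensitivity, the reported SET2 figures are internally consistent: a precision of 0.97 and a sensitivity of 0.93 give an F1 of about 0.95.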


Details

Primary Language: Turkish
Subjects: Computer Forensics
Journal Section: Computer Engineering
Authors: Arda Üstübioğlu (ORCID: 0000-0002-8656-8697)
Publication Date: December 3, 2024
Submission Date: July 1, 2024
Acceptance Date: August 28, 2024
Published in Issue: Year 2024

Cite

APA Üstübioğlu, A. (2024). KOKLEAGRAM ÖZELLİKLERİ İLE DERİN ÖĞRENME TABANLI SES BİRLEŞTİRME SAHTECİLİĞİ TESPİTİ. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilimleri Dergisi, 27(4), 1477-1489. https://doi.org/10.17780/ksujes.1508050