Coding History Detection of Speech Signal using Deep Neural Network

  • Cho, Hyo-Jin (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Jang, Won (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Shin, Seong-Hyeon (Dept. of Electronics Engineering, Kwangwoon University) ;
  • Park, Hochong (Dept. of Electronics Engineering, Kwangwoon University)
  • Received : 2017.09.08
  • Accepted : 2017.10.31
  • Published : 2018.01.30

Abstract

This paper proposes a method for detecting the coding history of a digital speech signal. In digital speech communication and storage, the signal is encoded to reduce the number of bits. Therefore, given only a speech waveform, we need to detect its coding history: whether the signal is an original or a coded one and, if coded, how many times it has been coded. The proposed method targets the 12.2 kbps AMR codec and classifies a signal as original, single-coded, or double-coded. It extracts a speech-specific feature vector from the given speech and models the feature vector using a deep neural network. We confirm that the proposed feature vector provides better coding history detection performance than a feature vector computed from a general spectrogram.
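The three-way decision described in the abstract (original, single coding, double coding) can be pictured with a minimal sketch. It is not the authors' implementation: the abstract does not specify the speech-specific feature vector, the network size, or the decision rule, so everything here is an assumption made for illustration. The sketch uses a plain log-magnitude spectrum per frame as a stand-in feature, a small fully connected network with random (untrained) weights, and an utterance-level decision taken by averaging frame-level log-posteriors. All function names and parameters are hypothetical.

```python
import numpy as np

# The three coding-history classes named in the abstract.
CLASSES = ("original", "single_coded", "double_coded")

def frame_log_spectrum(signal, frame_len=256, hop=128):
    """Stand-in feature (assumption): log-magnitude spectrum per frame.
    This is NOT the paper's speech-specific feature vector."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def init_mlp(sizes, rng):
    """Random weights for a fully connected network (untrained, illustration only)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU hidden layers followed by a softmax over the three classes."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    logits = x @ w + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def classify_utterance(params, signal):
    """Average frame-level log-posteriors and pick the most likely class
    (the aggregation rule is an assumption, not taken from the paper)."""
    feats = frame_log_spectrum(signal)
    probs = forward(params, feats)
    mean_log_prob = np.log(probs + 1e-12).mean(axis=0)
    return CLASSES[int(np.argmax(mean_log_prob))], probs

rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)  # 1 s of noise at 8 kHz stands in for speech
params = init_mlp([129, 64, 64, len(CLASSES)], rng)
label, probs = classify_utterance(params, signal)
print(label, probs.shape)
```

With untrained weights the predicted label is meaningless; the sketch only shows the data flow from waveform to frame features to a three-class posterior. In practice the network would be trained on speech passed through the AMR codec zero, one, or two times.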
