Word Boundary Detection of Voice Signal Using Recurrent Fuzzy Associative Memory

순환 퍼지연상기억장치를 이용한 음성경계 추출

  • Published : 2004.09.01

Abstract

We describe a word boundary detection method that locates the boundaries between speech and non-speech segments. The proposed method uses two features. The first is the normalized root mean square (RMS) of the speech signal, which is insensitive to white noise and carries temporal information. The second is the normalized mel-frequency band energy of the voice signal, which carries frequency information. The method detects word boundaries using a recurrent fuzzy associative memory (RFAM), which extends the fuzzy associative memory (FAM) by adding recurrent nodes. Hebbian learning is used to establish the degree of association between input and output, and an error back-propagation algorithm is used to learn the weights between the consequent layer and the recurrent layer. To confirm its effectiveness, we applied the proposed system to voice data obtained from KAIST.
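The two features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the frame length, hop size, number of bands, filter shape, or normalization, so everything below is an assumption chosen for illustration: rectangular bands equally spaced on the mel scale, a naive DFT, and min-max scaling over the utterance. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames (trailing remainder dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms(frame):
    """Root mean square of one frame (time-domain feature)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def hz_to_mel(f):
    """Common mel-scale formula; the paper's exact mapping is not stated."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2; adequate for short frames."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_band_energies(frame, sample_rate, n_bands):
    """Sum spectral power in rectangular bands equally spaced on the mel scale."""
    n = len(frame)
    top = hz_to_mel(sample_rate / 2.0)
    energies = [0.0] * n_bands
    for k, p in enumerate(power_spectrum(frame)):
        mel = hz_to_mel(k * sample_rate / n)
        band = min(int(mel / top * n_bands), n_bands - 1)
        energies[band] += p
    return energies


def normalize(values):
    """Min-max scale to [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def extract_features(samples, sample_rate, frame_len=64, hop=32, n_bands=8):
    """Return one (normalized RMS, normalized max mel-band energy) pair per frame."""
    frames = frame_signal(samples, frame_len, hop)
    rms_track = normalize([rms(f) for f in frames])
    band_track = normalize(
        [max(mel_band_energies(f, sample_rate, n_bands)) for f in frames])
    return list(zip(rms_track, band_track))
```

For a signal made of silence, a tone, and silence, both features stay near 0 in the silent frames and approach 1 in the frames fully covered by the tone, which is the contrast a boundary detector would rely on.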

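This paper describes speech boundary extraction, which detects the boundaries between speech and non-speech regions as a preprocessing step for speech recognition. Two feature vectors are used for boundary extraction. The first is the normalized RMS, a time-domain feature that is robust to white noise; the second is the normalized mel-frequency band maximum energy, a frequency-domain feature. The boundary extraction algorithm is a recurrent fuzzy associative memory, which generates its rules through learning and adds recurrent nodes so that the temporal information of speech is taken into account. The weights of the fuzzy part are learned with the Hebbian learning method, and the weights of the recurrent part are learned with the error back-propagation algorithm. The experiments used speech data provided by KAIST, classified by age and gender.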
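The combination of a Hebbian-trained fuzzy part and a recurrent part can be sketched roughly as follows. This is not the authors' RFAM: the two-set fuzzy partition (LOW/HIGH), the product rule activation, and the weighted-sum inference are assumptions, and the recurrent weight is a fixed hypothetical constant (`feedback`) here, whereas the paper learns it with error back-propagation.

```python
def memberships(x):
    """Degrees of membership in LOW and HIGH for a feature scaled to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    return (1.0 - x, x)


def rule_activations(rms, band):
    """Firing strengths of the four rules (LOW/HIGH RMS) x (LOW/HIGH band energy)."""
    return [a * b for a in memberships(rms) for b in memberships(band)]


def hebbian_train(examples):
    """Hebbian update: W[rule][class] += activation * target over labelled frames.

    examples: iterable of ((rms, band), label), label 0 = non-speech, 1 = speech.
    The target is one-hot, so only the labelled class column is incremented.
    """
    weights = [[0.0, 0.0] for _ in range(4)]
    for (rms, band), label in examples:
        for r, act in enumerate(rule_activations(rms, band)):
            weights[r][label] += act
    return weights


def speech_degree(weights, rms, band):
    """Degree in [0, 1] to which a frame is associated with the speech class."""
    acts = rule_activations(rms, band)
    scores = [sum(a * w[c] for a, w in zip(acts, weights)) for c in (0, 1)]
    total = scores[0] + scores[1]
    return scores[1] / total if total > 0 else 0.5


def detect(weights, features, feedback=0.3, threshold=0.5):
    """Label frames, blending each frame's degree with the previous output.

    feedback is a fixed, hypothetical recurrent weight (assumption).
    """
    labels, prev = [], 0.0
    for rms, band in features:
        prev = (1.0 - feedback) * speech_degree(weights, rms, band) + feedback * prev
        labels.append(prev >= threshold)
    return labels
```

A word boundary is then any frame index where consecutive labels differ. The feedback term smooths the decision over time, so a single noisy frame is less likely to produce a spurious boundary.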
