An Improvement of the MLP Based Speaker Verification System through Improving the Learning Speed and Reducing the Learning Data

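(Translated from the Korean title: A Method for Improving the Enrollment Speed of an MLP-Based Speaker Verification System by Improving Learning Speed and Reducing Learning Data)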

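  • Lee Baek-Yeong (이백영) (Department of Avionics Engineering, Korea Aerospace University) ;
  • Lee Tae-Seung (이태승) (Department of Avionics Engineering, Korea Aerospace University) ;
  • Hwang Byong-Won (황병원) (Department of Avionics Engineering, Korea Aerospace University)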
  • Published : 2002.05.01

Abstract

The multilayer perceptron (MLP) has several advantages over other pattern recognition methods and is therefore a promising approach for learning and recognizing speakers in a speaker verification system. However, MLP training takes considerable time because the error backpropagation (EBP) algorithm used to train it learns slowly. Combined with the large number of background speakers needed to achieve a high speaker recognition rate, this makes enrolling a speaker in the system slow. Because a speaker verification system must provide verification service immediately after a speaker enrolls, this problem has to be solved. This paper attempts to shorten the time required to enroll speakers in an MLP-based speaker verification system by using two methods: one that improves the learning speed of EBP, and one that reduces the number of background speakers by adopting the cohort speakers method from existing speaker verification approaches.
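
The background speaker reduction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how cohort speakers are chosen, so this example assumes a common choice: keep only the `k` background speakers whose mean feature vector is closest (in Euclidean distance) to that of the enrolling speaker, and train the MLP on that smaller set. The function names `mean_vector` and `select_cohort`, the toy data, and the distance criterion are all hypothetical and are not taken from the paper.

```python
import math

def mean_vector(frames):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def select_cohort(enrollee_frames, background, k):
    """Return the ids of the k background speakers nearest to the enrollee.

    Nearness is the Euclidean distance between mean feature vectors.
    This criterion is an assumption of the sketch, not the paper's method.
    """
    target = mean_vector(enrollee_frames)
    distances = {
        speaker_id: math.dist(target, mean_vector(frames))
        for speaker_id, frames in background.items()
    }
    # Sort speaker ids by distance and keep the k closest ones.
    return sorted(distances, key=distances.get)[:k]

# Toy 2-dimensional "features" for one enrollee and four background speakers.
enrollee = [[1.0, 1.0], [1.2, 0.8]]
background = {
    "bg_a": [[1.1, 1.0], [0.9, 1.0]],
    "bg_b": [[5.0, 5.0], [5.2, 4.8]],
    "bg_c": [[1.5, 1.5], [1.7, 1.3]],
    "bg_d": [[-3.0, 0.0], [-3.2, 0.2]],
}

cohort = select_cohort(enrollee, background, k=2)
print(cohort)  # ['bg_a', 'bg_c']
```

With fewer background speakers in the training set, each EBP epoch processes fewer patterns, which is the source of the enrollment-time saving the abstract aims for. How much recognition accuracy is retained depends on the selection criterion and on `k`, neither of which is specified here.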
