Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

  • Received: 2014.01.28
  • Accepted: 2014.06.24
  • Published: 2014.10.01

Abstract

In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. Endpoint detection is generally handled by two cascaded decision processes. The first is a frame-level speech/non-speech classification based on statistical hypothesis testing, and the second is an utterance-level speech boundary decision based on heuristic knowledge. To handle these two processes within a unified framework, we propose a WFST-based approach. However, the WFST-based approach shares two limitations with conventional approaches: the utterance-level decision is still based on heuristic knowledge, and the decision parameters are tuned sequentially. Therefore, to obtain the decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% for various noisy-speech corpora collected for an endpoint detection evaluation.
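
To make the conventional two-stage cascade described above concrete, the following is a minimal Python sketch. It is not the authors' WFST-based method or their probabilistic decision logic. A plain energy threshold stands in for the statistical hypothesis test, and the function names and the parameters `threshold`, `min_speech`, and `hangover` are hypothetical values chosen only for illustration.

```python
from typing import List, Optional, Tuple


def classify_frames(log_energies: List[float], threshold: float) -> List[bool]:
    """Stage 1: frame-level speech/non-speech decision.

    Assumption: a fixed energy threshold replaces the statistical
    hypothesis test mentioned in the abstract.
    """
    return [e > threshold for e in log_energies]


def detect_endpoints(
    is_speech: List[bool], min_speech: int = 3, hangover: int = 5
) -> Optional[Tuple[int, int]]:
    """Stage 2: heuristic utterance-level boundary decision.

    Declares a start after `min_speech` consecutive speech frames and an
    end once `hangover` consecutive non-speech frames follow. Returns
    (start_frame, end_frame) of the first utterance, or None.
    """
    start = None        # index of the first frame of the utterance
    last_speech = None  # index of the most recent speech frame
    run = 0             # consecutive speech frames before the start
    silence = 0         # consecutive non-speech frames after the start
    for i, s in enumerate(is_speech):
        if start is None:
            run = run + 1 if s else 0
            if run >= min_speech:
                start = i - min_speech + 1
                last_speech = i
        elif s:
            silence = 0
            last_speech = i
        else:
            silence += 1
            if silence >= hangover:
                return (start, last_speech)
    # Input ended before the hangover expired.
    return (start, last_speech) if start is not None else None


# Hypothetical log-energy values for 14 frames.
energies = [0, 0, 5, 5, 5, 5, 0, 5, 0, 0, 0, 0, 0, 0]
frames = classify_frames(energies, threshold=1.0)
endpoints = detect_endpoints(frames)
```

In this sketch the three parameters are set by hand and independently of each other, which is the kind of heuristic, sequentially tuned decision logic the abstract identifies as a limitation.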
