http://dx.doi.org/10.4218/etrij.14.2214.0030

Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic  

Chung, Hoon (SW.Content Research Laboratory, ETRI)
Lee, Sung Joo (SW.Content Research Laboratory, ETRI)
Lee, Yun Keun (SW.Content Research Laboratory, ETRI)
Publication Information
ETRI Journal / v.36, no.5, 2014, pp. 714-720
Abstract
In this paper, we propose data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. Endpoint detection is generally handled by two cascaded decision processes. The first is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second is an utterance-level speech boundary decision based on heuristic knowledge. To handle these two processes within a unified framework, we propose a WFST-based approach. However, the WFST-based approach shares the limitations of conventional approaches: the utterance-level decision still relies on heuristic knowledge, and the decision parameters are tuned sequentially. Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% on various noisy-speech corpora collected for endpoint detection evaluation.
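The conventional two-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' WFST method or their probabilistic decision logic. A simple energy threshold stands in for the statistical hypothesis test, and the function names and the parameters `threshold_db`, `min_speech_frames`, and `hangover_frames` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def frame_decisions(energies_db: List[float], threshold_db: float) -> List[bool]:
    """Stage 1 (assumed stand-in): label each frame speech (True) or non-speech.

    A real system would use a statistical test such as a likelihood ratio;
    a fixed energy threshold is used here only to keep the sketch short.
    """
    return [e > threshold_db for e in energies_db]


def find_endpoints(
    is_speech: List[bool],
    min_speech_frames: int = 3,
    hangover_frames: int = 2,
) -> Optional[Tuple[int, int]]:
    """Stage 2: heuristic utterance-level boundary decision.

    The start is declared once `min_speech_frames` consecutive speech frames
    are seen; the end is declared once `hangover_frames` consecutive
    non-speech frames follow. Returns (start, end) with end exclusive,
    or None if no utterance start is found.
    """
    start = None
    run_speech = 0
    run_silence = 0
    for i, speech in enumerate(is_speech):
        if start is None:
            # Count consecutive speech frames; short bursts reset the counter.
            run_speech = run_speech + 1 if speech else 0
            if run_speech >= min_speech_frames:
                start = i - min_speech_frames + 1
        else:
            # After the start, count consecutive non-speech frames.
            run_silence = 0 if speech else run_silence + 1
            if run_silence >= hangover_frames:
                return start, i - hangover_frames + 1
    if start is not None:
        # Input ended before the hangover elapsed; trim trailing silence.
        return start, len(is_speech) - run_silence
    return None
```

In this sketch the two heuristic parameters are fixed by hand and are independent of the frame-level threshold, which is the kind of sequentially tuned, knowledge-based logic the abstract identifies as a limitation.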
Keywords
Endpoint detection; speech recognition; Weighted Finite State Transducer