[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.4218/etrij.14.2214.0030

Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

Chung, Hoon (SW.Content Research Laboratory, ETRI)
Lee, Sung Joo (SW.Content Research Laboratory, ETRI)
Lee, Yun Keun (SW.Content Research Laboratory, ETRI)

Publication Information

ETRI Journal / v.36, no.5, 2014, pp. 714-720

Abstract

In this paper, we propose data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. Endpoint detection is generally handled by two cascaded decision processes. The first is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second is an utterance-level speech boundary decision based on heuristic knowledge. To handle these two processes within a unified framework, we propose a WFST-based approach. However, the WFST-based approach shares the limitations of conventional approaches: the utterance-level decision is based on heuristic knowledge, and the decision parameters are tuned sequentially. Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% on various noisy-speech corpora collected for endpoint detection evaluation.
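
The conventional two-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' WFST method or their probabilistic decision logic. It only shows the baseline idea: a frame-level likelihood ratio test followed by heuristic utterance-level rules. The function names, the equal-variance Gaussian model on log energy, and the `min_speech` and `hangover` parameters are all assumptions made for illustration.

```python
def frame_decisions(log_energies, noise_mean, speech_mean, std=1.0, threshold=0.0):
    """Frame-level stage (illustrative): log-likelihood ratio test between
    two equal-variance Gaussians on frame log energy."""
    decisions = []
    for x in log_energies:
        ll_speech = -((x - speech_mean) ** 2) / (2 * std ** 2)
        ll_noise = -((x - noise_mean) ** 2) / (2 * std ** 2)
        decisions.append(ll_speech - ll_noise > threshold)
    return decisions


def detect_endpoints(decisions, min_speech=3, hangover=2):
    """Utterance-level stage (illustrative heuristics).

    Start: first run of `min_speech` consecutive speech frames.
    End: last speech frame before more than `hangover` consecutive
    non-speech frames. Returns (start, end) frame indices, or
    (None, None) if no utterance start is found.
    """
    start = end = None
    run = 0       # consecutive speech frames seen before the start
    silence = 0   # consecutive non-speech frames seen after the start
    for i, is_speech in enumerate(decisions):
        if start is None:
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                start = i - min_speech + 1
        elif is_speech:
            silence = 0
        else:
            silence += 1
            if silence > hangover:
                end = i - silence
                break
    if start is not None and end is None:
        # Input ended before the hangover expired: use the last speech frame.
        end = len(decisions) - 1 - silence
    return start, end


# Hypothetical log energies: a 2-frame burst, then a 3-frame utterance.
frames = [0, 0, 0, 5, 5, 0, 5, 5, 5, 0, 0, 0, 0]
flags = frame_decisions(frames, noise_mean=0.0, speech_mean=5.0)
print(detect_endpoints(flags))  # (6, 8)
```

In this sketch the frame-level `threshold` and the utterance-level `min_speech` and `hangover` values are hand-set, and would typically be tuned one stage after the other. That is the kind of heuristic, sequentially tuned decision the abstract identifies as a limitation.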

Keywords

Endpoint detection; speech recognition; Weighted Finite State Transducer

Citations & Related Records

Times Cited By KSCI: 1

Reference

1	T. Hughes and K. Mierle, "Recurrent Neural Networks for Voice Activity Detection," IEEE Int. Conf. Acoust., Speech, Signal Process., Vancouver, Canada, May 26-31, 2013, pp. 7378-7382.
2	T. Fukuda, O. Ichikawa, and M. Nishimura, "Long-Term Spectro-Temporal and Static Harmonic Features for Voice Activity Detection," IEEE J. Sel. Topics Signal Process., vol. 4, no. 5, Oct. 2010, pp. 834-844.
3	S.J. Lee et al., "Intra- and Inter-frame Features for Automatic Speech Recognition," ETRI J., vol. 36, no. 3, June 2014, pp. 514-517.
4	M. Fujimoto, K. Ishizuka, and T. Nakatani, "A Voice Activity Detection Based on the Adaptive Integration of Multiple Speech Features and a Signal Decision Scheme," IEEE Int. Conf. Acoust., Speech, Signal Process., Las Vegas, NV, USA, Mar. 31-Apr. 4, 2008, pp. 4441-4444.
5	J. Sohn, N.S. Kim, and W. Sung, "A Statistical Model-Based Voice Activity Detection," IEEE Signal Process. Lett., vol. 6, no. 1, Jan. 1999, pp. 1-3.
6	J. Ramirez et al., "Statistical Voice Activity Detection Using a Multiple Observation Likelihood Ratio Test," IEEE Signal Process. Lett., vol. 12, no. 10, Oct. 2005, pp. 689-692.
7	Q.H. Joe et al., "Statistical Model-Based Voice Activity Detection Using Support Vector Machine," IET Signal Process., vol. 3, no. 3, May 2009, pp. 205-210.
8	D. Enqing et al., "Applying Support Vector Machines to Voice Activity Detection," IEEE Int. Conf. Signal Process., Beijing, China, vol. 2, Aug. 26-30, 2002, pp. 1124-1127.
9	C.Y. Park et al., "Integration of Sporadic Noise Model in POMDP-Based Voice Activity Detection," IEEE Int. Conf. Acoust., Speech, Signal Process., Dallas, TX, USA, Mar. 14-19, 2010, pp. 4486-4489.
10	H. Chung, S.J. Lee, and Y.K. Lee, "Endpoint Detection Using Weighted Finite State Transducer," Proc. INTERSPEECH, Lyon, France, Sept. 25-29, 2013, pp. 700-703.
11	M. Mohri, F. Pereira, and M. Riley, "Weighted Automata in Text and Speech Processing," European Conf. Artif. Intell., Budapest, Hungary, Aug. 13, 1996, pp. 228-231.
12	C. Allauzen et al., "A General and Efficient Weighted Finite-State Transducer Library," Proc. CIAA, Prague, Czech Republic, July 16-18, 2007, pp. 11-23.