Search | Korea Science

Lee, Chae-Won;Chang, Joon-Hyuk
- The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.335-341 / 2022
In this paper, we adopt the transformer, which shows remarkable performance in natural language processing, as the acoustic model (AM) of a hybrid speech recognition system. The transformer acoustic model uses attention structures to process sequential data and achieves high performance at low computational cost. This paper proposes a method to improve the performance of the transformer AM by applying each of four sequence discriminative training algorithms, a weighted finite-state transducer (wFST)-based learning approach used in existing DNN-HMM models. Compared with Cross Entropy (CE) training, the sequence discriminative method yields a relative Word Error Rate (WER) reduction of 5 %.
https://doi.org/10.7776/ASK.2022.41.3.335
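
As a reading aid for the relative WER figure quoted above, the following is a minimal sketch of how a relative WER reduction is usually computed. It assumes the 5 % figure refers to a reduction relative to the CE baseline; the function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values (not from the paper): CE baseline at 10.0 % WER,
    # sequence-discriminatively trained model at 9.5 % WER.
    reduction = relative_wer_reduction(10.0, 9.5)
    print(f"relative WER reduction: {reduction:.1%}")
```

Note that a relative reduction is distinct from an absolute one: in this hypothetical case the absolute difference is only 0.5 percentage points.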

Chung, Yong-Joo;Un, Chong-Kwan
- The Journal of the Acoustical Society of Korea / v.15 no.4E / pp.21-27 / 1996
In this paper, we propose a discriminative training algorithm for the stochastic segment model (SSM) in continuous speech recognition. As the SSM is usually trained by maximum likelihood estimation (MLE), a discriminative training algorithm is required to improve the recognition performance. Since the SSM does not assume conditional independence of the observation sequence, as hidden Markov models (HMMs) do, the search space for decoding an unknown input utterance is increased considerably. To reduce the computational complexity and the size of the search space in an iterative training algorithm for discriminative SSMs, a hybrid architecture of SSMs and HMMs is used, in which the HMMs supply the segment boundaries. Given the segment boundaries, the parameters of the SSM are discriminatively trained by the minimum classification error (MCE) criterion based on a generalized probabilistic descent (GPD) method. With the discriminative training of the SSM, the word error rate is reduced by 17% compared with the MLE-trained SSM in speaker-independent continuous speech recognition.
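
The criterion named in the abstract above can be illustrated with a minimal sketch of the textbook minimum classification error loss: a misclassification measure compares the score of the correct class with a soft maximum over competing classes, and a sigmoid turns it into a smooth error count. This is the generic formulation, not the paper's SSM-specific implementation; the function names, the smoothing parameters `eta` and `gamma`, and the example scores are assumptions made for illustration.

```python
import math


def misclassification_measure(scores, correct, eta=1.0):
    """d_k = -g_k + (1/eta) * log(mean over j != k of exp(eta * g_j))."""
    competitors = [s for j, s in enumerate(scores) if j != correct]
    m = max(competitors)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in competitors) / len(competitors)
    return -scores[correct] + m + math.log(mean_exp) / eta


def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Sigmoid of the misclassification measure: a smooth 0/1 error count."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))


if __name__ == "__main__":
    scores = [2.0, 0.0, 0.0]    # hypothetical class scores g_j(x)
    print(mce_loss(scores, 0))  # correct class scores highest: loss below 0.5
    print(mce_loss(scores, 1))  # correct class is outscored: loss above 0.5
```

Because the loss is differentiable in the scores, a GPD-style procedure can lower it by taking small gradient steps on the model parameters, one training token at a time.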

Kim, Minyoung
- International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.3 / pp.209-215 / 2014
Sequence tagging is the task of predicting frame-wise labels for a given input sequence and has important applications in diverse domains. Recent probabilistic sequence models such as conditional random fields (CRFs) have achieved great success in a variety of situations. Conventional methods such as maximum likelihood (ML) learning match global features between the empirical and model distributions rather than local features, even though mismatches in local features translate directly into frame-wise prediction errors. In this paper, we introduce a novel discriminative CRF learning algorithm to minimize local feature mismatches. Unlike the overall data fitting that results from global feature matching in ML learning, our approach reduces the total error over all frames in a sequence. We also provide an efficient gradient-based learning method via a gradient forward-backward recursion, which has the same computational complexity as ML learning. For several real-world sequence tagging problems, we empirically demonstrate that the proposed learning algorithm achieves significantly more accurate predictions than standard estimators.
https://doi.org/10.5391/IJFIS.2014.14.3.209
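
To make the notion of frame-wise prediction concrete, the following is a minimal sketch of the standard forward-backward recursion that computes per-frame label marginals in a linear-chain model. It is not the gradient forward-backward recursion proposed in the paper; the function name, the potential tables, and the example values are assumptions made for illustration.

```python
def frame_marginals(unary, pairwise):
    """Posterior marginals P(y_t = s | x) for a linear-chain model.

    unary[t][s]    : non-negative potential of label s at frame t
    pairwise[a][b] : non-negative potential of the transition a -> b
    """
    T, S = len(unary), len(unary[0])

    # Forward pass: alpha[t][s] sums the weight of all label prefixes ending in s at t.
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0] = list(unary[0])
    for t in range(1, T):
        for s in range(S):
            alpha[t][s] = unary[t][s] * sum(
                alpha[t - 1][a] * pairwise[a][s] for a in range(S)
            )

    # Backward pass: beta[t][s] sums the weight of all label suffixes following s at t.
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            beta[t][s] = sum(
                pairwise[s][b] * unary[t + 1][b] * beta[t + 1][b] for b in range(S)
            )

    z = sum(alpha[T - 1])  # partition function
    return [[alpha[t][s] * beta[t][s] / z for s in range(S)] for t in range(T)]


if __name__ == "__main__":
    # Hypothetical potentials for 3 frames and 2 labels.
    unary = [[1.0, 2.0], [3.0, 1.0], [1.0, 1.0]]
    pairwise = [[2.0, 1.0], [1.0, 2.0]]
    for t, row in enumerate(frame_marginals(unary, pairwise)):
        print(t, row)
```

This sketch works directly with potentials for readability; for long sequences a log-space or rescaled variant is the usual choice to avoid numerical underflow.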