• Title/Summary/Keyword: speaker dependent

A Study on the Speech Recognition using Advanced Competitive Learning (개선된 경쟁학습을 이용한 음성인식)

  • Song, Joon-Gyu;Lee, Dong-Wook;Kim, Young-T.
    • Proceedings of the KIEE Conference / 1997.11a / pp.594-596 / 1997
  • This paper presents a speaker-dependent Korean isolated digit recognition system using advanced competitive learning. Competitive learning algorithms are simple to implement and are therefore used in various fields. The proposed recognition algorithm consists of three procedures: comparing the number of wins of the codebook vectors, selecting a representative vector from the codebook vectors, and generating a new codebook from the representative vectors. Speech data are acquired with a Sound Blaster 16 and sampled at 11 kHz with 16-bit resolution.

  • PDF
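
The codebook training described in the abstract above can be illustrated with a minimal sketch of plain winner-take-all competitive learning. This is not the paper's "advanced" variant, whose details the abstract does not give; the function names, learning rate, epoch count, and the toy 2-D data are all assumptions made for illustration. The sketch only shows how win counts per codebook vector can be collected during training.

```python
import random

def nearest(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)))

def competitive_learning(data, n_codes, lr=0.1, epochs=20, seed=0):
    """Plain winner-take-all training; returns the codebook and win counts."""
    rng = random.Random(seed)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(data, n_codes)]
    wins = [0] * n_codes
    for _ in range(epochs):
        for x in data:
            k = nearest(codebook, x)
            wins[k] += 1
            # Only the winning vector moves toward the input.
            codebook[k] = [c + lr * (v - c) for c, v in zip(codebook[k], x)]
    return codebook, wins

# Toy 2-D "feature vectors" around two centres (made-up data, not speech).
rng = random.Random(1)
data = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(50)]
data += [(rng.gauss(5, 0.1), rng.gauss(5, 0.1)) for _ in range(50)]
codebook, wins = competitive_learning(data, n_codes=2)
print(codebook, wins)
```

The win counts give one possible basis for choosing representative vectors, for example by keeping the vectors that won most often; whether the paper selects them this way is not stated.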

A Two-stage Recognition Approach Based on Error Pattern Hypotheses for Connected Digit Recognition

  • Oh, Wook-Kwon;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.31-36 / 1996
  • In this paper, a two-stage recognition approach based on error pattern hypotheses is proposed to reduce the errors of a connected digit recognizer. In this approach, a conventional recognizer is first used to produce N-best candidate strings, and error patterns are then hypothesized by examining the candidate strings. For substitution error pattern hypotheses, error-pattern-dependent classifiers with more discriminative power than the first-stage classifier are used; for insertion and deletion errors, word duration and energy contour information are exploited to discriminate confusing pairs. Simulation results showed that the proposed approach achieves a 15% decrease in word error rate for speaker-independent Korean connected digit recognition when a hidden Markov model-based recognizer is used as the first-stage classifier.

  • PDF

Noise Robust Emotion Recognition Feature : Frequency Range of Meaningful Signal (음성의 특정 주파수 범위를 이용한 잡음환경에서의 감정인식)

  • Kim Eun-Ho;Hyun Kyung-Hak;Kwak Yoon-Keun
    • Journal of the Korean Society for Precision Engineering / v.23 no.5 s.182 / pp.68-76 / 2006
  • The ability to recognize human emotion is one of the hallmarks of human-robot interaction. This paper therefore describes the realization of emotion recognition. For emotion recognition from voice, we propose a new feature called the frequency range of meaningful signal. With this feature, we reached an average recognition rate of 76% in the speaker-dependent case. The experimental results confirm the usefulness of the proposed feature. We also define a noise environment and conduct a test in that environment. In contrast to other features, the proposed feature is robust in a noisy environment.

Study on the Recognition of Spoken Korean Continuous Digits Using Phone Network (음성망을 이용한 한국어 연속 숫자음 인식에 관한 연구)

  • Lee, G.S.;Lee, H.J.;Byun, Y.G.;Kim, S.H.
    • Proceedings of the KIEE Conference / 1988.07a / pp.624-627 / 1988
  • This paper describes the implementation of a speaker-dependent recognition system for spoken Korean continuous digits. The recognition system can be divided into two parts, an acoustic-phonetic processor and a lexical decoder. The acoustic-phonetic processor calculates feature vectors from the input speech signal and then performs frame labelling and phone labelling. Frame labelling is performed by a Bayesian classification method, and phone labelling is performed using the labelled frames and posterior probabilities. The lexical decoder accepts segments (phones) from the acoustic-phonetic processor and decodes their lexical structure through a phone network constructed from the phonetic representations of the ten digits. The experiment was carried out with two sets of 4 continuous digits, each set composed of 35 patterns. An evaluation of the system yielded a pattern accuracy of about 80 percent, resulting from a word accuracy of about 95 percent.

  • PDF
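
The relation between the two accuracy figures reported above can be checked with a short calculation. It is a sketch that assumes each of the 4 digits in a pattern is recognized independently with the same word accuracy; the abstract does not state this assumption, and the function name is hypothetical.

```python
def pattern_accuracy(word_accuracy, n_words):
    """Probability that all n_words are correct, assuming independent errors."""
    return word_accuracy ** n_words

# 95% word accuracy over a 4-digit pattern.
print(round(pattern_accuracy(0.95, 4), 3))  # 0.815
```

Under that assumption a 95 percent word accuracy gives roughly 81 percent for a 4-digit pattern, which is consistent with the reported figure of about 80 percent.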

Hidden Markov Models Containing Durational Information of States (상태의 고유시간 정보를 포함하는 Hidden Markov Model)

  • 조정호;홍재근;김수중
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.4 / pp.636-644 / 1990
  • Hidden Markov models (HMMs) are known to be a useful representation of the speech signal and are used in a wide variety of speech systems. For speech recognition applications, it is desirable to incorporate into the model durational information of states, which corresponds to the phonetic duration of speech segments. In this paper we propose duration-dependent HMMs that include durational information of states in a form suited to the left-to-right model. Reestimation formulae for the parameters of the proposed model are derived and their convergence is verified. Finally, the performance of the proposed models is verified by applying them to an isolated-word, speaker-independent speech recognition system.

  • PDF

Beyond Politeness: A Spoken Discourse Approach to Korean Address Reference Terms

  • Hong, Jin-Ok
    • English Language & Literature Teaching / v.15 no.2 / pp.93-119 / 2009
  • Internalized Confucian cultural scripts trigger meta-pragmatic thinking in Korean communication. Commonly shared cultural knowledge acts as a powerful constraint upon the behavioral patterns of each participant, and this knowledge can be strategically manipulated to avoid confrontations. The strategic use of address reference terms utilizes cultural values as a face-redress mechanism to achieve situation-specific goals. This paper offers a view of Korean address reference terms that rests on four revisions of politeness theory (Brown & Levinson, 1978, 1987). First, the notion of discernment, or 'wakimae', as a culture-specific mechanism is reanalyzed. Second, culture-specific values are introduced as another R (ranking of imposition) variable. Third, the notion of positive face (respect) is reevaluated. Finally, the paper observes how the speaker can strategically apply address reference terms, in combination with other honorifics, either to threaten or to enhance the face of the hearer. Because Confucianism is embedded in Korean cultural identity, teaching integrated cultural values and their roles in situation-dependent politeness is required in order to understand the interactional nature of politeness arising in particular discourse contexts.

  • PDF

An Isolated Word Recognition Using the Mellin Transform (Mellin 변환을 이용한 격리 단어 인식)

  • 김진만;이상욱;고세문
    • Journal of the Korean Institute of Telematics and Electronics / v.24 no.5 / pp.905-913 / 1987
  • This paper presents a speaker-dependent isolated digit recognition algorithm using the Mellin transform. Since the Mellin transform converts scale information into phase information, attempts have been made to use this scale-invariance property to alleviate the time-normalization procedure required for speech recognition. It has been found that good results can be obtained by applying the Mellin transform to features such as the ZCR, log energy, normalized autocorrelation coefficients, first predictor coefficient, and normalized prediction error. We employed a difference function to evaluate the similarity between two patterns. When the proposed algorithm was tested on Korean digit words, a recognition rate of 83.3% was obtained. The recognition accuracy is not comparable to that of other techniques such as the LPC distance; however, it is believed that the Mellin transform can effectively perform the time-normalization processing for speech recognition.

  • PDF
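
The scale-invariance property mentioned in the abstract above can be demonstrated numerically. The following is a minimal sketch, not the paper's procedure: it evaluates the Mellin transform M(s) = ∫ f(t) t^(s-1) dt on the imaginary axis s = iω using the substitution t = exp(u), and compares a toy contour with a time-stretched copy. The function names, the toy contour t·exp(-t), the stretch factor 2, and the integration limits are all assumptions chosen for illustration.

```python
import cmath
import math

def mellin_magnitude(f, omega, u_min=-30.0, u_max=6.0, step=1e-3):
    """|M(i*omega)| for M(s) = integral_0^inf f(t) t^(s-1) dt, using t = exp(u)."""
    n = int(round((u_max - u_min) / step))
    total = 0j
    for k in range(n + 1):
        u = u_min + k * step
        # With t = exp(u), t^(s-1) dt becomes exp(s*u) du.
        total += f(math.exp(u)) * cmath.exp(1j * omega * u)
    return abs(total * step)

def pattern(t):
    """Toy feature contour (an assumption, not speech data)."""
    return t * math.exp(-t)

def stretched(t):
    """The same contour on a time axis scaled by a factor of 2."""
    return pattern(2.0 * t)

for omega in (0.5, 1.0, 2.0):
    print(omega, mellin_magnitude(pattern, omega), mellin_magnitude(stretched, omega))
```

Scaling the time axis by a factor a multiplies M(iω) by a^(-iω), which has unit magnitude, so the two printed magnitudes agree and only the phase differs. This is why a time-scaled utterance can in principle be compared without explicit time normalization.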

Speech Recognition Using Recurrent Neural Prediction Models (회귀신경예측 모델을 이용한 음성인식)

  • 류제관;나경민;임재열;성경모;안성길
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.11 / pp.1489-1495 / 1995
  • In this paper, we propose recurrent neural prediction models (RNPM), recurrent neural networks trained as nonlinear predictors of speech, as a new connectionist model for speech recognition. RNPM modulates its mapping effectively through its internal representation and requires no time alignment algorithm. Therefore, the computational load at the recognition stage is reduced substantially compared with the well-known predictive neural networks (PNN), and the required memory is much smaller. In addition, RNPM does not suffer from the problem of deciding the time-varying target function. In speaker-dependent and speaker-independent speech recognition experiments under various conditions, the proposed model was comparable in recognition performance to the PNN while retaining the above merits, which the PNN lacks.

  • PDF

MODELING QUANTITATIVE VARIATION - In the Kyungnam Dialect of Korean -

  • Cho, Yong-Hyung
    • Speech Sciences / v.1 / pp.137-152 / 1997
  • The objectives of this paper are to see how declination is realized at different positions and lengths of the utterance, to see whether the $F_0$ value changes in a predictable way throughout the utterance, and, if so, to find the quantitative model that best fits the declination. The experimental results are as follows. First, the peak value over the utterance can be affected by the position of the peak and the length of the utterance. Second, the choice of quantitative model depends on the list length. Third, in every speaker's speech there is a baseline (the lowest $F_0$ value a speaker can use), and the $F_0$ will not fall below the baseline. Fourth, the peak $F_0$ of the last word in each list shows little variation in pitch value (target $F_0$), while the number of words in the list affects the starting $F_0$ values.

  • PDF

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • Speech Sciences / v.10 no.1 / pp.71-84 / 2003
  • In acoustic modeling for large-vocabulary speech recognition, the sparse data problem caused by a huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates the CD modeling. The proposed scheme essentially constructs a supervised decision rule and applies it to the pre-clustered triphones using the C4.5 algorithm, which is known to search effectively through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, a data-driven method is used as the clustering algorithm, while its result is used as the learning target of the C4.5 algorithm. This scheme has been shown to be effective in terms of recognition performance, particularly on databases with a low unknown-context ratio. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the word error rate (WER) by 3.93% compared to the existing rule-based methods.

  • PDF
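
The abstract above describes picking the attribute that best separates the training examples. Decision-tree learners in the C4.5 family commonly rank attributes by gain ratio, and the sketch below shows that criterion on a toy example. The attribute names (`left_is_vowel`, `right_is_nasal`), the cluster labels, and the four rows are hypothetical and are not taken from the paper; the paper's actual attribute set and data are not given in the abstract.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, normalised by the split entropy."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical triphone contexts with labels from a prior data-driven clustering.
rows = [
    {"left_is_vowel": True, "right_is_nasal": True},
    {"left_is_vowel": True, "right_is_nasal": False},
    {"left_is_vowel": False, "right_is_nasal": True},
    {"left_is_vowel": False, "right_is_nasal": False},
]
labels = ["cluster_a", "cluster_a", "cluster_b", "cluster_b"]

best = max(rows[0], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # left_is_vowel
```

Here the left-context attribute separates the two clusters perfectly (gain ratio 1), while the right-context attribute carries no information about the cluster label (gain ratio 0), so the former is chosen as the split.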