• Title/Summary/Keyword: 고립단어

Search Result 127, Processing Time 0.029 seconds

Speech recognition in car noise environments using multiple models according to noise masking levls (잡음 마스킹 레벨에 따른 복수 모델을 이용한 자동차 소음환경에서의 음성인식)

  • 정회인
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.60-64
    • /
    • 1998
  • 음성인식 시스템의 실용화 과정에서 훈련환경과 테스트 환경의 불일치로 인한 인식성능의 저하는 반드시 극복되어야 할 문제이다. 본 논문에서는 잡음 tR인 입력음성의 비음성구간에서 잡음레벨을 추정하여 음성 스펙트럼에서 추정된 잡음레벨을 빼는 스펙트럼 차감법고 스펙트럼 영역에서 미리 정해진 마스킹 레벨보다 낮은 에너지 값을 마스킹 레벨로 올려주는 잡음 마스킹을 함께 사용함으로써 훈련 환경과 테스트환경의 불일치를 줄이는 방법을 제안한다. 그리고 복수의 마스킹 레벨에 대한 모델들을 미리 만들어 두고 추정된 잡음 레벨에 따라 적합한 마스킹 레벨의 보델을 사용하여 인식을 수해?는 다중 모델 방법을 적용하였다. 자동차 소음환경에서 두 가지 마스킹 레벨에 대한 모델을 이용한 화자독립고립단어 인식 실험을 통하여 본 논문에서 제안한 방식은 정차중 무시동 환경에서 95.8%, 정차중 시동 환경에서 95.6%, 한적한 도로환경에서 92.8%, 복잡한 시내도로 환경에서 89.6%, 고속도로 환경에서 74.4%의 인식성능을 나타내었으며, 평균 90.7%의 성능을 얻을 수 있다.

  • PDF

A New Hidden Error Function for Layer-By-Layer Training of Multi layer Perceptrons (다층 퍼셉트론의 층별 학습을 위한 중간층 오차 함수)

  • Oh Sang-Hoon
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2005.11a
    • /
    • pp.364-370
    • /
    • 2005
  • LBL(Layer-By-Layer) algorithms have been proposed to accelerate the training speed of MLPs(Multilayer Perceptrons). In this LBL algorithms, each layer needs a error function for optimization. Especially, error function for hidden layer has a great effect to achieve good performance. In this sense, this paper proposes a new hidden layer error function for improving the performance of LBL algorithm for MLPs. The hidden layer error function is derived from the mean squared error of output layer. Effectiveness of the proposed error function was demonstrated for a handwritten digit recognition and an isolated-word recognition tasks and very fast learning convergence was obtained.

  • PDF

Isolated-Word Recognition Using Adaptively Partitioned Multisection Codebooks (음성적응(音聲適應) 구간분할(區間分割) 멀티섹션 코드북을 이용(利用)한 고립단어인식(孤立單語認識))

  • Ha, Kyeong-Min;Jo, Jeong-Ho;Hong, Jae-Kuen;Kim, Soo-Joong
    • Proceedings of the KIEE Conference
    • /
    • 1988.07a
    • /
    • pp.10-13
    • /
    • 1988
  • An isolated-word recognition method using adaptively partitioned multisection codebooks is proposed. Each training utterance was divided into several sections according to its pattern extracted by labeling technique. For each pattern, reference codebooks were generated by clustering the training vectors of the same section. In recognition procedure, input speech was divided into the sections by the same method used in codebook generation procedure, and recognized to the reference word whose codebook represented the smallest average distortion. The proposed method was tested for 100 Korean words and attained recognition rate about 96 percent.

  • PDF

Optimal Learning Rates in Gradient Descent Training of Multilayer Perceptrons (다층퍼셉트론의 강하 학습을 위한 최적 학습률)

  • 오상훈
    • The Journal of the Korea Contents Association
    • /
    • v.4 no.3
    • /
    • pp.99-105
    • /
    • 2004
  • This paper proposes optimal learning rates in the gradient descent training of multilayer perceptrons, which are a separate learning rate for weights associated with each neuron and a separate one for assigning virtual hidden targets associated with each training pattern Effectiveness of the proposed error function was demonstrated for a handwritten digit recognition and an isolated-word recognition tasks and very fast learning convergence was obtained.

  • PDF

A study on real-time implementation of speech recognition and speech control system using dSPACE board (dSPACE 보드를 이용한 음성인식 명령처리시스템 실시간 구현에 관한 연구)

  • 김재웅;정원용
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.173-176
    • /
    • 2000
  • 음성은 인간이 가진 가장 편리한 제어전송수단으로 이를 통한 제어는 인간에게 많은 편리함을 제공할 것이다. 본 논문에서는 다층구조 신경망(Multi-Layer Perceptron)을 이용하여 간단한 음성인식 명령처리시스템을 Matlab 상에서 구성해 보았다. 음성인식을 통한 제어의 목적을 위해 화자종속, 고립단어인식기를 목표로 설정하여 연구를 수행하였다. 음성의 시작점과 끝점을 검출하기 위해 단구간 에너지와 영교차율(ZCR)을 이용하였고 인식기의 특징파라미터로는 12차 LPC켑스트럼 계수를 사용하였다. 그리고 신경망의 출력값을 기동, 정지시에 활성화되도록 3개의 계층으로 하였고, 신경망의 뉴런의 개수를 각각 12, 12, 2으로 설정하였다. 먼저 기준음성패턴으로 학습시킨 후에 Matlab 환경하에 동작하는 dSPACE 실시간처리보드에 변환된 C프로그램을 다운로드하고, 음성을 입력하여 인식 후 dSPACE보드의 D/A컨버터의 출력단에 연결된 DC모터를 기동, 정지제어를 수행하였다. 실시간 음성인식 명령처리 시스템 구현을 통하여 원격제어와 같은 음성명령을 통한 제어가 가능함을 확인할 수 있었다.

  • PDF

Aspects of the word-final stop releasing and its phonetic correlates in reading the English isolated words enumerated (영어 나열형 고립 단어 읽기에서 어말 폐쇄음의 파열 양상 및 그 음성적 상관성)

  • Rhee Seok-Chae;Kang Sooha;Park Jihyun;Hwang Sunmin
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.61-68
    • /
    • 2003
  • This experimental research shows that, in reading of the English isolated words that are enumerated, the releasing of the word-final stop is employed for signaling enumeration in company with the well-known intonational pattern for it. Furthermore, this study tries to find the conceivable phonetic correlates of the releasing of the stop in word-final position, focusing on the association of the stop releasing/nonreleasing with i) the POA (Place of Articulation) distinction of the word-final stop, ii) the various qualities of the preceding vowel placed before the final stop, and iii) the voice distinction of the stop in the word-final position.

  • PDF

A Single-End-Point DTW Algorithm for Keyword Spotting (핵심어 검출을 위한 단일 끝점 DTW알고리즘)

  • 최용선;오상훈;이수영
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.3
    • /
    • pp.209-219
    • /
    • 2004
  • In order to implement a real time hardware for keyword spotting, we propose a Single-End-Point DTW(SEP-DTW) algorithm which is simple and less complex for computation. The SEP-DTW algorithm only needs a single end point which enables efficient applications, and it has a small wont of computations because the global search area is divided into successive local search areas. Also, we adopt new local constraints and a new distance measure for a better performance of the SEP-DTW algorithm. Besides, we make a normalization of feature same vectors so that they have the same variance in each frequency bin, and each frame has the same energy levels. To construct several reference patterns for each keyword, we use a clustering algorithm for all training patterns, and mean vectors in every cluster are taken as reference patterns. In order to detect a key word for input streams of speech, we measure the distances between reference patterns and input pattern, and we make a decision whether the distances are smaller than a pre-defined threshold value. With isolated speech recognition and keyword spotting experiments, we verify that the proposed algorithm has a better performance than other methods.

A Study on the Pitch Contour Generator with Neural Network in the Isolated Words (신경망을 이용한 고립단어에서의 피치변화곡선 발생기에 관한 연구)

  • Lim Unchun;Kwak Jingu;Chang Sokwang
    • Proceedings of the KSPS conference
    • /
    • 1996.02a
    • /
    • pp.137-155
    • /
    • 1996
  • The purpose of this paper is to generate a pitch contour which is affected by tile phonetic environment and the number of syllables in each Korean isolated word using a neural network. To do this, we analyzed a set of 513 Korean isolated words, consisting of 1-4 syllables and extracted the pitch contour and the duration of each phoneme in all the words. The total number of phonemes we analyzed is about 3800. After that we approximated the pitch contour with a 1st order polynominal by a regression analysis. We could get the slope, the initial pitch and the duration of each phoneme. We used these 3 parameters as the target pattern of the neural network and let the neural network learn the rule of the variation of the pitch and duration, which was affected by the phonetic environment of each phoneme. We used 7 consecutive phoneme strings as an input pattern for a neural network to make the network learn the effect of phonetic environment around the center phoneme. In the learning phase, we used 3545 items(463 words) as target patterns which contained the phonetic environment of front and rear 3 phonemes and the neural network showed the correctness rate of 98.43%, 98.59%, 97.7% in the estimation of the duration, the slope, the initial pitch. In the recall phase, we tested the performance of tile neural network with 251 items(50 words) which weren't need as learning data and we could get the good correctness rate of 97.34%, 95.45%, 96.3% in the generation of the duration, the slope, and the initial pitch of each phoneme.

  • PDF

Improvement of the Linear Predictive Coding with Windowed Autocorrelation (윈도우가 적용된 자기상관에 의한 선형예측부호의 개선)

  • Lee, Chang-Young;Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.6 no.2
    • /
    • pp.186-192
    • /
    • 2011
  • In this paper, we propose a new procedure for improvement of the linear predictive coding. To reduce the error power incurred by the coding, we interchanged the order of the two procedures of windowing on the signal and linear prediction. This scheme corresponds to LPC extraction with windowed autocorrelation. The proposed method requires more calculational time because it necessitates matrix inversion on more parameters than the conventional technique where an efficient Levinson-Durbin recursive procedure is applicable with smaller parameters. Experimental test over various speech phonemes showed, however, that our procedure yields about 5 % less power distortion compared to the conventional technique. Consequently, the proposed method in this paper is thought to be preferable to the conventional technique as far as the fidelity is concerned. In a separate study of speaker-dependent speech recognition test for 50 isolated words pronounced by 40 people, our approach yielded better performance too.

Noisy Speech Recognition using Probabilistic Spectral Subtraction (확률적 스펙트럼 차감법을 이용한 잡은 환경에서의 음성인식)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.6
    • /
    • pp.94-99
    • /
    • 1997
  • This paper describes a technique of probabilistic spectral subtraction which uses the knowledge of both noise and speech so as to reduce automatic speech recognition errors in noisy environments. Spectral subtraction method estimates a noise prototype in non-speech intervals and the spectrum of clean speech is obtained from the spectrum of noisy speech by subtracting this noise prototype. Thus noise can not be suppressed effectively using a single noise prototype in case the characteristics of the noise prototype are different from those of the noise contained in input noisy speech. To modify such a drawback, multiple noise prototypes are used in probabilistic subtraction method. In this paper, the probabilistic characteristics of noise and the knowledge of speech which is embedded in hidden Markov models trained in clean environments are used to suppress noise. Futhermore, dynamic feature parameters are considered as well as static feature parameters for effective noise suppression. The proposed method reduced error rates in the recognition of 50 Korean words. The recognition rate was 86.25% with the probabilistic subtraction, 72.75% without any noise suppression method and 80.25% with spectral subtraction at SNR(Signal-to-Noise Ratio) 10 dB.

  • PDF