• Title/Abstract/Keywords: Isolated word

신경망을 이용한 고립단어에서의 피치변화곡선 발생기에 관한 연구 (A Study on the Pitch Contour Generator with Neural Network in the Isolated Words)

  • 임운천;곽진구;장석왕
    • 대한음성학회:학술대회논문집 (conference proceedings) / 대한음성학회 1996년도 2월 학술대회지 (February 1996 conference) / pp.137-155 / 1996
  • The purpose of this paper is to use a neural network to generate a pitch contour that reflects the phonetic environment and the number of syllables in each Korean isolated word. To do this, we analyzed a set of 513 Korean isolated words of 1-4 syllables and extracted the pitch contour and the duration of each phoneme in every word; about 3,800 phonemes were analyzed in total. We then approximated each phoneme's pitch contour with a first-order polynomial by regression analysis, which gave the slope, the initial pitch, and the duration of each phoneme. We used these three parameters as the target pattern of the neural network and let the network learn the rules by which the phonetic environment of each phoneme changes its pitch and duration. To make the network learn the effect of the phonetic environment around the center phoneme, we used strings of 7 consecutive phonemes as input patterns. In the learning phase, we used 3,545 items (463 words), each containing the phonetic environment of the 3 preceding and 3 following phonemes, as target patterns, and the network achieved correctness rates of 98.43%, 98.59%, and 97.7% in estimating the duration, the slope, and the initial pitch, respectively. In the recall phase, we tested the network on 251 items (50 words) that were not used as learning data and obtained correctness rates of 97.34%, 95.45%, and 96.3% in generating the duration, the slope, and the initial pitch of each phoneme.
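
A minimal Python sketch of the two preprocessing steps described above: fitting a first-order polynomial to one phoneme's pitch samples to obtain (duration, slope, initial pitch), and building 7-phoneme input windows (3 preceding, center, 3 following). The function names, the `#` padding symbol for word edges, and ordinary least squares with time measured from the phoneme onset are assumptions; the paper's exact regression setup and phoneme encoding are not given.

```python
from typing import List, Sequence, Tuple

BOUNDARY = "#"  # hypothetical padding symbol for positions outside the word


def fit_phoneme_pitch(onset: float, offset: float,
                      times: Sequence[float], f0: Sequence[float]) -> Tuple[float, float, float]:
    """Fit f0(t) ~ initial + slope * (t - onset) by ordinary least squares.

    Returns (duration, slope, initial_pitch) for one phoneme.
    """
    if len(times) < 2 or len(times) != len(f0):
        raise ValueError("need at least two paired (time, f0) samples")
    n = len(times)
    ts = [t - onset for t in times]            # time measured from the phoneme onset
    mean_t, mean_f = sum(ts) / n, sum(f0) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(ts, f0))
    slope = sxy / sxx                          # Hz per second
    initial = mean_f - slope * mean_t          # fitted f0 at the onset
    return offset - onset, slope, initial


def context_windows(phonemes: List[str], half: int = 3) -> List[List[str]]:
    """One window of 2*half+1 phonemes per center phoneme (7 when half=3)."""
    padded = [BOUNDARY] * half + phonemes + [BOUNDARY] * half
    return [padded[i:i + 2 * half + 1] for i in range(len(phonemes))]


dur, slope, f0_init = fit_phoneme_pitch(0.0, 0.04, [0.0, 0.01, 0.02, 0.03],
                                        [120.0, 118.0, 116.0, 114.0])
print(round(dur, 3), round(slope, 1), round(f0_init, 1))
print(context_windows(["k", "a", "m"])[0])
```

Each window, encoded in whatever way the network expects, would be paired with the three fitted values of its center phoneme as the training target.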

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

  • Jung, Ho-Young
    • IEIE Transactions on Smart Processing and Computing / Vol. 1, No. 3 / pp.143-151 / 2012
  • A new data-driven method is proposed for designing a blind modulation frequency filter that suppresses slowly varying noise components. The method is based on temporal local decorrelation of the feature vector sequence and is applied on an utterance-by-utterance basis. Whereas conventional modulation frequency filtering approaches use the same filter form regardless of task and environmental conditions, the proposed method provides an adaptive modulation frequency filter for each utterance that outperforms conventional methods. In addition, when applied to log-spectral parameters, the method ultimately performs channel normalization in the feature domain. Performance was evaluated in speaker-independent isolated-word recognition experiments under additive noise. The proposed method achieved outstanding improvement in speech recognition in environments with significant noise and was also effective across a range of feature representations.
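
As a rough illustration of per-utterance temporal decorrelation, the sketch below estimates a first-order linear-prediction coefficient from a single feature trajectory and applies the resulting prediction-error filter. This is a deliberately simple stand-in, not the blind modulation frequency filter design of the paper; the function names and the toy trajectory are hypothetical.

```python
from typing import List, Tuple


def lag_product_mean(x: List[float], lag: int) -> float:
    """Average of x[t] * x[t - lag] over the utterance (biased autocorrelation)."""
    return sum(x[t] * x[t - lag] for t in range(lag, len(x))) / len(x)


def decorrelate_utterance(traj: List[float]) -> Tuple[List[float], float]:
    """Filter one feature trajectory with y[t] = x[t] - a * x[t-1].

    `a` is estimated from this utterance alone, so the filter adapts per
    utterance. When a slowly varying component dominates, `a` approaches 1
    and the filter approaches a first difference, i.e. a high-pass filter.
    """
    r0 = lag_product_mean(traj, 0)
    r1 = lag_product_mean(traj, 1)
    a = r1 / r0 if r0 > 0 else 0.0
    filtered = [traj[0]] + [traj[t] - a * traj[t - 1] for t in range(1, len(traj))]
    return filtered, a


# A constant offset of 5.0 (slowly varying component) plus a fast alternating component.
traj = [5.0 + (0.5 if t % 2 == 0 else -0.5) for t in range(100)]
y, a = decorrelate_utterance(traj)
tail = y[1:]
print(f"a = {a:.3f}, mean before = {sum(traj) / len(traj):.2f}, "
      f"mean after = {sum(tail) / len(tail):.3f}")
```

In practice each dimension of the feature vector sequence would be filtered separately; the toy trajectory only shows that the offset is largely removed while the fast component survives.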

성대신호 명령어 인식기를 위한 음운자질에 기반한 성대신호 연구 (Vocal-cord Signal Study based on Phonological Feature for Vocal-cord Signal Isolated-Word recognizer)

  • 정영규;한문성;조관현
    • 한국HCI학회:학술대회논문집 (HCI Society of Korea conference proceedings) / 한국HCI학회 2006년도 학술대회 1부 (2006 conference, Part 1) / pp.565-570 / 2006
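  • In wearable environments, speech is the most useful user interface. With current noise-removal techniques, however, practical use of speech recognizers in high-noise settings such as wearable environments is nearly impossible. This paper develops a command recognizer that uses a throat (vocal-cord) microphone, which blocks environmental noise at the source. To this end, we describe the vocal-cord signal in terms of Korean phonological feature theory and verify the validity of this approach by analyzing the input signal. For this analysis we use the spectrum and FFT results, and we use the MFCC algorithm to examine how the amount of information in the frequency domain affects recognition. Based on the analysis, we build a vocal-cord-signal command recognizer with the ZCPA algorithm and verify that the feature vectors used for voiced/unvoiced separation are useful for such a recognizer. In the experiments, ZCPA achieved a recognition rate 16% higher than MFCC.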
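
ZCPA (zero-crossings with peak amplitudes) is commonly described as: pass the signal through a band-pass filter bank, estimate frequency in each channel from the interval between successive upward zero crossings, and accumulate the log-compressed peak amplitude of that interval into a frequency histogram. The sketch below covers a single channel only and omits the filter bank and framing; the function name and bin edges are hypothetical, and it is not the recognizer built in the paper.

```python
import math
from typing import List, Sequence


def zcpa_histogram(x: Sequence[float], fs: float,
                   bin_edges: Sequence[float]) -> List[float]:
    """Single-channel, simplified ZCPA-style frequency histogram."""
    # indices where the signal crosses zero going upward
    ups = [i for i in range(1, len(x)) if x[i - 1] < 0 <= x[i]]
    hist = [0.0] * (len(bin_edges) - 1)
    for start, end in zip(ups, ups[1:]):
        freq = fs / (end - start)                     # inverse of the crossing interval
        peak = max(abs(v) for v in x[start:end])      # peak amplitude in the interval
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += math.log(1.0 + peak)       # log-compressed peak as the weight
                break
    return hist


fs = 8000.0
tone = [math.sin(2 * math.pi * 100 * n / fs + 0.1) for n in range(800)]  # 100 Hz, 0.1 s
print(zcpa_histogram(tone, fs, [0.0, 50.0, 150.0, 400.0]))
```

Because the frequency estimate depends on zero-crossing intervals rather than on spectral magnitude, this kind of feature is closely tied to the voiced/unvoiced distinction the abstract mentions.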

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal / Vol. 29, No. 3 / pp.300-304 / 2007
  • This paper proposes a new data-driven high-pass method that suppresses slowly varying noise components. Conventional high-pass approaches are based on decorrelating the feature vector sequence, and efforts have been made to make them adaptive to various conditions. The proposed method is based on temporal local decorrelation using information-maximization (infomax) theory. It is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. Its performance is evaluated in isolated-word recognition experiments with channel distortion, and the results show that the proposed method yields outstanding improvement for channel-distorted speech recognition.
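
Why high-pass filtering of the feature trajectory normalizes the channel: a fixed linear channel multiplies the spectrum, so in the log-spectral (or cepstral) domain it becomes the same additive offset on every frame. The simplest conventional high-pass approach, cepstral mean subtraction (CMS), removes that offset by subtracting the per-utterance mean. The sketch below is this baseline only, not the infomax-based filter proposed in the paper; the function name and toy frames are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cepstral_mean_subtraction(frames: Matrix) -> Matrix:
    """Subtract the per-utterance mean of each feature dimension (a DC-removing high-pass)."""
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
channel = [0.5, -1.5]  # constant log-domain offset introduced by a fixed channel
distorted = [[c + o for c, o in zip(frame, channel)] for frame in clean]
print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(distorted))
```

Both calls yield the same normalized frames, which is exactly the channel invariance that more elaborate, adaptive high-pass designs try to keep while suppressing less of the speech information.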

A Production and Perception Experiment of Korean Alveolar Fricatives

  • Yoon, Kyu-Chul
    • 음성과학 (Speech Sciences) / Vol. 9, No. 3 / pp.169-184 / 2002
  • Korean has two types of voiceless alveolar fricatives: a non-tense fricative /$s^{h}$/ and a tense fricative /s'/. Twenty native speakers of Korean produced five pairs of isolated words containing word-initial /$s^{h}$V/ and /s'V/ sequences, where V was one of five Korean vowels (/a, e, i, o, u/). Acoustic measures such as duration, the prominent frequency of the fricative noise, the energy change of the following vowel, and the fundamental frequency at vowel onset were examined. Among these parameters, the results showed that the aspiration noise duration of /s'/ in mid and low vowel contexts was less than 21 ms. In a perception experiment in which only the aspiration noise interval of the /$s^{h}$/ tokens was incrementally reduced, some listeners shifted their perception from /$s^{h}$/ to /s'/.

다층퍼셉트론의 출력 노드 수 증가에 의한 성능 향상 (Performance Improvement of Multilayer Perceptrons with Increased Output Nodes)

  • 오상훈
    • 한국콘텐츠학회논문지 (Journal of the Korea Contents Association) / Vol. 9, No. 1 / pp.123-130 / 2009
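  • When a multilayer perceptron is applied to a pattern recognition problem, one output node is usually assigned to each class, and the index of that output node denotes the class of the input pattern. In contrast, this paper proposes increasing the number of output nodes per class to improve the performance of multilayer perceptrons. For a two-class problem, assuming equal class probabilities and output-node values uniformly distributed within each class, we prove the effectiveness of the method through a probabilistic derivation. Simulations of 50-word isolated-word recognition confirm that performance improves when the number of output nodes is increased.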
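
A minimal sketch of how targets and decisions might work with several output nodes per class. The abstract does not say how the nodes of one class are combined at decision time; summing each class's group of activations and taking the largest is an assumption here, and the function names are hypothetical.

```python
from typing import List, Sequence


def make_target(label: int, n_classes: int, nodes_per_class: int) -> List[float]:
    """Target vector with all nodes of the true class set to 1 and the rest to 0."""
    return [1.0 if node // nodes_per_class == label else 0.0
            for node in range(n_classes * nodes_per_class)]


def decide_class(outputs: Sequence[float], nodes_per_class: int) -> int:
    """Pick the class whose group of output nodes has the largest summed activation."""
    if len(outputs) % nodes_per_class:
        raise ValueError("output count must be a multiple of nodes_per_class")
    n_classes = len(outputs) // nodes_per_class
    scores = [sum(outputs[c * nodes_per_class:(c + 1) * nodes_per_class])
              for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: scores[c])


print(make_target(1, 2, 2))
print(decide_class([0.9, 0.1, 0.6, 0.6], nodes_per_class=2))
```

With one node per class, the same output vector would be read as class 0 (the single largest node); grouping lets several moderately active nodes outvote one outlier, which is one intuition for why extra nodes can help.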

비선형 변환에 의한 중간층 뉴런 상관계수 감소 (Decreasing of Correlations Among Hidden Neurons of Multilayer Perceptrons)

  • 오상훈
    • 한국콘텐츠학회논문지 (Journal of the Korea Contents Association) / Vol. 3, No. 3 / pp.98-102 / 2003
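  • To clarify the role of the hidden neurons of multilayer perceptrons from an information-processing perspective, this paper proves that the correlation coefficients among hidden neurons decrease when their weighted sums pass through the nonlinear transformation. Simulations of a multilayer perceptron trained on isolated-word recognition confirm the proof. The result shows that the nonlinear transformation in hidden neurons reduces redundancy in the information they carry.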
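
A quick numerical check of the claim under assumptions the paper may not share: the weighted sums of two hidden neurons are modeled as jointly Gaussian with correlation 0.8, and tanh is the nonlinearity. For jointly Gaussian variables, no pair of transformations can push the magnitude of the correlation above |ρ| (a classical maximal-correlation result), and a saturating tanh lowers it noticeably. The names and parameter values are illustrative.

```python
import math
import random
from typing import List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlated_net_inputs(rho: float, scale: float, n: int,
                          seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw n pairs of jointly Gaussian 'weighted sums' with correlation rho."""
    rng = random.Random(seed)
    u, v = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u.append(scale * z1)
        v.append(scale * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return u, v


u, v = correlated_net_inputs(rho=0.8, scale=3.0, n=20000)
before = pearson(u, v)
after = pearson([math.tanh(x) for x in u], [math.tanh(y) for y in v])
print(f"corr of weighted sums: {before:.3f}, after tanh: {after:.3f}")
```

A larger `scale` drives tanh further into saturation and lowers the output correlation further.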

성대마이크를 이용한 ASR 시스템 개발을 위한 인식기 최적화 (Recognizer Optimization for a Isolated-word Recognition system using Throat Microphone)

  • 정영규;한문성;이상조
    • 한국HCI학회:학술대회논문집 (HCI Society of Korea conference proceedings) / 한국HCI학회 2007년도 학술대회 1부 (2007 conference, Part 1) / pp.406-410 / 2007
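  • By the nature of the device, a throat microphone minimizes environmental noise. However, because of the loss of high-frequency information and partial loss of formant information, a command recognizer using a throat microphone performs worse than one using a standard microphone. This paper proposes a high-performance command recognition system that combines a feature extraction algorithm reflecting the characteristics of Korean phonological features with an optimized recognition model. By analyzing the distribution of vocal-fold vibration (voicing) characteristics in Korean, we show that a command recognizer can be built from voicing information alone, and we propose a structure of the Time Delay Neural Network (TDNN) [1], a model known for high speech recognition performance, optimized for vocal-cord-signal command recognition. When the optimal TDNN structure found through experiments was applied to vocal-cord signals, it achieved a high recognition rate of about 87%.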
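
The core of a TDNN is a layer whose weights are shared across time and applied to a sliding window of consecutive frames, which amounts to a 1-D convolution along the frame axis. The sketch below shows one such layer; the sizes, the tanh activation, and the function name are assumptions, and the optimized structure found in the paper is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def tdnn_layer(frames: Matrix, weights: List[Matrix], bias: List[float]) -> Matrix:
    """One time-delay layer.

    frames:  T x D input features
    weights: H units, each a (context x D) weight matrix shared across time
    bias:    H biases
    returns: (T - context + 1) x H tanh activations
    """
    context = len(weights[0])
    out = []
    for t in range(len(frames) - context + 1):
        window = frames[t:t + context]  # the delayed frames seen by this output step
        row = []
        for w, b in zip(weights, bias):
            s = b + sum(w[k][d] * window[k][d]
                        for k in range(context) for d in range(len(window[k])))
            row.append(math.tanh(s))
        out.append(row)
    return out


frames = [[0.1, 0.1] for _ in range(5)]          # 5 frames of 2-dim features
weights = [[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]]  # 1 unit, context of 3 frames
out = tdnn_layer(frames, weights, bias=[0.0])
print(len(out), out[0])
```

Stacking such layers widens the effective time context, and choosing the context width and number of units per layer is the kind of structural choice an optimization like the paper's would search over.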

Verification of Normalized Confidence Measure Using n-Phone Based Statistics

  • Kim, Byoung-Don;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • 음성과학 (Speech Sciences) / Vol. 12, No. 1 / pp.123-134 / 2005
  • A confidence measure (CM) is used to reject misrecognized words in an automatic speech recognition (ASR) system. The confidence measure of Rahim, Lee, Juang and Cho (RLJC-CM) is one of the widely used CMs [1]; it is calculated by averaging phone-level CMs. Kim et al. [2] extended the RLJC-CM by devising the normalized CM (NCM), a statistically normalized version of the RLJC-CM based on tri-phone-level CM normalization. In this paper, we verify the NCM by generalizing the tri-phone unit to an n-phone unit. To compare normalization units, mono-phone, tri-phone, quin-phone, and $\infty$-phone units are tested. Isolated-word recognition experiments show that tri-phone-based normalization is sufficient to enhance the rejection performance of the ASR system. We also explain the NCM in terms of a two-class pattern classification problem.
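
A minimal sketch of the normalization idea, assuming that each phone-level CM is z-normalized with a mean and standard deviation estimated for its n-phone context unit, and that the word-level NCM is the average of the normalized scores. The exact normalization in [2] may differ; the names, the `#` boundary symbol, and the fallback that leaves unseen units unnormalized are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

Unit = Tuple[str, ...]
Stats = Dict[Unit, Tuple[float, float]]  # n-phone unit -> (mean, std) of phone-level CM


def unit_key(phones: Sequence[str], i: int, n: int) -> Unit:
    """n-phone context centered on phone i (n odd); '#' marks word boundaries."""
    half = n // 2
    return tuple(phones[j] if 0 <= j < len(phones) else "#"
                 for j in range(i - half, i + half + 1))


def normalized_cm(phones: Sequence[str], phone_cms: Sequence[float],
                  stats: Stats, n: int = 3) -> float:
    """Average of z-normalized phone-level CMs; unseen units are left unnormalized."""
    zs = []
    for i, cm in enumerate(phone_cms):
        mu, sigma = stats.get(unit_key(phones, i, n), (0.0, 1.0))
        zs.append((cm - mu) / sigma)
    return mean(zs)


tri_stats = {("#", "a", "b"): (0.5, 0.25)}
print(unit_key(["a", "b", "c"], 0, 3))
print(normalized_cm(["a", "b"], [1.0, 0.2], tri_stats))
```

Setting `n=1` gives mono-phone normalization and larger odd `n` gives quin-phone and wider contexts; the trade-off the paper examines is that wider units have fewer training samples per unit for estimating the statistics.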

An Adaptive Learning Rate with Limited Error Signals for Training of Multilayer Perceptrons

  • Oh, Sang-Hoon;Lee, Soo-Young
    • ETRI Journal / Vol. 22, No. 3 / pp.10-18 / 2000
  • Although an n-th order cross-entropy (nCE) error function resolves the incorrect saturation problem of the conventional error backpropagation (EBP) algorithm, the performance of multilayer perceptrons (MLPs) trained with the nCE function depends heavily on the order of nCE. In this paper, we propose an adaptive learning rate that markedly reduces the sensitivity of MLP performance to the nCE order. Additionally, we propose limiting the error signal values at the output nodes for stable learning with the adaptive learning rate. Simulations of handwritten digit recognition and isolated-word recognition tasks verified that the proposed method successfully reduces the dependence of MLP performance on the nCE order while maintaining the advantages of the nCE function.
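
The "incorrect saturation" problem can be seen in the output-node error signals of a sigmoid unit. The sketch below compares the squared-error signal with the ordinary cross-entropy signal, used here as a stand-in for nCE (whose general form is not given in the abstract), and adds a symmetric clip as one possible way to limit error signals. The clip bound is a hypothetical value, and the paper's adaptive learning rate is not reproduced.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def delta_mse(t: float, y: float) -> float:
    # squared-error output signal for a sigmoid node: the y(1-y) factor
    # makes it vanish when the output saturates, even at the wrong extreme
    return (t - y) * y * (1.0 - y)


def delta_ce(t: float, y: float) -> float:
    # cross-entropy output signal for a sigmoid node: the y(1-y) factor cancels
    return t - y


def clip(delta: float, limit: float) -> float:
    # hypothetical symmetric limit on the error signal magnitude
    return max(-limit, min(limit, delta))


y = sigmoid(-9.0)  # output stuck near 0 while the target is 1
d_mse, d_ce = delta_mse(1.0, y), delta_ce(1.0, y)
print(d_mse, d_ce, clip(d_ce, 0.5))
```

The squared-error signal is tiny for this wrongly saturated node, so learning stalls; the cross-entropy signal stays near 1, and limiting it keeps individual updates bounded when a larger learning rate is in use.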