• Title/Abstract/Keywords: Phonetic Approach

78 search results; processing time 0.028 s

Representation of MFCC Feature Based on Linlog Function for Robust Speech Recognition

  • 윤영선
    • 대한음성학회지:말소리 / No. 59 / pp. 13-25 / 2006
  • In a previous study, the linlog (linear-log) RASTA (J-RASTA) approach based on PLP was proposed to deal with both the channel effect and additive noise. Extracting PLP features generally requires more steps and computation than extracting the widely used MFCC. Thus, in this paper, we apply the linlog function to the MFCC to investigate the possibility of a simple compensation method that removes both distortions. In the experimental results, the proposed method shows a tendency similar to that of the linlog RASTA-PLP. When the J value is set to 1e-6, the best ERR (Error Reduction Rate) of 33% is obtained. When the linlog function is applied to the feature extraction process, the J value plays a very important role in compensating for the corruption. Thus, further study of adaptive or noise-dependent J estimation is required.
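
As a rough illustration of the linlog idea, the sketch below assumes the J-RASTA form ln(1 + J·x) applied to mel filterbank energies in place of the plain logarithm, followed by a DCT to obtain cepstral coefficients. The function names, the input energies, and the number of coefficients are assumptions for illustration; the abstract does not give the paper's exact formulation.

```python
import numpy as np

def linlog(x, J=1e-6):
    """Lin-log compression (assumed form): ~J*x when J*x << 1, ~ln(J*x) when J*x >> 1."""
    return np.log1p(J * np.asarray(x, dtype=float))

def dct_ii(v, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = v.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return v @ basis.T

def linlog_cepstra(mel_energies, J=1e-6, n_coeffs=13):
    """Cepstra from mel filterbank energies, with linlog replacing the usual log."""
    return dct_ii(linlog(mel_energies, J), n_coeffs)
```

With a small J the compression stays nearly linear for weak energies and becomes logarithmic only for strong ones, which is why the choice of J matters so much in the abstract.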

The Effect of Yawn-Sigh Approach on Voice Quality of a Child with Cleft Palate: A Case Study

  • 이은선;정옥란;석동일
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 2005 Spring Conference / pp. 81-84 / 2005
  • The purpose of the present study was to determine the effects of the yawn-sigh technique on the voice quality of a child with cleft palate. A 9-year-old child with cleft palate participated in the study 3 times a week for a month. The assessments were done with Dr. Speech (Version 4.0, Tiger DRS) on $F_{0}$, jitter, shimmer and NNE. The results showed a tendency for the voice to improve in terms of NNE. However, the improvement did not reach statistical significance.

Codebook design for subspace distribution clustering hidden Markov model

  • 조영규;육동석
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 2005 Spring Conference / pp. 87-90 / 2005
  • Today's state-of-the-art speech recognition systems typically use continuous distribution hidden Markov models with mixtures of Gaussian distributions. To obtain higher recognition accuracy, the hidden Markov models typically require a huge number of Gaussian distributions. Such systems require too much memory to run and are too slow for large applications. Many approaches have been proposed for the design of compact acoustic models. One such model is the subspace distribution clustering hidden Markov model, which can represent the original full-space distributions as combinations of a small number of subspace distribution codebooks. Therefore, how to build the codebook is an important issue in this approach. In this paper, we report experimental results on various quantization methods for building more accurate models.
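
The codebook idea can be sketched roughly as follows, assuming diagonal-covariance Gaussians whose dimensions are split into contiguous streams, with each stream's (mean, variance) subvectors clustered by plain Euclidean k-means. The abstract does not say which quantization methods the authors compare, so k-means and every name below are illustrative assumptions only.

```python
import numpy as np

def kmeans(points, n_codes, n_iter=20, seed=0):
    """Plain k-means with Euclidean distance; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_codes, replace=False)]

    def assign(c):
        dist = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for c in range(n_codes):
            members = points[labels == c]
            if len(members):  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return centers, assign(centers)

def build_codebooks(means, variances, n_streams, n_codes):
    """Quantize each stream of every Gaussian to one codeword index.

    means, variances: arrays of shape (n_gaussians, n_dims).
    Returns a list of per-stream codebooks and an index table of
    shape (n_gaussians, n_streams).
    """
    codebooks, indices = [], []
    for dims in np.array_split(np.arange(means.shape[1]), n_streams):
        sub = np.hstack([means[:, dims], variances[:, dims]])
        centers, labels = kmeans(sub, n_codes)
        codebooks.append(centers)
        indices.append(labels)
    return codebooks, np.stack(indices, axis=1)
```

After quantization, each Gaussian is stored as a short row of codeword indices instead of its full mean and variance vectors, which is where the memory saving comes from.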

A Study on OOV Rejection Using Viterbi Search Characteristics

  • 김규홍;김회린
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 2005 Spring Conference / pp. 95-98 / 2005
  • Many utterance verification (UV) algorithms have been studied to reject out-of-vocabulary (OOV) words in speech recognition systems. Most conventional confidence measures for UV algorithms are based on the log likelihood ratio test (LRT), but these measures take much time to evaluate the alternative hypothesis or anti-model likelihood. We propose a novel confidence measure that makes use of the momentary best-scored state sequence during Viterbi search. Our approach is more efficient than conventional LRT-based algorithms because it does not need to build an anti-model or to calculate the alternative hypothesis. The proposed confidence measure shows better performance on speech corrupted by additive noise as well as on clean speech.
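
For context, the conventional LRT-style confidence measure that the abstract contrasts against can be sketched as below. This is only the baseline idea, not the proposed Viterbi-based measure, whose details the abstract does not give. The function names, the per-frame normalization, and the threshold value are assumptions.

```python
def llr_confidence(loglik_target, loglik_anti, n_frames):
    """Frame-normalized log likelihood ratio between the recognized model
    and an anti-model (alternative hypothesis)."""
    return (loglik_target - loglik_anti) / n_frames

def accept_utterance(loglik_target, loglik_anti, n_frames, threshold=0.0):
    """Accept the recognition result if the confidence reaches the threshold;
    otherwise reject it as a likely OOV utterance."""
    return llr_confidence(loglik_target, loglik_anti, n_frames) >= threshold
```

The cost the abstract refers to lies in obtaining `loglik_anti`, which requires a separately trained anti-model and an extra likelihood evaluation per utterance.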

N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors

  • 이수장;박경미;오영환
    • 대한음성학회지:말소리 / No. 67 / pp. 149-166 / 2008
  • In spoken document retrieval (SDR), subword (typically phoneme) indexing terms are used to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary, and only a small corpus is required to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) with a 1.7 times speed-up in comparison with the baseline system.
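
A minimal sketch of n-gram based matching on phone strings is given below. It scores a document by the fraction of the query's phone n-grams that also occur in the document, so a single recognition error only removes the few n-grams that overlap it. The scoring rule and function names are assumptions; the probabilistic slot detection step from the abstract is not covered.

```python
from collections import Counter

def phone_ngrams(phones, n=3):
    """Count the overlapping n-grams of a phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))

def ngram_score(query, document, n=3):
    """Fraction of query n-grams (with multiplicity) found in the document."""
    q = phone_ngrams(query, n)
    d = phone_ngrams(document, n)
    total = sum(q.values())
    if total == 0:
        return 0.0  # query shorter than n phones
    matched = sum(min(count, d[gram]) for gram, count in q.items())
    return matched / total
```

For example, with bigrams the query `k ae t s` still scores 1/3 against a document recognized as `k ae d s`, whereas exact string matching would miss it entirely.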

Zero-Crossing-Based Source Direction Estimation Using a Cepstral Prefiltering Technique

  • 박용진;이수연;박형민
    • 대한음성학회지:말소리 / No. 67 / pp. 121-133 / 2008
  • To estimate the directions of multiple sound sources, we consider an approach based on zero crossings, which provides results more robust to diffuse noise than the conventional cross-correlation-based method [6][7]. In reverberant environments, the performance of source direction estimation can be improved by using the signal components that travel along direct paths from sources to microphones. Since a cepstral prefiltering technique [8] removes the effect of reverberation, we propose a source direction estimation method that finds the intervals of the direct-path components by comparing the original and cepstral-prefiltered envelopes. Simulations demonstrate that the proposed method improves the performance of source direction estimation in reverberant environments.
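
The core zero-crossing idea can be sketched for a single source and two microphones as below: locate upward zero crossings in each channel, pair each crossing with the nearest one in the other channel, and convert the median time difference into an angle. This is a simplified illustration under a far-field assumption. It omits the cepstral prefiltering and direct-path interval selection described in the abstract, and all names and default values are assumptions.

```python
import numpy as np

def upward_zero_crossings(x, fs):
    """Times (s) of negative-to-nonnegative crossings, linearly interpolated."""
    x = np.asarray(x, dtype=float)
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])
    return (idx + frac) / fs

def estimate_delay(left, right, fs):
    """Median delay (s) of the right channel relative to the left channel."""
    tl = upward_zero_crossings(left, fs)
    tr = upward_zero_crossings(right, fs)
    if len(tl) == 0 or len(tr) == 0:
        return None  # no crossings to compare
    nearest = tr[np.abs(tr[None, :] - tl[:, None]).argmin(axis=1)]
    return float(np.median(nearest - tl))

def delay_to_angle(delay, mic_distance, speed_of_sound=343.0):
    """Far-field direction (degrees from broadside) for a two-microphone pair."""
    s = np.clip(delay * speed_of_sound / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

The nearest-crossing pairing is only unambiguous when the true delay is well under half the signal period, which holds for closely spaced microphones and low-frequency content.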

Noise Robust Speaker Identification using Reliable Sub-Band Selection in Multi-Band Approach

  • 김성탁;지미경;김희린
    • 대한음성학회: Conference Proceedings / Proceedings of the 2007 대한음성학회 and 한국음성과학회 Joint Conference / pp. 127-130 / 2007
  • The conventional feature recombination technique is very effective under band-limited noise, but under broad-band noise it does not produce a notable performance improvement over the full-band system. To cope with this drawback, we introduce a new technique of sub-band likelihood computation in feature recombination and propose a new feature recombination method based on it. Furthermore, reliable sub-band selection based on the signal-to-noise ratio is used to improve the performance of the proposed feature recombination. Experimental results show that the average error reduction rate across various noise conditions is more than 27% compared with the conventional full-band speaker identification system.
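
The SNR-based selection step can be illustrated with the sketch below, which keeps only the sub-bands whose estimated SNR reaches a threshold and sums their per-speaker log-likelihoods. Note that this is a simple likelihood-combination stand-in, not the feature recombination method proposed in the paper. The threshold, the array layout, and the function names are assumptions.

```python
import numpy as np

def select_reliable_bands(snr_db, threshold_db=5.0):
    """Boolean mask of sub-bands whose estimated SNR reaches the threshold."""
    snr_db = np.asarray(snr_db, dtype=float)
    mask = snr_db >= threshold_db
    if not mask.any():
        mask[np.argmax(snr_db)] = True  # always keep at least the best band
    return mask

def identify_speaker(band_loglik, snr_db, threshold_db=5.0):
    """Pick the speaker with the highest log-likelihood summed over reliable bands.

    band_loglik: array of shape (n_speakers, n_bands).
    """
    mask = select_reliable_bands(snr_db, threshold_db)
    scores = np.asarray(band_loglik, dtype=float)[:, mask].sum(axis=1)
    return int(np.argmax(scores))
```

Dropping a heavily corrupted band prevents its misleading likelihoods from dominating the decision, at the cost of discarding whatever speaker information that band still carries.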

Design of Robust Speech Recognition System Using Tandem Architecture

  • 윤영선;이윤근
    • 대한음성학회: Conference Proceedings / Proceedings of the 2007 대한음성학회 and 한국음성과학회 Joint Conference / pp. 323-326 / 2007
  • Various studies have combined neural networks and hidden Markov models within a single system, with the expectation of combining the advantages of both. Influenced by these studies, the tandem approach was presented, which uses a neural network as the classifier and hidden Markov models as the decoder. In this paper, we apply the trend information of segmental features to the tandem architecture and use posterior probabilities, which are the output of the neural network, as inputs to the recognition system. Experiments are performed on the Aurora2 database to examine the potential of the trend-feature-based tandem architecture. The proposed method shows better results than the baseline system in very low SNR environments.
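
How neural network posteriors are typically turned into HMM input features in a tandem system can be sketched as follows: take the log of the posteriors and decorrelate them with a PCA projection. The abstract does not state whether the authors apply these steps, so this follows the commonly described tandem recipe as an assumption. The trend features themselves are not modeled here, and all names are illustrative.

```python
import numpy as np

def tandem_features(posteriors, n_components, eps=1e-10):
    """Log-compress frame posteriors and project them onto their top principal axes.

    posteriors: array of shape (n_frames, n_classes), rows summing to 1.
    Returns an array of shape (n_frames, n_components).
    """
    logp = np.log(np.asarray(posteriors, dtype=float) + eps)
    centered = logp - logp.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]
```

The log makes the highly skewed posterior distribution closer to Gaussian, and the decorrelation suits the diagonal-covariance Gaussian mixtures usually used in the HMM back end.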

Analysis and Interpretation of Intonation Contours of Slovene

  • Ales Dobnikar
    • 대한음성학회: Conference Proceedings / Proceedings of the 대한음성학회 October 1996 Conference / pp. 542-547 / 1996
  • Prosodic characteristics of natural speech, especially intonation, in many cases reflect the speaker's feelings at the time of the utterance, with relatively large variations in speaking style over the same text. We analyzed a speech corpus recorded with ten Slovene speakers. The observed intonation contours were interpreted for the purpose of modeling the intonation contour in the synthesis process. Based on the results of this analysis, we devised a scheme for modeling the intonation contour for different types of intonation units. The intonation scheme uses a superpositional approach, which defines the intonation contour as the sum of global (intonation unit) and local (accented syllables or syntactic boundaries) components. A near-natural intonation contour was obtained by rules, using only the text of the utterance as input.
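
The superpositional idea, a contour formed as the sum of a global and several local components, can be sketched as below. The linear declination line, the Gaussian-shaped accent bumps, and all numeric values are assumptions for illustration; the abstract does not specify the shapes or rules used for Slovene.

```python
import numpy as np

def global_component(t, start_hz=120.0, end_hz=90.0):
    """Intonation-unit level trend: a linear declination from start_hz to end_hz."""
    return start_hz + (end_hz - start_hz) * (t - t[0]) / (t[-1] - t[0])

def local_component(t, center, peak_hz, width=0.08):
    """Local F0 excursion (e.g. an accented syllable), modeled as a Gaussian bump."""
    return peak_hz * np.exp(-0.5 * ((t - center) / width) ** 2)

def intonation_contour(t, accents):
    """F0 contour as the sum of the global trend and all local excursions.

    accents: iterable of (center_time_s, peak_hz) pairs.
    """
    f0 = global_component(t)
    for center, peak_hz in accents:
        f0 = f0 + local_component(t, center, peak_hz)
    return f0
```

For instance, `intonation_contour(np.linspace(0, 2, 201), [(0.5, 20.0), (1.4, 12.0)])` produces a falling contour with two accent peaks superimposed on it.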

Stress Patterns of Compound Nouns in English

  • 이영길
    • 대한음성학회지:말소리 / No. 42 / pp. 25-36 / 2001
  • Stress assignment has been much discussed in the literature on English compound nouns. The general view of the stress pattern of English compound nouns is that the main stress falls on the first element and a secondary stress on the second element; however, stress patterns that provide counterevidence to this traditional pedagogical approach are often employed. Ladd (1984) suggests a new idea, that 'compound stress represents the deaccenting of the head of the compound.' Recent studies show that initial stressing does not necessarily indicate compounds and that syntactic phrases are not always characterized by final stressing. In his pilot test, Pennanen comments on the frequent variation of stress patterns on individual items, and Bauer confirms Pennanen's results with different informants. This paper is an attempt to justify Bauer's analysis using the same data as Bauer's but different subjects. It turns out that the competences of native-speaker informants do not provide clear-cut answers. Some factors should be taken into account in assigning appropriate stress to compound nouns.
