• Title/Summary/Keyword: Speech rate

Perception of Korean Prosody by Native Speakers of English and Native Speakers of Korean (영어 원어민과 한국어 원어민의 한국어운율 인식)

  • Yi, So-Pae
    • MALSORI
    • /
    • no.65
    • /
    • pp.1-11
    • /
    • 2008
  • This study explored the perception of transplanted Korean prosody by NE (native speakers of English) and NK (native speakers of Korean) listeners. Korean utterances of various sentence types produced by NE and NK were used to transplant the original Korean prosody contours onto the Korean utterances read by NE. Other NE and NK listeners were then instructed to rate the transplanted prosodic components. Results showed that the interactions between rater group and each of three factors (transplantation type, sentence type, and sentence length) were meaningful. Both rater groups preferred the combined effect of transplanted prosodic components (e.g., DP, DPI) to that of individual transplantation (e.g., I, D, P). Compared to NK, NE were more sensitive to duration change than to pitch change, whereas NK showed equal preference for both. In sentence types such as De, Ex, Im, and Ta, NE perceived higher similarity than NK.

A New Distance Measure for a Variable-Sized Acoustic Model Based on MDL Technique

  • Cho, Hoon-Young;Kim, Sang-Hun
    • ETRI Journal
    • /
    • v.32 no.5
    • /
    • pp.795-800
    • /
    • 2010
  • Embedding a large vocabulary speech recognition system in mobile devices requires a reduced acoustic model obtained by eliminating redundant model parameters. In conventional optimization methods based on the minimum description length (MDL) criterion, a binary Gaussian tree is built at each state of a hidden Markov model by iteratively finding and merging similar mixture components. An optimal subset of the tree nodes is then selected to generate a downsized acoustic model. To obtain a better binary Gaussian tree by improving the process of finding the most similar Gaussian components, this paper proposes a new distance measure that exploits the difference in likelihood values for cases before and after two components are combined. The mixture weight of Gaussian components is also introduced in the component merging step. Experimental results show that the proposed method outperforms MDL-based optimization using either a Kullback-Leibler (KL) divergence or weighted KL divergence measure. The proposed method could also reduce the acoustic model size by 50% with less than a 1.5% increase in error rate compared to a baseline system.
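
The abstract above names KL divergence as the baseline distance for finding similar Gaussian components before merging them. The following is a minimal sketch of that baseline only, assuming one-dimensional Gaussians and a standard moment-matching merge; the function names are hypothetical, and the paper's proposed likelihood-difference measure is not reproduced here because the abstract does not specify it.

```python
import math


def kl_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two univariate Gaussians given means and variances."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def symmetric_kl(mu1, var1, mu2, var2):
    """Symmetrized KL divergence, usable as a distance between components."""
    return kl_gauss(mu1, var1, mu2, var2) + kl_gauss(mu2, var2, mu1, var1)


def merge(w1, mu1, var1, w2, mu2, var2):
    """Merge two weighted components by matching the first two moments.

    This is an assumed, commonly used merge rule, not necessarily the
    one used in the paper.
    """
    w = w1 + w2
    mu = (w1 * mu1 + w2 * mu2) / w
    var = (w1 * (var1 + (mu1 - mu) ** 2)
           + w2 * (var2 + (mu2 - mu) ** 2)) / w
    return w, mu, var
```

In a tree-building loop of the kind the abstract describes, the pair of components with the smallest distance would be merged first; the mixture weights enter through `merge`.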

A Study on the Adaptive Delta Modulation Algorithm (어댑티브 델타 변조 앨고리즘 연구)

  • 심수보
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.8 no.3
    • /
    • pp.113-119
    • /
    • 1983
  • This paper studies a method of step-size adaptation for delta modulation coding of speech signals. Exponential adaptation processes are investigated using a new circuit model, and a shortened error recovery in the decoder step size is presented. Practical considerations favor one algorithm; its digital implementation, using rate multipliers, has been adopted to illustrate the method, and its validity is verified by laboratory experiment.
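
To make the idea of exponential step-size adaptation concrete, here is a minimal software sketch of an adaptive delta modulation encoder and decoder. It assumes a common one-bit-memory rule (grow the step by a fixed factor when two consecutive bits agree, shrink it when they differ); the parameter names and default values are hypothetical and are not taken from the paper, which describes a circuit implementation with rate multipliers.

```python
def _next_step(step, bit, prev_bit, factor, step_min, step_max):
    """Exponential adaptation: multiply or divide the step by `factor`."""
    if prev_bit is None:
        return step
    step = step * factor if bit == prev_bit else step / factor
    return min(max(step, step_min), step_max)


def adm_encode(samples, step0=0.1, factor=1.5, step_min=0.01, step_max=1.0):
    """Encode samples into one bit each (1 = input at or above the estimate)."""
    bits, estimate, step, prev_bit = [], 0.0, step0, None
    for x in samples:
        bit = 1 if x >= estimate else 0
        step = _next_step(step, bit, prev_bit, factor, step_min, step_max)
        estimate += step if bit else -step  # local decoder inside the encoder
        bits.append(bit)
        prev_bit = bit
    return bits


def adm_decode(bits, step0=0.1, factor=1.5, step_min=0.01, step_max=1.0):
    """Rebuild the staircase approximation from the bit stream."""
    out, estimate, step, prev_bit = [], 0.0, step0, None
    for bit in bits:
        step = _next_step(step, bit, prev_bit, factor, step_min, step_max)
        estimate += step if bit else -step
        out.append(estimate)
        prev_bit = bit
    return out
```

Because the decoder applies the same adaptation rule to the same bits, it reproduces the encoder's internal estimate without any side information about the step size.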

A Study on the Low Noise Delta Codec System (저잡음 델타변조방식에 관한 연구)

  • 심수보
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.9 no.3
    • /
    • pp.120-126
    • /
    • 1984
  • This paper presents a novel encoder circuit design method for realizing the exponential adaptation process in delta modulation coding of speech signals. A digital implementation, using a rate multiplier and a double integration circuit, has been adopted to illustrate the method. Using double integration in the local decoder included in the ADM encoder improves the undesirable characteristic whereby the low switching speed of the rate multiplier causes the SQNR to decrease, and the SQNR of the decoded signal in this realization is relatively uniform over a wide range of signal levels. The validity of the design is verified by laboratory experiments.

A Perceptual Study of the Temporal Cues of English Plosives for Leveled Groups of Korean English Learners (다양한 수준의 한국인 영어 학습자의 영어 파열음의 구간 신호 지각 연구)

  • Kang Seok-han;Park Hansang
    • MALSORI
    • /
    • no.56
    • /
    • pp.49-73
    • /
    • 2005
  • This study explores the most important temporal cues in the perception of the voiced/voiceless distinction of English plosives in terms of newly defined measures of perception: original signal to response agreement, unit signal to response agreement, and robustness. Seven native speakers of English and three leveled groups of Korean English learners participated in the study. The results showed that both the native speakers of English and the Korean groups failed to perceive the voiced/voiceless distinction of English plosives, particularly alveolar plosives, in word-medial trochaic positions. The results also showed that in word-initial and word-medial iambic positions both the native speakers of English and the Korean groups use the information in the release burst and aspiration to perceive the voiced/voiceless distinction of English plosives, and that in word-final positions native speakers of English use the information in the preceding vowel, while the Korean groups use the information in the closure interval.

Consonant Reduction in Korean and the Height of Adjacent Vowels (한국어 자음약화현상과 인접모음의 고저성)

  • Lee Suk-Hyang
    • MALSORI
    • /
    • no.33_34
    • /
    • pp.43-55
    • /
    • 1997
  • This study used phonetic experiments to examine one hypothesis about consonant reduction in Korean inferred from the Articulatory Phonology framework: the degree of consonant reduction depends on the height of the neighboring vowels, such that the lower the neighboring vowel, the greater the reduction of the stop closure period. The results generally supported the hypothesis, with some cases requiring other phonetic considerations, e.g., the rate of tongue tip movement in the case of the dental lenis stop /t/, or the facts that the bilabial lenis stop /p/ shares its primary articulators, the lips, with the neighboring vowel /u/ and that, for bilabial closure, the upper lip lowers more to compensate for the reduced movement of the lower lip when its raising is disturbed for some reason.

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI
    • /
    • no.66
    • /
    • pp.61-72
    • /
    • 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features, such as auditory-based and higher-order features. Performance is measured using Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features proposed by the frame-based LDA method is shown to be effective for pathological and normal voice classification, with an 87.0% classification rate. This is a noticeable improvement of 17.72% in terms of error reduction compared to the MFCC-based GMM algorithm.

Korean Phoneme Recognition Using duration-dependent 3-State Hidden Markov Model (음소길이를 고려한 3-State Hidden Markov Model 에 의한 한국어 음소인식)

  • Yoo, H.-C.;Lee, H.-J.;Park, B.-C.
    • The Journal of the Acoustical Society of Korea
    • /
    • v.8 no.1
    • /
    • pp.81-87
    • /
    • 1989
  • This paper describes a method for modeling Korean phonemes. Hidden Markov models (HMMs) may be viewed as an effective technique for modeling the inherent nonstationarity of the speech signal. We propose a 3-state phoneme model to represent the sequentially changing characteristics of phonemes, i.e., transition-to-stationary-to-transition. We also show that the duration of a phoneme is an important factor affecting recognition accuracy, and that the recognition rate can be improved by using duration-dependent 3-state hidden Markov models.

Real-Time Recognition of the Korean Single Vowels Using the Speech Spectrum Analysis (음성 스펙트럼 분석에 의한 한국어 단모음 실시간 인식)

  • 김엄준;성미영
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 1998.10a
    • /
    • pp.226-231
    • /
    • 1998
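  • This study uses the zero crossing rate, short-term energy, and formants as parameters that can be computed in a short time and that characterize speech. These three parameters were measured on speech from a specific speaker in order to recognize the single vowels 'ㅏ, ㅐ, ㅓ, ㅔ, ㅗ, ㅜ, ㅡ, ㅣ'. The zero crossing rate and short-term energy were used to distinguish voiced from unvoiced sounds and to decide whether the input is speech at all. The formants were obtained from a 10th-order cepstrum and were used to discriminate the individual vowels. Taking a single vowel as input and outputting it as text takes 0.065 s on average, and the recognition rate is 93% for individual vowels and 72% for 10 test sentences.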
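
Two of the parameters named in this abstract, the zero crossing rate and short-term energy, are simple enough to sketch directly. The functions below are a minimal illustration under stated assumptions: a frame is a plain list of samples, and the thresholds in `classify_frame` are hypothetical placeholders rather than values from the paper. The cepstrum-based formant estimation is not shown.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(x * x for x in frame) / len(frame)


def classify_frame(frame, energy_threshold=0.01, zcr_threshold=0.3):
    """Rough three-way decision; both thresholds are assumed values.

    Low energy is treated as silence; otherwise a high zero crossing
    rate suggests unvoiced speech and a low one suggests voiced speech.
    """
    if short_term_energy(frame) < energy_threshold:
        return "silence"
    if zero_crossing_rate(frame) > zcr_threshold:
        return "unvoiced"
    return "voiced"
```

In practice the thresholds would need to be tuned to the recording conditions and sampling rate.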

Polyphase Representation of the Relationships Among Fullband, Subband, and Block Adaptive Filters

  • Tsai, Chimin
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.1435-1438
    • /
    • 2005
  • In hands-free telephone systems, the received speech signal is fed back to the microphone and constitutes the so-called echo. To cancel the effect of this time-varying echo path, it is necessary to devise an adaptive filter between the receiving and the transmitting ends. For a typical FIR realization, the length of the fullband adaptive filter results in high computational complexity and a low convergence rate. Consequently, subband adaptive filtering schemes have been proposed to improve the performance. In this work, we use a deterministic approach to analyze the relationship between fullband and subband adaptive filtering structures. With the block adaptive filtering structure as an intermediate stage, the analysis is divided into two parts. First, to avoid aliasing, it is found that the matrix of block adaptive filters is pseudocirculant in form, and the elements of this matrix are the polyphase components of the fullband adaptive filter. Second, to transmit the near-end voice signal faithfully, the analysis and synthesis filter banks in the subband adaptive filtering structure must form a perfect reconstruction pair. Using the polyphase representation, the relationship between the block and the subband adaptive filters is derived.
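
The polyphase components mentioned in this abstract can be illustrated with a small numerical check. The sketch below assumes a fixed (non-adaptive) FIR filter `h` and a decimation factor `M`, and shows only the basic identity that filtering followed by downsampling equals the sum of `M` polyphase branches; the function names are hypothetical, and the paper's pseudocirculant and filter-bank derivations are not reproduced.

```python
def convolve(h, x):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def decimate_direct(h, x, M):
    """Filter at the full rate, then keep every M-th output sample."""
    return convolve(h, x)[::M]


def decimate_polyphase(h, x, M):
    """Same result computed from M branches running at the low rate."""
    n_out = (len(h) + len(x) - 1 + M - 1) // M
    y = [0.0] * n_out
    for k in range(M):
        h_k = h[k::M]  # k-th polyphase component: h_k[m] = h[m*M + k]
        if not h_k:
            continue
        # Delayed, downsampled input: x_k[n] = x[n*M - k], zero outside x
        x_k = [x[n * M - k] if 0 <= n * M - k < len(x) else 0.0
               for n in range(n_out)]
        branch = convolve(h_k, x_k)
        for n in range(n_out):
            y[n] += branch[n]
    return y
```

Each branch convolves a filter of roughly `len(h) / M` taps with a signal at `1 / M` of the input rate, which is the source of the computational saving that motivates block and subband structures.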
