• Title/Summary/Keyword: speech distortion

Search Result 227, Processing Time 0.027 seconds

A study on a fast algorithm for the LSP coefficient quantization of G.723.1 speech codec (G.723.1 음성 부호화기의 LSE 계수 양자화를 위한 고속화 알고리즘 연구)

  • Son Chang-yong;Sung Ho-sang;Kang Sang-won;Sung Yu-na
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.153-156 / 2000
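  • This paper proposes a fast algorithm for the line spectral frequency (LSF) coefficient quantization of the G.723.1 codec, which is used to compress speech or audio signals at low bit rates in multimedia services. The proposed fast search method exploits the ordering property of LSF coefficients to narrow the codebook search range, which greatly reduces the computational load. When the method was applied to G.723.1, which has a predictive split VQ (PSVQ) structure, the average codebook search range in the search for the optimal codevector fell by $20.1\%$, with no degradation in spectral distortion (SD) performance and no additional memory. The numbers of additions, subtractions, multiplications, and comparisons fell by $19.1\%$, $20.1\%$, $19.4\%$, and $12.2\%$, respectively.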

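The abstract above does not spell out the search procedure, so the following is only a rough sketch of the general idea of using the LSF ordering property to skip codevectors in a split VQ. It assumes a plain (non-predictive) split VQ in which a candidate for the current split is skipped when its first element does not exceed the last quantized LSF of the previous split. The function name `constrained_search`, the `lower_bound` parameter, and the toy codebook are hypothetical and are not taken from the paper or from G.723.1. Unlike the result reported in the abstract, this simplified pruning is not guaranteed to return the same codevector as a full search.

```python
def constrained_search(target, codebook, lower_bound):
    """Search a sub-codebook, skipping candidates that would break LSF ordering.

    target      -- the LSF sub-vector to quantize
    codebook    -- list of candidate sub-vectors (hypothetical toy data)
    lower_bound -- last quantized LSF of the previous split (assumption)

    Returns (best_index, searched), where searched counts the candidates
    whose distance was actually computed. best_index is None if every
    candidate was skipped.
    """
    best_index, best_err = None, float("inf")
    searched = 0
    for index, codevector in enumerate(codebook):
        # Ordering property: LSFs must increase, so the first element of
        # this split has to lie above the end of the previous split.
        if codevector[0] <= lower_bound:
            continue
        searched += 1
        err = sum((t - c) ** 2 for t, c in zip(target, codevector))
        if err < best_err:
            best_index, best_err = index, err
    return best_index, searched
```

With a three-entry toy codebook and `lower_bound=0.25`, the first entry `(0.1, 0.2)` is skipped, so only two distance computations are performed instead of three.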

Phonetic characteristics of Korean lax, fortis, and aspirated stops in apraxic patients (한국어 파열음에 나타나는 실행증 환자의 음성적 특성 연구)

  • Kim Sujung;Kim Yunjung;Hong Jongseon
    • MALSORI / no.38 / pp.125-136 / 1999
  • This study examined the perception and production of Korean lax, fortis, and aspirated stops in three apraxic patients. All of the subjects made more production errors than perception errors. This indicates that apraxic patients have problems in phonetic execution rather than phonological representation. Additionally, in both production and perception, there were more errors in non-word-initial consonants than in word-initial consonants. These findings contradict those of previous studies, which report more errors in word-initial consonants. In contrast to the error types reported in previous studies, this study also found that distortion errors were frequent in both non-word-initial and word-initial consonants in apraxic patients. Generally, the VOT of the stops showed significant differences among lax, fortis, and aspirated stops, which indicates that the failure lies not in choosing the appropriate stop but in positioning or motor planning at the articulation stage.


A Study on the Analysis and Recognition of Korean Speech Signal using the Phoneme (음소를 이용한 한국어 음성 신호의 분석과 인식에 관한 연구)

  • Kim Y. I.;Hwang Y. S.;Youn D. H.;Cha I. W.
    • The Journal of the Acoustical Society of Korea / v.8 no.5 / pp.70-77 / 1989
  • In this paper, Korean language recognition using phonemes is studied. The experiment is carried out by dividing 545 isolated words into phonemes. Using linear prediction coefficients, the recognition rates for consonants, vowels, and end-consonants are $87.3\%$, $91.0\%$, and $91.7\%$, respectively. The recognition rate for isolated words assembled from the phonemes is $71.4\%$. The Itakura-Saito distortion measure is used for phoneme segmentation and phoneme recognition.

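For readers unfamiliar with the Itakura-Saito distortion measure named in the abstract above, here is a minimal sketch of its standard discrete form, the mean of $P/\hat{P} - \ln(P/\hat{P}) - 1$ over frequency bins. It assumes both power spectra are strictly positive and sampled on the same frequency grid. The function name and the toy spectra are illustrative only; the abstract does not say how the authors computed the measure from their linear prediction coefficients.

```python
import math


def itakura_saito(p_ref, p_test):
    """Itakura-Saito distortion between two power spectra.

    Both inputs are sequences of strictly positive power values on the
    same frequency grid (an assumption of this sketch).
    """
    if len(p_ref) != len(p_test) or not p_ref:
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for ref, test in zip(p_ref, p_test):
        ratio = ref / test
        total += ratio - math.log(ratio) - 1.0
    return total / len(p_ref)
```

The measure is zero for identical spectra and is not symmetric: `itakura_saito([1.0], [2.0])` and `itakura_saito([2.0], [1.0])` give different values, so the order of the reference and test spectra matters.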

An efficient method of spatial cues and compensation method of spectrums on multichannel spatial audio coding (멀티채널 Spatial Audio Coding에서의 효율적인 Spatial Cues 사용과 그에 따른 Spectrum 보상방법)

  • Lee, Byong-Hwa;Beack, Seung-Kwon;Seo, Jeong-Gil;Han, Min-Soo
    • MALSORI / no.53 / pp.157-169 / 2005
  • This paper proposes an efficient method of representing spatial cues in multichannel spatial audio coding. The recently introduced Binaural Cue Coding (BCC) method represents multichannel audio signals by means of the Inter-Channel Level Difference (ICLD) or the Source Index (SI). In this paper we express ICLD and SI information more efficiently on the basis of the Inter-Channel Correlation (ICC). We adopt different spatial cues according to the ICC and propose a method of compensating for the empty spectra created by using SI. We performed a MOS test and measured spectral distortion. The results show that the proposed method can reduce the bitrate of side information without a large degradation of audio quality.


An Analysis of Pronunciation Errors in Word-initial Onglides in English and a Suggestion of Teaching Method (어두에 나타나는 상향 이중모음의 오류분석 및 지도방안 연구)

  • Choi, Ju-Young;Park, Han-Sang
    • Proceedings of the KSPS conference / 2007.05a / pp.183-186 / 2007
  • This study analyzes Korean high school students' pronunciation errors in word-initial onglides in English. For this study, 24 Korean high school students read 34 English words including glide-vowel sequences in word-initial positions and vowel-initial words in a frame sentence. The results showed two different error types: glide deletion and vowel distortion. After the analysis of the first recording, the subjects were taught how to pronounce glide-vowel sequences properly in a 60-minute class. Comparison of the analyses of the first and second recordings showed that the subjects improved on the pronunciation of glide-vowel sequences. After the training, the pronunciation errors of diphthongs unique to English, [jɪ], decreased substantially. However, most subjects still had difficulties in pronouncing [wʊ], [wu], and [wo]. There was no significant correlation between English course grade and error reduction.


Efficient quantization of LPC parameters for vocoder of mobile communications (이동통신 음성 부화화기를 위한 선형 예측 계수(LPC)의 효율적 양자화 방법)

  • 이인성;우홍채
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.4 / pp.50-56 / 1997
  • In this paper, efficient quantization methods for line spectrum pairs (LSP) that offer good performance with low complexity and memory are proposed for the vocoder of a mobile communication system. The adaptive quantization method utilizing the ordering property of LSP parameters is used in a scalar quantizer and a vector-scalar hybrid quantizer. The proposed scalar quantization algorithm needs 31 bits/frame to maintain transparent speech quality. The improved vector-scalar quantizer achieves an average spectral distortion of 1 dB using 26 bits/frame. The proposed methods are evaluated under channel errors, and the predictor structure is changed to maintain robustness to channel errors.

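The 1 dB figure in the abstract above is an average spectral distortion. A minimal sketch of the commonly used log-spectral definition follows: the root-mean-square difference, in dB, between the original and quantized power spectra of one frame, averaged over frames. It assumes strictly positive power spectra on a shared frequency grid. The function names are illustrative, and the abstract does not state the exact frequency band or spectral resolution the authors used.

```python
import math


def spectral_distortion_db(p_orig, p_quant):
    """RMS log-spectral difference (in dB) for a single frame."""
    if len(p_orig) != len(p_quant) or not p_orig:
        raise ValueError("spectra must be non-empty and of equal length")
    squared = [
        (10.0 * math.log10(a) - 10.0 * math.log10(b)) ** 2
        for a, b in zip(p_orig, p_quant)
    ]
    return math.sqrt(sum(squared) / len(squared))


def average_spectral_distortion_db(frames_orig, frames_quant):
    """Mean of the per-frame spectral distortion over all frames."""
    values = [
        spectral_distortion_db(a, b)
        for a, b in zip(frames_orig, frames_quant)
    ]
    return sum(values) / len(values)
```

As a sanity check, a quantized spectrum that is uniformly ten times the original power differs by 10 dB in every bin, so its spectral distortion is 10 dB.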

Isolated-Word Recognition Using Adaptively Partitioned Multisection Codebooks (음성적응(音聲適應) 구간분할(區間分割) 멀티섹션 코드북을 이용(利用)한 고립단어인식(孤立單語認識))

  • Ha, Kyeong-Min;Jo, Jeong-Ho;Hong, Jae-Kuen;Kim, Soo-Joong
    • Proceedings of the KIEE Conference / 1988.07a / pp.10-13 / 1988
  • An isolated-word recognition method using adaptively partitioned multisection codebooks is proposed. Each training utterance was divided into several sections according to its pattern, extracted by a labeling technique. For each pattern, reference codebooks were generated by clustering the training vectors of the same section. In the recognition procedure, input speech was divided into sections by the same method used in the codebook generation procedure and was recognized as the reference word whose codebooks gave the smallest average distortion. The proposed method was tested on 100 Korean words and attained a recognition rate of about 96 percent.

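The decision rule described in the abstract above (pick the reference word whose section codebooks give the smallest average distortion) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the input has already been split into sections, that every word has one codebook per section, and that distortion is squared Euclidean distance. The adaptive partitioning by labeling is not shown, and all names and data layouts are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def section_distortion(frames, codebook):
    """Total distortion of a section's frames against one codebook."""
    return sum(
        min(squared_distance(frame, code) for code in codebook)
        for frame in frames
    )


def recognize(sections, word_codebooks):
    """Return (word, average distortion) for the best-matching word.

    sections       -- list of sections, each a list of feature vectors
    word_codebooks -- dict mapping word -> list of codebooks, one per section
    """
    frame_count = sum(len(section) for section in sections)
    best_word, best_avg = None, float("inf")
    for word, codebooks in word_codebooks.items():
        total = sum(
            section_distortion(section, codebook)
            for section, codebook in zip(sections, codebooks)
        )
        average = total / frame_count
        if average < best_avg:
            best_word, best_avg = word, average
    return best_word, best_avg
```

Matching each section only against the codebook for the same section is what keeps coarse temporal order in the comparison, which a single codebook per word would discard.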

A Low Rate VQ Speech Coding Algorithm with Variable Transmission Frame Length (가변 전송 Frame 길이를 갖는 저 전송속도 VQ 음성부호화 알고리즘에 대한 연구)

  • 좌정우;이성로;이황수
    • The Journal of the Acoustical Society of Korea / v.12 no.1E / pp.32-38 / 1993
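  • This paper proposes a low-bit-rate speech coder and demonstrates its performance and flexibility through computer simulation. The proposed coding scheme varies the transmission frame length according to the stationarity of the input speech signal and encodes a representative feature vector of each transmission frame by vector quantization. In the proposed scheme, the feature vector sequence consists of PARCOR coefficients obtained sample by sample from the input speech signal with the prewindowed RLS lattice algorithm. The input speech signal is divided into subsegments, and representative PARCOR coefficients are obtained for each subsegment. Transmission frames are determined by merging subsegments according to their similarity, measured with the likelihood ratio distortion measure. Computer simulation results show that the proposed VTEL speech coding scheme can greatly reduce the overall transmission rate while maintaining good speech quality.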


Text-to-Speech Synthesizer with the Process of Minimizing Concatenation Distortion (접합 왜곡의 최소화 과정이 포함된 음성합성기)

  • 박훈재;김상훈;정재호
    • The Journal of the Acoustical Society of Korea / v.17 no.4 / pp.38-44 / 1998
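  • To make it easier to build large speech synthesis databases, phoneme boundaries are segmented with speech recognition systems. However, when synthesized speech is generated directly from the automatic segmentation results, phoneme boundary errors cause considerable concatenation distortion. To address this problem, this study seeks a suitable concatenation point that takes boundary errors into account when units are joined. Here, a suitable concatenation point means the point at which spectral discontinuity is minimized. The performance of the proposed method was evaluated with a Mean Opinion Score (MOS) test on the synthesized speech and by comparing spectrogram shapes. The proposed method consists of two steps: first, selecting a reference pattern and two test patterns; second, finding a suitable concatenation point between the preceding and following test patterns. To compare spectrograms between patterns, this study uses cepstrum parameters and the Dynamic Time Warping (DTW) algorithm as a pattern classifier. Listening tests showed that speech synthesized with the proposed algorithm was of better quality than speech synthesized directly from automatically segmented units.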

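The abstract above compares patterns with DTW over cepstral parameters. A minimal sketch of textbook DTW follows, assuming each pattern is a sequence of equal-length feature vectors (for example, cepstral coefficients per frame) and using Euclidean distance as the local cost. The step pattern, any path constraints, and the way the authors turn the alignment into a concatenation point are not given in the abstract, so none of that is modeled here.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Uses the basic symmetric step pattern (match, insertion, deletion)
    with Euclidean local distance and no windowing constraint.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because frames may be repeated along the warping path, two patterns that differ only in timing, such as `[[0], [1], [2]]` and `[[0], [1], [1], [2]]`, have an accumulated cost of zero.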

A Study on Reduction of Computation Time through Adjustment the Frequency Interval Information in the G.723.1 Vocoder (G.723.1 보코더에서 주파수 간격 정보조절을 통한 계산량 감소에 관한 연구)

  • 민소연;김영규;배명진
    • Proceedings of the IEEK Conference / 2002.06d / pp.405-408 / 2002
  • LSP (Line Spectrum Pairs) parameters are used for speech analysis in vocoders and recognizers because they offer constant spectrum sensitivity, low spectrum distortion, and easy linear interpolation. However, the method of transforming LPC (Linear Predictive Coding) coefficients into LSP is so complex that it takes much time to compute. Among conventional methods, the real root method is considerably simpler than the others, but it still suffers from indeterministic computation time because the root search proceeds sequentially along the frequency axis. We suggest a method of reducing the LSP transformation time using voice characteristics. The proposed method applies a different search order and search interval according to the distribution of LSP parameters. In comparison with the conventional real root method, the proposed method yields a reduction of about 46.5%, and the total computation time in the G.723.1 vocoder is reduced to about 5%.
