• Title/Summary/Keyword: speech quality

Search Result 807

Encoding of Speech Spectral Parameters Using Adaptive Quantization Range Method

  • Lee, In-Sung;Hong, Chae-Woo
    • ETRI Journal / v.23 no.1 / pp.16-22 / 2001
  • Efficient quantization methods for line spectrum pair (LSP) parameters are proposed that offer good performance with low complexity and memory requirements. The adaptive quantization range method, which exploits the ordering property of LSP parameters, is used in a scalar quantizer and a vector-scalar hybrid quantizer. Because the maximum quantization range of each LSP parameter is varied adaptively according to the quantized value of the previous-order LSP parameter, quantization becomes more efficient. The proposed scalar quantization algorithm needs 31 bits/frame to maintain transparent speech quality, which is 3 bits/frame fewer than the conventional scalar quantization method with interframe prediction. The improved vector-scalar quantizer achieves an average spectral distortion of 1 dB using 26 bits/frame. The performance of the proposed quantization methods is also evaluated under transmission errors.

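The adaptive-range idea in the abstract above can be illustrated with a minimal sketch. It assumes a plain uniform scalar quantizer and made-up ranges and bit allocations; the function names, the 4th-order example and every numeric value are hypothetical and are not taken from the paper. The only property it relies on is the ordering of LSPs: the lower bound for LSP i is raised to the quantized value of LSP i-1, so the same number of bits covers a narrower range.

```python
def quantize_uniform(x, lo, hi, bits):
    """Uniform scalar quantizer over [lo, hi]; returns (index, reconstructed value)."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)          # clamp into the current range
    index = round((x - lo) / (hi - lo) * levels)
    return index, lo + index * (hi - lo) / levels


def encode_lsp(lsp, fixed_ranges, bits):
    """Quantize ordered LSPs, raising each lower bound to the previous quantized value."""
    indices, prev_q = [], 0.0
    for x, (lo, hi), b in zip(lsp, fixed_ranges, bits):
        lo = max(lo, prev_q)         # adaptive range: LSP i cannot lie below quantized LSP i-1
        index, prev_q = quantize_uniform(x, lo, hi, b)
        indices.append(index)
    return indices


def decode_lsp(indices, fixed_ranges, bits):
    """Rebuild the LSPs; the decoder derives the same adaptive ranges from its own output."""
    decoded, prev_q = [], 0.0
    for index, (lo, hi), b in zip(indices, fixed_ranges, bits):
        lo = max(lo, prev_q)
        prev_q = lo + index * (hi - lo) / ((1 << b) - 1)
        decoded.append(prev_q)
    return decoded


# Hypothetical 4th-order example in Hz (assumed values, not trained ranges).
# Upper bounds must increase strictly so that every adaptive range stays non-empty.
FIXED_RANGES = [(100.0, 800.0), (200.0, 1500.0), (500.0, 2500.0), (1000.0, 3500.0)]
BITS = [4, 4, 4, 4]
lsp = [350.0, 900.0, 1600.0, 2700.0]
decoded = decode_lsp(encode_lsp(lsp, FIXED_RANGES, BITS), FIXED_RANGES, BITS)
```

Because the decoder recomputes each range from values it has already reconstructed, no side information is needed. The trade-off is that an error in one index shifts the ranges of all later parameters in the frame, which is presumably why the abstract also reports performance under transmission errors.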

The Relationship Between Voice and the Image Triggered by the Voice: American Speakers and American Listeners (목소리를 듣고 감지하는 인상에 대한 연구: 미국인화자와 미국인청자)

  • Moon, Seung-Jae
    • Phonetics and Speech Sciences / v.1 no.2 / pp.111-118 / 2009
  • The present study investigates the relationship between voices and the physical images triggered by those voices. It is the final part of a four-part series, and the results reported here are limited to American speakers and American listeners. Combined with the results from previous studies (Moon, 2000; Moon, 2002; Tak, 2005), the results suggest that (1) there is a very strong relationship, well above chance level, between voices and the pictures chosen for them by the perception experiment subjects; (2) the more physical characteristics that are given, the better the chance of correctly matching voices with pictures; and (3) culture (in the present study, language environment) seems to play a role in conjuring up mental images from voices.


A Phonetic Study of 'Sasang Constitution' (음성학적으로 본 사상체질)

  • Moon Seung-Jae;Tak Ji-Hyun;Hwang Hyejeong
    • MALSORI / v.55 / pp.1-14 / 2005
  • Sasang Constitution, one branch of oriental medicine, claims that people can be classified into four different 'constitutions': Taeyang, Taeum, Soyang, and Soeum. This study investigates whether the constitutions can be classified accurately from voice alone by analyzing data from 46 voices whose constitutions had already been determined. Seven source-related parameters and four filter-related parameters were analyzed phonetically, and a Gaussian mixture model (GMM) was applied to the data. Both the phonetic analyses and the GMM showed that all the parameters except one failed to distinguish the constitutions. Even the single exception, B2 (the bandwidth of the second formant), did not provide sufficient grounds to serve as the source of distinction. This result suggests one of two conclusions: either the Sasang Constitutions cannot be substantiated with reliable accuracy from the phonetic characteristics of people's voices, or other parameters that have not been conventionally proposed still need to be found.


An Algorithm to Reduce the Pitch Computational Complexity Using Modified Delta Searching in G.723.1 Vocoder (CELP 보코더에서 델타 피치 검색 방법 개선에 대한 연구)

  • Min, So-Yeon;Bae, Myung-Jin
    • Speech Sciences / v.11 no.3 / pp.165-172 / 2004
  • In this paper, we propose methods for reducing the computational complexity of the delta pitch search used in the G.723.1 vocoder. The proposed algorithms have the following characteristics. The first scheme uses NAMDF to reduce the computation in the delta pitch search. The second scheme skips lags during the pitch search by using a threshold value. By doing so, the computation of the pitch search can be reduced by more than 64% with negligible quality degradation.

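A rough sketch of the two ideas in the abstract above follows. It assumes that NAMDF denotes a normalized average magnitude difference function (the abstract does not expand the acronym) and that "skipping" means abandoning the remaining candidate lags once the best score drops below a threshold. Both readings are assumptions, and the function names, search width and threshold are hypothetical; this is not the G.723.1 reference procedure.

```python
import math


def namdf(signal, start, length, lag):
    """Normalized AMDF between a frame and its copy delayed by `lag` (0 means a perfect match)."""
    frame = range(start, start + length)
    diff = sum(abs(signal[n] - signal[n - lag]) for n in frame)
    norm = sum(abs(signal[n]) + abs(signal[n - lag]) for n in frame)
    return diff / norm if norm else 0.0


def delta_pitch_search(signal, start, length, prev_lag, delta=3, threshold=0.01):
    """Search lags within +/-delta of prev_lag; stop early once a score beats the threshold."""
    # Visit the candidates closest to the previous lag first.
    candidates = sorted(range(prev_lag - delta, prev_lag + delta + 1),
                        key=lambda lag: abs(lag - prev_lag))
    best_lag, best_score, evaluated = prev_lag, float("inf"), 0
    for lag in candidates:
        score = namdf(signal, start, length, lag)
        evaluated += 1
        if score < best_score:
            best_lag, best_score = lag, score
        if best_score < threshold:   # good enough: skip the remaining lags
            break
    return best_lag, evaluated


# Example: a sine with a 50-sample period; the previous lag (48) is slightly off.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
best_lag, evaluated = delta_pitch_search(signal, start=200, length=60, prev_lag=48)
```

The AMDF uses only subtractions and absolute values, so it avoids the multiplications of a correlation-based search, and the early exit means not every candidate lag is evaluated.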

A Study on English Reduced Vowels Produced by Korean Learners and Native Speakers of English (한국인 영어학습자와 영어원어민이 발화한 영어 약화모음에 관한 연구)

  • Shin, Seung-Hoon;Yoon, Nam-Hee;Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.3 no.4 / pp.45-53 / 2011
  • Flemming and Johnson (2007) claim that there is a fundamental distinction between the mid central vowel [ə] and the high central vowel [ɨ] in that [ə] occurs in unstressed word-final position while [ɨ] appears elsewhere. Unlike their English counterparts, Korean [ə] and [ɨ] are full vowels that contrast phonemically. The purpose of this paper is to explore the acoustic quality of the two English reduced vowels produced by Korean learners and native speakers of English in terms of their two formant frequencies. Sixteen Korean learners of English and six native speakers of English produced four types of English words and two types of Korean words with different phonological and morphological patterns. The results show that the Korean learners of English produced the two English reduced vowels differently from their Korean counterparts.


A Study on Realizations of English Stress and Vowel Formant Frequency by Korean Learners (한국인 학습자의 영어 강세 실현과 모음 포먼트에 관한 연구)

  • Kim, Ji-Eun
    • Phonetics and Speech Sciences / v.6 no.1 / pp.39-45 / 2014
  • This study investigates twenty-four Korean females' production of English front vowels, focusing on the distinctions /i/ vs /ɪ/ and /ɛ/ vs /æ/ and on the formant values of stressed and unstressed vowels compared with those of native English speakers. The Korean learners were asked to read a textbook passage of ten sentences containing the target vowels. The major results indicate that (1) Korean learners have trouble producing distinct tense and lax versions of front vowels in paragraph reading; (2) the vowel space of the stressed vowels in a paragraph is smaller than that in embedded sentences; and (3) the vowel quality of the unstressed vowels produced by the Korean learners is similar to that of the native English speakers. The findings can be applied to teaching Korean learners the pronunciation of English vowels and the realization of English stress.

A Correlation Study between Acoustic and Perceptual Parameters of the Singing Voice in Singing Students (성악 전공 학생의 가창 시 음성의 음향학적 매개 변수와 지각적 매개 변수사이의 상관 연구)

  • Jo, Sung-Mi;Lee, Sang-Ouk;Jeong, Ok-Ran
    • Proceedings of the KSPS conference / 2004.05a / pp.219-222 / 2004
  • The purpose of this study was to determine the correlation between acoustic and perceptual parameters of the singing voice in singing students, to compare the results with those of previous studies, and to identify more sensitive parameters for analyzing professional vocal usage. The study measured acoustic and perceptual parameters in 41 singing students. Digital audio recordings of sung vowels were made for acoustic analysis. Each sample was judged by one experienced singing teacher and one voice pathologist on two bipolar 7-point semantic scales (ringing-dull, rich-thin). The results showed that SPP1 (p<0.01), SPP2 (p<0.01), and P1 (p<0.01) correlated significantly with the ringing and richness qualities.


Analyzing the element of emotion recognition from speech (음성으로부터 감성인식 요소 분석)

  • 박창현;심재윤;이동욱;심귀보
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.199-202 / 2001
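  • In general, the elements from which a person's emotion can be recognized in a speech signal are (1) the words used in the conversation, (2) tone, (3) the pitch of the speech signal, (4) formant frequency, (5) speech speed, and (6) voice quality. For humans, it is natural to perceive emotion through tone, words, speed, and voice quality rather than through analytical elements such as frequency, so these elements can naturally serve as important factors in classifying emotion. Previous work has mainly used these elements, but for a machine implementation it helps to be able to use the more engineering-oriented formant frequency. This research therefore aims to implement an emotion recognition system that uses pitch, formants, and speech speed extracted from the speech signal. As the first stage of that research, this paper identifies the distinctive characteristics of two extreme emotions, based on utterances spoken in anger and short utterances used when happy.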


Identifying Friendly and Foe Using a Watermarking Technique During Military Communication (군 통신상에서 워터마킹 기술을 이용한 피아식별 방법)

  • Lee, Jong-Kwan;Choi, Hyun-Joo
    • Journal of the Korea Institute of Military Science and Technology / v.9 no.4 / pp.81-89 / 2006
  • In this paper, a watermarking technique for identifying friend or foe during communication is proposed. The speech signal is processed in several stages. First, the speech signal is partitioned into short time frames, and the frames are transformed into the frequency domain using the discrete Fourier transform (DFT). The DFT coefficients are quantized, and the watermark signal is embedded into the quantized DFT coefficients. At the destination, the quantization errors of the received signal are regarded as the watermark signal. Friend or foe is identified by correlating the detected watermark with the original watermark. As in most other watermarking techniques, this method involves a trade-off between noise robustness and quality. This is addressed by partial quantization and a noise-level-dependent quantization step. Simulation results in various noisy environments show that the proposed method identifies friend and foe reliably.
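
The embed, extract and correlate steps described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: the coefficients are taken as given real numbers (the framing and DFT stages are omitted), the watermark is a pseudo-random +/-1 sequence generated from a seed, and the step size, embedding strength and noise level are arbitrary. None of the names or values come from the paper, and its partial quantization and noise-dependent step are not modeled.

```python
import random


def make_watermark(n, seed):
    """Pseudo-random +/-1 sequence; the seed plays the role of a shared key (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(coeffs, watermark, step, strength):
    """Quantize each coefficient to a multiple of `step`, then add a scaled watermark chip."""
    # strength must stay below step / 2 so the receiver requantizes to the same level.
    return [round(c / step) * step + strength * w for c, w in zip(coeffs, watermark)]


def extract(received, step):
    """Treat the requantization error of each received coefficient as the watermark estimate."""
    return [r - round(r / step) * step for r in received]


def correlation(a, b):
    """Normalized correlation in [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0


# Example with stand-in coefficients and mild additive channel noise.
rng = random.Random(0)
coeffs = [rng.uniform(-10.0, 10.0) for _ in range(256)]
friend = make_watermark(256, seed=1)
foe = make_watermark(256, seed=2)
marked = embed(coeffs, friend, step=1.0, strength=0.25)
received = [c + rng.gauss(0.0, 0.05) for c in marked]
estimate = extract(received, step=1.0)
friend_score = correlation(estimate, friend)
foe_score = correlation(estimate, foe)
```

A receiver holding the correct watermark should see a correlation close to 1, while an unrelated sequence should correlate near 0. A larger step tolerates more channel noise but distorts the speech more, which is the robustness versus quality trade-off the abstract mentions.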

Improvement of Packet Loss Concealment Algorithm by Using state gain control and fixed codebook estimation (상태별 이득 제어 및 fixed codebook estimation을 이용한 G.729에서의 Packet Loss Concealment 알고리즘 개선)

  • Moon Kwang;Hahn Minsoo
    • Proceedings of the KSPS conference / 2003.10a / pp.109-112 / 2003
  • In real-time packetized voice applications, missing frames are a major source of voice quality degradation. Packet loss concealment (PLC) algorithms are therefore needed to guarantee the QoS of VoIP. Even so, current speech codecs for VoIP perform poorly when consecutive packet losses occur. In this paper, we propose a new PLC algorithm for the G.729 codec. Our algorithm works better especially under consecutive packet loss, mainly because it adopts an adaptive gain controller that uses the number of missing packets, combined with a fixed codebook vector estimation algorithm and LPC bandwidth expansion.
