• Title/Summary/Keyword: speech quality

Frequency Bin Alignment Using Covariance of Power Ratio of Separated Signals in Multi-channel FD-ICA (다채널 주파수영역 독립성분분석에서 분리된 신호 전력비의 공분산을 이용한 주파수 빈 정렬)

  • Quan, Xingri;Bae, Keunsung
    • Phonetics and Speech Sciences / v.6 no.3 / pp.149-153 / 2014
  • In frequency-domain ICA, the frequency bin permutation problem degrades the quality of the separated signals. In this paper, we propose a new algorithm that solves the frequency bin permutation problem using the covariance of the power ratio of separated signals in multi-channel FD-ICA. It exploits the spectral continuity of speech signals, using the power ratio of adjacent frequency bins to detect whether a frequency bin permutation has occurred in the separated signal. Experimental results show that the proposed method can fix the frequency bin permutation problem in multi-channel FD-ICA.
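
A minimal sketch of power-ratio-based permutation alignment, assuming the separated STFT outputs are stored as a complex NumPy array of shape (bins, sources, frames). Each bin is matched to its already-aligned lower neighbor by choosing the output ordering whose power ratios have the largest total covariance with that neighbor's. The function names and the brute-force search over all orderings are illustrative choices, not the authors' implementation.

```python
import itertools

import numpy as np


def power_ratio(Y_f):
    """Power ratio of each separated output at one frequency bin.

    Y_f: complex array of shape (N, T), N separated signals over T frames.
    Returns v with v[n, t] = |Y_f[n, t]|^2 / sum_k |Y_f[k, t]|^2.
    """
    p = np.abs(Y_f) ** 2
    # Small epsilon avoids 0/0 in completely silent frames.
    return p / (p.sum(axis=0, keepdims=True) + 1e-12)


def covariance(a, b):
    """Sample covariance (over frames) of two power-ratio sequences."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))


def align_bins(Y):
    """Fix per-bin permutations by matching each bin to its aligned lower neighbor.

    Y: complex array of shape (F, N, T). Returns an aligned copy.
    Tries all N! orderings, so this is only practical for a few sources.
    """
    Y = Y.copy()
    n_src = Y.shape[1]
    prev = power_ratio(Y[0])  # bin 0 defines the reference output order
    for f in range(1, Y.shape[0]):
        cur = power_ratio(Y[f])
        # Pick the ordering whose power ratios co-vary most with the previous bin's.
        best = max(
            itertools.permutations(range(n_src)),
            key=lambda perm: sum(covariance(cur[perm[n]], prev[n]) for n in range(n_src)),
        )
        Y[f] = Y[f][list(best)]
        prev = power_ratio(Y[f])
    return Y
```

Because each bin is aligned only to the previous one, a single wrong decision propagates to all higher bins; a more robust variant would compare against several neighboring bins or a running centroid.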

Perceptual Experiment on Number Production for Speaker Identification

  • Yang, Byung-Gon
    • Speech Sciences / v.8 no.1 / pp.7-19 / 2001
  • The acoustic parameters of nine Korean numbers were analyzed with Praat, a speech analysis program, and the numbers were synthesized with SenSynPPC, a Klatt formant synthesizer. The overall intensity, pitch, and formant values of the numbers were modified dynamically in steps of 1 dB, 1 Hz, and 2.5%, respectively. The study explored listeners' sensitivity to changes in these three acoustic parameters. Twelve subjects (male and female) listened to 390 pairs of synthesized numbers and judged whether each pair sounded the same or different. Results showed that subjects perceived the same sound quality within a range of 6.6 dB of intensity variation, 10.5 Hz of pitch variation, and 5.9% variation of the first three formants. The male and female groups showed almost the same perceptual ranges. An asymmetry between the upper and lower boundaries was also observed. These ranges may be applicable to the development of a speaker identification system, and the synthesis-modification method may be used to generate its evaluation data.
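
The stimulus manipulation can be sketched as below, assuming a hypothetical `SynthParams` record; the study used SenSynPPC, whose interface is not reproduced here. The helper also converts the reported 6.6 dB intensity threshold into an amplitude ratio (about 2.14).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthParams:
    """Hypothetical container for the three parameters varied in the experiment."""
    intensity_db: float
    f0_hz: float
    formants_hz: tuple  # (F1, F2, F3)


def step_variant(base, intensity_steps=0, f0_steps=0, formant_steps=0):
    """Shift base by whole steps of 1 dB, 1 Hz and 2.5% (all formants scaled together).

    Assumption: the 2.5% formant step is applied linearly (1 + 0.025 * k);
    the source does not say whether steps compound.
    """
    scale = 1.0 + 0.025 * formant_steps
    return SynthParams(
        intensity_db=base.intensity_db + 1.0 * intensity_steps,
        f0_hz=base.f0_hz + 1.0 * f0_steps,
        formants_hz=tuple(f * scale for f in base.formants_hz),
    )


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (pressure) ratio."""
    return 10 ** (db / 20)
```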

On Altering the Pitch of Speech Signals in Waveform Coding -(Altering Method by the LPC and the Pitch Halving)- (음성 파형코딩의 음원피치 변경에 관한 연구 - LPC와 주기반분법에 의한 피치변경법 -)

  • 민경중
    • Proceedings of the Acoustical Society of Korea Conference / 1991.06a / pp.45-49 / 1991
  • In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. However, it is difficult to apply waveform coding to synthesis by rule, because the parameters of this coding are not separated into excitation parameters and vocal tract parameters. In this paper, we propose a new pitch-modification method that can alter the pitch periods in waveform coding. The proposed method expands the pitch period by LPC synthesis and then compresses the period with a waveform-halving technique. This makes it possible to use waveform coding for synthesis by rule in speech processing.
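
The expansion step can be illustrated with LPC extrapolation: fit an all-pole model to one pitch period (autocorrelation method, Levinson-Durbin recursion) and let the synthesis filter run on with zero excitation until the period reaches the target length. This is an assumed reading of "expands the pitch period by LPC synthesis"; `extend_period` is a hypothetical helper, and the waveform-halving compression step is not sketched because the summary does not describe it.

```python
import numpy as np


def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a (length order + 1, a[0] = 1) so that the prediction error is
    e[n] = sum_k a[k] * x[n - k].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                            # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]   # RHS is built before assignment
        a[i] = k
        err *= 1.0 - k * k                        # remaining prediction-error energy
    return a


def extend_period(period, new_len, order=10):
    """Lengthen one pitch period to new_len samples (hypothetical helper).

    The original samples are kept; the all-pole LPC synthesis filter then runs
    with zero excitation, so the added tail is the filter's free response.
    """
    period = np.asarray(period, dtype=float)
    if len(period) <= order:
        raise ValueError("period must be longer than the LPC order")
    a = lpc(period, order)
    out = list(period)
    while len(out) < new_len:
        # All-pole synthesis with e[n] = 0: x[n] = -sum_{k=1..p} a[k] * x[n-k]
        out.append(-sum(a[k] * out[-k] for k in range(1, order + 1)))
    return np.array(out)
```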

The Relationship between Acoustic Characteristics and Voice Handicap Index in Esophageal Speakers (식도발성 환자의 음향학적 특성과 음성장애지수의 상관성)

  • Jang, Hyo-Ryung;Shim, Hee-Jeong;Shin, Hee-Baek;Ko, Do-Heung;Kim, Hyun-Ki
    • Phonetics and Speech Sciences / v.6 no.2 / pp.115-121 / 2014
  • This paper investigates the relationship between acoustic characteristics and the voice handicap index (VHI) in 29 male esophageal speakers. Acoustic characteristics were measured from a sustained vowel /a/ produced three times, and a stable 2-second segment of phonation was analyzed with the MDVP program. Specifically, the relationships between four VHI scores (total, functional, physical, and emotional) and three acoustic characteristics (jitter, shimmer, and NHR) were investigated using the Pearson correlation coefficient. We found no relationship between NHR and the VHI scores. However, both jitter and shimmer had statistically significant correlations with all four VHI scores. This research will contribute to establishing a baseline for speech characteristics in the voice rehabilitation of esophageal speakers. Further research could examine overall quality-of-life surveys, which are widely used as subjective measures of voice for esophageal speakers.
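
The correlation analysis reduces to Pearson's r between each acoustic measure and each VHI score across speakers. A minimal NumPy sketch with hypothetical function names; the significance test uses the standard t statistic with n - 2 degrees of freedom.

```python
import math

import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.dot(xc, yc) / math.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))


def t_statistic(r, n):
    """t value for H0: rho = 0, with n - 2 degrees of freedom (27 for n = 29)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```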

POSTTS : Corpus Based Korean TTS based on Natural Language Analysis (POSTTS : 자연어 분석을 통한 코퍼스 기반 한국어 TTS)

  • Ha Ju-Hong;Zheng Yu;Kim Byeongchang;Lee Geunbae Lee
    • Proceedings of the KSPS conference / 2003.05a / pp.87-90 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from text using natural language processing. Robust preprocessing of non-Korean characters is also required. In this paper, we analyzed Korean text using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. We present a new grapheme-to-phoneme conversion method, a hybrid of dictionary-based and rule-based methods, for unlimited-vocabulary Korean TTS. We constructed a prosody model using probabilistic and decision-tree-based methods.
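
A toy sketch of the hybrid idea, assuming one exception dictionary and a single phonological rule (liaison, where a coda moves onto a following vowel-initial syllable, e.g. 음악 → [으막]). The dictionary contents and all names are hypothetical; a real Korean G2P needs many more rules plus the morphological analysis described above. Syllables are decomposed with the standard Unicode Hangul arithmetic.

```python
SBASE = 0xAC00            # first precomposed Hangul syllable (가)
N_MEDIAL, N_FINAL = 21, 28
N_SYLLABLES = 19 * N_MEDIAL * N_FINAL
SILENT_INITIAL = 11       # initial ㅇ carries no sound

# Final-consonant index -> initial-consonant index for simple codas that move to
# a following vowel-initial syllable. Illustrative subset only (no ㅎ, no clusters).
FINAL_TO_INITIAL = {1: 0, 2: 1, 4: 2, 7: 3, 8: 5, 16: 6, 17: 7,
                    19: 9, 20: 10, 22: 12, 23: 14, 24: 15, 25: 16, 26: 17}

# Hypothetical exception dictionary: words the toy rule would get wrong.
EXCEPTIONS = {"맛없다": "마덥따"}


def decompose(ch):
    """Split a precomposed Hangul syllable into (initial, medial, final) indices, else None."""
    code = ord(ch) - SBASE
    if not 0 <= code < N_SYLLABLES:
        return None
    return code // (N_MEDIAL * N_FINAL), (code % (N_MEDIAL * N_FINAL)) // N_FINAL, code % N_FINAL


def compose(initial, medial, final):
    """Inverse of decompose."""
    return chr(SBASE + (initial * N_MEDIAL + medial) * N_FINAL + final)


def apply_liaison(word):
    """Rule component: move a simple coda onto a following vowel-initial syllable."""
    sylls = [decompose(ch) for ch in word]
    for k in range(len(sylls) - 1):
        cur, nxt = sylls[k], sylls[k + 1]
        if cur is None or nxt is None:
            continue  # non-Hangul characters pass through unchanged
        if cur[2] in FINAL_TO_INITIAL and nxt[0] == SILENT_INITIAL:
            sylls[k] = (cur[0], cur[1], 0)
            sylls[k + 1] = (FINAL_TO_INITIAL[cur[2]], nxt[1], nxt[2])
    return "".join(compose(*s) if s else ch for s, ch in zip(sylls, word))


def g2p(text):
    """Hybrid G2P: dictionary lookup first, rule-based fallback, word by word."""
    return " ".join(EXCEPTIONS[w] if w in EXCEPTIONS else apply_liaison(w)
                    for w in text.split())
```

Without the dictionary entry, the rule alone would turn 맛없다 into 마섮다, which is why the exception list takes priority.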

GMM based Nonlinear Transformation Methods for Voice Conversion

  • Vu, Hoang-Gia;Bae, Jae-Hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.67-70 / 2005
  • Voice conversion (VC) is a technique for modifying the speech signal of a source speaker so that it sounds as if it were spoken by a target speaker. Most previous VC approaches used a GMM-based linear transformation function to convert the source spectral envelope to the target spectral envelope. In this paper, we propose several nonlinear GMM-based transformation functions in an attempt to deal with the over-smoothing effect of linear transformation. To obtain high-quality modifications of speech signals, our VC system is implemented in the Harmonic plus Noise Model (HNM) analysis/synthesis framework. Experimental results are reported on the English corpus MOCHA-TIMIT.
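
For reference, the conventional GMM-based linear transformation that the paper contrasts with can be sketched as follows, assuming the parameters of a joint source/target GMM are already trained. The nonlinear variants proposed in the paper are not shown, and the function names are illustrative.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)


def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Conventional GMM-based linear conversion of one source feature vector x.

    weights: (M,); mu_x, mu_y: (M, D); cov_xx, cov_yx: (M, D, D), taken from
    a joint source/target GMM.
    F(x) = sum_i p(i | x) * (mu_y[i] + cov_yx[i] @ inv(cov_xx[i]) @ (x - mu_x[i]))
    """
    x = np.asarray(x, dtype=float)
    log_post = np.array([np.log(w) + log_gaussian(x, m, c)
                         for w, m, c in zip(weights, mu_x, cov_xx)])
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    post /= post.sum()
    y = np.zeros(mu_y.shape[1])
    for i, p in enumerate(post):
        y += p * (mu_y[i] + cov_yx[i] @ np.linalg.solve(cov_xx[i], x - mu_x[i]))
    return y
```

Because the output is a posterior-weighted average of per-component regressions, converted envelopes tend toward the component means, which is the over-smoothing the summary mentions.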

Acoustic characteristics of the sustained vowel phonation according to age groups (모음 연장 발성이 보이는 연령대별 음향음성학적 특성 연구)

  • Seo, Yoon-Jeong;Shin, Jiyoung
    • Phonetics and Speech Sciences / v.10 no.4 / pp.67-76 / 2018
  • This study was performed to investigate the acoustic characteristics of sustained vowels produced by Seoul Korean speakers. For this study, 309 healthy adults were chosen as participants from the Korean Standard Speech Database. These subjects were divided into five chronological age groups (20s, 30s, 40s, 50s, 60-70s) and two gender groups (male and female). Fundamental frequency (f0), jitter, shimmer, and NHR (noise-to-harmonics ratio) were measured for 8 Korean vowels (/ɑ/, /æ/, /ʌ/, /e/, /o/, /u/, /ɯ/, /i/) using Praat. The results showed that vowel type significantly affected all acoustic parameters. Gender significantly affected f0, jitter, and NHR: the mean f0 of female speakers was higher than that of male speakers, and the mean jitter and NHR of male speakers were greater than those of female speakers. Moreover, age significantly affected shimmer and NHR; in particular, the shimmer and NHR of elderly speakers were greater than those of young speakers.
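
Jitter and shimmer in the sense of Praat's "local" measures can be sketched from already-extracted period durations and peak amplitudes. This sketch omits Praat's period-floor/ceiling and maximum-period-factor checks, and the function names are hypothetical.

```python
import numpy as np


def jitter_local(periods):
    """Local jitter: mean |T_i - T_(i-1)| divided by the mean period (ratio; x100 for %)."""
    t = np.asarray(periods, dtype=float)
    if t.size < 2:
        raise ValueError("need at least two periods")
    return float(np.mean(np.abs(np.diff(t))) / np.mean(t))


def shimmer_local(amplitudes):
    """Local shimmer: mean |A_i - A_(i-1)| divided by the mean peak amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    if a.size < 2:
        raise ValueError("need at least two amplitudes")
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))
```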

A New Wideband Speech/Audio Coder Interoperable with ITU-T G.729/G.729E (ITU-T G.729/G.729E와 호환성을 갖는 광대역 음성/오디오 부호화기)

  • Kim, Kyung-Tae;Lee, Min-Ki;Youn, Dae-Hee
    • Journal of the Institute of Electronics Engineers of Korea SP / v.45 no.2 / pp.81-89 / 2008
  • Wideband speech, characterized by a bandwidth of about 7 kHz (50-7000 Hz), provides a substantial quality improvement in terms of naturalness and intelligibility. Although it requires higher data rates, its applications have extended to audio and video conferencing, high-quality multimedia communications over mobile links or packet-switched networks, and digital AM broadcasting. In this paper, we present a new bandwidth-scalable coder for wideband speech and audio signals. The proposed coder splits the 8 kHz signal bandwidth into two narrow bands and applies a different coding scheme to each band. The lower-band signal is coded using the ITU-T G.729/G.729E coder, and the higher-band signal is compressed using a new algorithm based on a gammatone filter bank with an invertible auditory model. Because of the split-band architecture and the completely independent coding of each band, the decoder output can be either narrowband or wideband depending on the channel condition. Subjective tests showed that, for wideband speech and audio signals, the proposed coder at 14.2/18 kbit/s produces better quality than ITU-T G.722.1 at 24 kbit/s, with a shorter algorithmic delay.
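
The split-band idea can be sketched with a two-band Haar filter bank, which reconstructs perfectly but has poor frequency selectivity; the coder's actual band-splitting filters are not specified here, so this choice is an assumption. A 16 kHz input (8 kHz bandwidth) yields two subbands sampled at 8 kHz, and a narrowband decoder simply reconstructs with the high band set to zero.

```python
import numpy as np


def split_bands(x):
    """Two-band Haar analysis: returns (low, high), each decimated by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, 0.0)  # pad to an even length
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)   # sum branch: lower half of the spectrum
    high = (even - odd) / np.sqrt(2)  # difference branch: upper half
    return low, high


def merge_bands(low, high):
    """Haar synthesis: exact inverse of split_bands."""
    out = np.empty(2 * len(low))
    out[0::2] = (low + high) / np.sqrt(2)
    out[1::2] = (low - high) / np.sqrt(2)
    return out
```

Because the two bands are coded independently, dropping the high-band payload (for example under poor channel conditions) still leaves a decodable narrowband signal: `merge_bands(low, np.zeros_like(low))`.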

The Acoustic Severity Index in the Pathologic Voice (음성장애에 대한 음향학적 중등도 지표)

  • Hong, Ki-Hwan;Kim, Hyun-Ki;Yang, Yoon-Soo
    • Speech Sciences / v.10 no.4 / pp.201-219 / 2003
  • Background: Perceptual assessment is generally performed by a voice specialist, whereas objective evaluation is performed in a voice laboratory. Research in voice laboratories has generated a variety of objective tests and parameters. Perceptual evaluation is one of the most controversial topics in voice research: the literature reveals a wide variety of rating scales, with reliability data fluctuating from study to study. Unfortunately, there is no widely accepted, valid method for classifying voice disorders and assessing outcomes after voice treatment.
Objectives: The goals of this research were to identify important objective acoustic parameters of vocal quality and to establish an objective, quantitative correlate of perceived vocal quality.
Materials and Methods: We evaluated voice analysis data from 122 dysphonic patients and 20 normal volunteers. A Computerized Speech Lab 4300B (CSL) was used to analyze each voice sample.
Results: Three dysphonia severity indices (DSIs) were created using discriminant analysis. The DSI is based on a weighted combination of the following selected acoustic parameters: absolute jitter (Jita, in μs), smoothed pitch period perturbation quotient (sPPQ, in %), amplitude perturbation quotient (APQ, in %), soft phonation index (SPI), average fundamental frequency (Fo, in Hz), lowest fundamental frequency (Flo, in Hz), and smoothed amplitude perturbation quotient (sAPQ, in %). The DSI, the discriminating rule calculated by logistic regression, consists of three equations based on statistically significant acoustic parameters. The three DSIs were created to best reflect the degree of hoarseness, as expressed by G on the GRBAS scale. The more positive a patient's DSI, the worse the vocal quality; the more negative, the better. The effect of sex is included implicitly in DSI-1 and DSI-2, so separate versions of DSI-1 and DSI-2 for males and females are not needed. The DSI is objective because no perceptual input is required for its calculation.
Conclusion: This research demonstrates that the voice function values calculated from the three multivariate objective dysphonia severity indices are significantly associated with subjective voice assessments. These indices may be appropriate for use in clinical trials and in outcomes research on the effectiveness of treatment for voice disorders.
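
The general form of such an index, a weighted sum of acoustic measures plus an intercept whose sign indicates severity, can be sketched generically. The published DSI coefficients are not given in this summary, so the function carries no weights of its own and any weights passed in (such as those in the usage example) are arbitrary placeholders.

```python
import math


def severity_index(features, weights, intercept):
    """Weighted linear combination of acoustic measures (regression-style score).

    features, weights: dicts keyed by parameter name (e.g. "Jita", "sAPQ").
    Under the sign convention described above, positive scores indicate worse
    vocal quality and negative scores better.
    """
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing acoustic parameters: {sorted(missing)}")
    return intercept + sum(w * features[name] for name, w in weights.items())


def logistic(score):
    """Map a logistic-regression score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))
```

For example, with placeholder weights, `severity_index({"Jita": 1.0, "SPI": 3.0}, {"Jita": 2.0, "SPI": -1.0}, 0.5)` returns -0.5, which would fall on the "better quality" side.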