• Title/Abstract/Keywords: speech quality

807 search results (processing time: 0.021 s)

A Study on LMS-MPC Method Considering Low Bit Rate (Low Bit Rate을 고려한 LMS-MPC 방식에 관한 연구)

  • Lee, See-Woo
    • Journal of Digital Convergence / Vol. 10, No. 5 / pp.233-238 / 2012
  • In a speech coding system that uses voiced and unvoiced excitation sources, the speech waveform is distorted when voiced and unvoiced consonants occur in the same frame. To solve this problem, this paper presents an LMS-MPC method that uses individual pitch and LMS (Least Mean Square). I evaluate MPC and LMS-MPC. Compared with MPC, the SNRseg of LMS-MPC improved by 1.5 dB for female voice and 1.3 dB for male voice, so the distortion of the speech waveform could be controlled. I therefore expect this method to be applicable to cellular phones and smartphones that use a low-bit-rate excitation source.
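
The SNRseg figure quoted in this abstract is a segmental signal-to-noise ratio. A minimal sketch of the usual definition (the mean of per-frame SNRs in dB) is given below. The function name `segmental_snr`, the 160-sample frame length, and the rule of skipping frames with zero signal or zero error energy are assumptions made for illustration; the abstract does not state the paper's settings.

```python
import math

def segmental_snr(reference, decoded, frame_len=160):
    """Mean of per-frame SNRs in dB.

    frame_len is a hypothetical choice; frames whose signal or error
    energy is zero are skipped so that log10 stays defined.
    """
    n = min(len(reference), len(decoded))
    snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal > 0.0 and noise > 0.0:
            snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with non-zero signal and error energy")
    return sum(snrs) / len(snrs)
```

Because each frame contributes equally, quiet frames weigh as much as loud ones, which is why segmental SNR is often reported alongside or instead of a single whole-utterance SNR.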

The V/UV Decision Algorithm for a Reduction of the Transmission Bit Rate in the CELP Vocoder (CELP 음성부호화기 전송률 감소를 위한 음성신호의 V/UV 결정 알고리즘)

  • Min, So-Yeon;Kim, Hyun-Chul
    • Journal of Advanced Navigation Technology / Vol. 11, No. 1 / pp.87-92 / 2007
  • The conventional CELP (code excited linear prediction) vocoder has no V/UV (voiced/unvoiced) classifier, so unvoiced speech is processed in the same way as voiced speech. In this paper, to reduce the bit rate, we propose a new V/UV decision algorithm that minimizes the error rate and the preprocessing computation. The V/UV classifier uses the LSP (line spectrum pair) parameters obtained during the spectrum analysis process in CELP vocoders. Applying this method to the 5.3 kbps ACELP (algebraic code excited linear prediction) coder of G.723.1, we obtain a transmission bit rate reduction of approximately 6% without degradation of speech quality.

Design of EVRC LSP Codebooks with Korean (한국어에 의한 EVRC LSP 코드북 설계)

  • 이진걸
    • The Journal of the Acoustical Society of Korea / Vol. 21, No. 2 / pp.167-172 / 2002
  • The EVRC (Enhanced Variable Rate Codec) is currently in service as a speech codec in digital cellular systems in North America and Korea. In the EVRC, the LSPs (Line Spectral Pairs), which are related to the energy distribution of speech signals in the frequency domain, are coded by weighted split vector quantization. Since the LSP codebooks may have been trained with English or with the language of the country in which they were developed, codebooks trained with Korean are expected to improve performance for communication in Korean. In this paper, the EVRC LSP codebooks are designed with Korean using vector quantization based on the LBG algorithm. The performance improvement of the vector quantization is demonstrated by spectral distortion measurements, and the accompanying speech quality improvement by SNR and SegSNR measurements.

A New Morphological Analysis for the Spoken Language Translation System (음성언어 번역 시스템을 위한 새로운 형태소 분석)

  • 양승원;김재훈
    • The Journal of the Acoustical Society of Korea / Vol. 18, No. 4 / pp.17-22 / 1999
  • It is difficult to integrate speech processing systems and a machine translation system into a spoken language translation system because each system uses its own data and basic processing unit. We therefore need a common I/O unit that is used throughout the whole system. In this paper, we propose the pseudo-morpheme as the interface between the speech processing systems and the language translation system, and we implement a morphological analysis system for pseudo-morphemes. A speech processing system using the pseudo-morpheme can obtain better results than systems using the phrase or the general morpheme, so the quality of the whole spoken language translation system can be improved. The analysis ratio of our implemented system is 98.9%, which is similar to that of common morphological analysis systems.

Computer Codes for Korean Sounds: K-SAMPA

  • Kim, Jong-mi
    • The Journal of the Acoustical Society of Korea / Vol. 20, No. 4E / pp.3-16 / 2001
  • An ASCII encoding of Korean has been developed for extended phonetic transcription of the Speech Assessment Methods Phonetic Alphabet (SAMPA). SAMPA is a machine-readable phonetic alphabet used for multilingual computing. It has been under development since 1987 and has been extended to more than twenty languages. The motivation for creating Korean SAMPA (K-SAMPA) is to label Korean speech for a multilingual corpus or to transcribe the native-language (L1) interfered pronunciation of a second language learner for bilingual education. Korean SAMPA represents each Korean allophone with a particular SAMPA symbol. Sounds that closely resemble that allophone are represented by the same symbol, regardless of the language they are uttered in. Each of its symbols represents a speech sound that is spectrally and temporally so distinct as to be perceptually different when the components are heard in isolation. Each type of sound has a separate IPA-like designation. Korean SAMPA is superior to other transcription systems with similar objectives. It describes the cross-linguistic sound quality of Korean better than the official Romanization system, proclaimed by the Korean government in July 2000, because it uses an internationally shared phonetic alphabet. It is also phonetically more accurate than the official Romanization in that it dispenses with orthographic adjustments. It is also more convenient for computing than the International Phonetic Alphabet (IPA) because it consists of symbols found on a standard keyboard. This paper demonstrates how Korean SAMPA can express allophonic details and prosodic features by adopting the transcription conventions of the extended SAMPA (X-SAMPA) and the prosodic SAMPA (SAMPROSA).

Analysis of acoustical characteristic changes in voice after drinking and singing (음주 및 가창 후 음성의 음향학적 특성 변화 분석)

  • Hwang, Bo-Myung;Noh, Dong-Woo;Paik, Eun-A;Jeong, Ok-Ran
    • Speech Sciences / Vol. 8, No. 2 / pp.39-48 / 2001
  • The purpose of this study was to examine changes in acoustic characteristics after drinking alcoholic beverages and singing in order to establish guidelines for vocal hygiene for both singers and non-singers. Twenty-one university students (10 males and 11 females) vocalized /a/ before drinking, after drinking, and after singing. Changes in vocal range and acoustic characteristics were analyzed with Dr. Speech 4.0 (Tigers Electronics). No significant difference in vocal range was observed following drinking. However, there were statistically significant changes in vocal range after singing. We may infer that an appropriate amount of singing, functioning as a vocal warm-up, rather than drinking alone, improved the subjects' ability to lengthen the vocal folds. This is directly related to the ability to produce high-pitched sounds. The change in jitter in female voices after singing was the only significant acoustic factor. Changes in shimmer and NNE were not significant after either drinking or singing. Subjects who were judged to perform better in singing showed minimal acoustic changes, which may be due to their well-trained vocal fold function. The results of this study may point to the necessity of vocal function exercises for patients with neurogenic voice disorders, including dysarthria. The need for more extensive research with a larger number of subjects, including professional voice users, is also addressed.

Comparative Analysis of Performance of Established Pitch Estimation Methods in Sustained Vowel of Benign Vocal Fold Lesions (양성후두 질환의 지속모음을 대상으로 한 기존 피치 추정 방법들의 성능 비교 분석)

  • Jang, Seung-Jin;Kim, Hyo-Min;Choi, Seong-Hee;Park, Young-Cheol;Choi, Hong-Shik;Yoon, Young-Ro
    • Speech Sciences / Vol. 14, No. 4 / pp.179-200 / 2007
  • In voice pathology, various measurements calculated from pitch values have been proposed to indicate voice quality. However, those measurements are frequently inaccurate and unreliable because they are based on incorrect pitch values determined from pathological voice data. To address this problem, we compared several pitch estimation methods in order to propose a better one for pathological voices. Using a database of 99 pathological and 30 normal voice samples, errors derived from pitch estimation were analyzed and compared between pathological and normal voice data and among the vowels produced by patients with benign vocal fold lesions. Results showed that gross pitch errors occurred in the pathological voice data. From the types of pathological voices classified by the degree of aperiodicity in the speech signals, we found that pitch errors were closely related to the number of aperiodic segments. The autocorrelation approach was found to be the most robust pitch estimation method for the pathological voice data. Further research on more severely pathological voice data is desirable in order to reduce pitch estimation errors.
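
The autocorrelation approach named in this abstract can be illustrated with a minimal sketch: pick the lag with the largest autocorrelation inside a plausible pitch range and convert it to a frequency. The function name `autocorr_pitch`, the 60 to 400 Hz search range, and the rule of returning `None` when no positive peak exists are assumptions for illustration; the abstract does not describe the exact variant the authors evaluated.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch in Hz from the peak of the frame's autocorrelation.

    f_min and f_max are hypothetical search limits. Returns None when
    no lag in the range has positive autocorrelation (treated here as
    an aperiodic frame).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]           # remove DC offset
    lag_min = int(sample_rate / f_max)      # shortest period considered
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(x[i] * x[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    if best_lag is None:
        return None
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2.0 * math.pi * 200.0 * i / 8000.0) for i in range(400)]
estimate = autocorr_pitch(tone, 8000)
```

In a frame with aperiodic segments the peak becomes weak or shifts to a multiple of the true period, which is one way the gross pitch errors described above can arise.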

Robust, Low Delay Multi-tree Speech Coding at 9.6Kbits/sec (견실, 저지연 멀티트리 9.6Kbits/s 음성부호기에 관한 연구)

  • 우홍체;문병현;이채욱
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 18, No. 3 / pp.348-354 / 1993
  • In this research, a multi-tree coder at 9.6 kbits/sec that uses a novel scheme for adapting the short-term coefficients is developed. The overall delay of the tree coder is kept at 2.5 msec (16 samples at the 6.4 kHz sampling frequency). The coder produces good-quality speech over ideal channels, and it is very robust to channel errors up to a bit error rate (BER) of $10^{-3}$. This robustness is achieved by using a parallel adaptation scheme together with a smoothed version of the received excitation sequence for adapting the short-term prediction coefficients. For the multi-tree coder, the reconstructed output speech is evaluated using signal-to-quantization noise ratios (SNR), segmental SNRs, and informal listening tests.
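
As a quick check of the figures quoted in this abstract (plain arithmetic on the stated numbers, not a result taken from the paper):

$$\frac{16\ \text{samples}}{6400\ \text{samples/s}} = 2.5\ \text{ms}, \qquad \frac{9600\ \text{bits/s}}{6400\ \text{samples/s}} = 1.5\ \text{bits/sample}$$

The stated delay therefore matches the stated sampling frequency, and the coder spends 1.5 bits per sample on average.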

Effects of Lax Vox voice therapy in a patient with spasmodic dysphonia: A case report (연축성 발성장애 환자의 Lax Vox 음성치료 효과)

  • Lim, Hye Jin;Choi, Seong Hee;Kim, Jeong Kyu;Choi, Chul-Hee
    • Phonetics and Speech Sciences / Vol. 8, No. 2 / pp.57-63 / 2016
  • Recently, Lax Vox voice therapy has been used as one of the SOVTEs (Semi-Occluded Vocal Tract Exercises). The purpose of this study was to explore the effect of Lax Vox voice therapy on voice improvement in a patient with spasmodic dysphonia. One female patient with spasmodic dysphonia (age 27), who had been diagnosed by a laryngologist, received Lax Vox voice therapy. In this study, the Lax Vox protocol consisted of 5 steps (1 warm-up and 4 steps: bubbling without phonation, bubbling with phonation, gliding with phonation, and generalization). A total of 11 sessions were conducted by a certified speech-language pathologist. The study evaluated acoustic, aerodynamic, auditory-perceptual, and patient self-rating measures before, midway through, and after voice therapy. All objective and subjective parameters improved after voice therapy: reduced frequency variation, increased maximum phonation time, an enlarged voice range, improved 'G' and 'S' in GRBAS and USDRS, and reduced VHI were observed. In particular, decreased $f_0$ and remarkably reduced voice tremor were also demonstrated following Lax Vox voice therapy. Accordingly, the Lax Vox voice therapy technique can be useful for improving voice and quality of life in patients with spasmodic dysphonia.

Speech Enhancement Based on IMCRA Incorporating noise classification algorithm (잡음 환경 분류 알고리즘을 이용한 IMCRA 기반의 음성 향상 기법)

  • Song, Ji-Hyun;Park, Gyu-Seok;An, Hong-Sub;Lee, Sang-Min
    • The Transactions of The Korean Institute of Electrical Engineers / Vol. 61, No. 12 / pp.1920-1925 / 2012
  • In this paper, we propose a novel method to improve the performance of improved minima controlled recursive averaging (IMCRA) in non-stationary noisy environments. The conventional IMCRA algorithm efficiently estimates the noise power by averaging past spectral power values with a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. Since the minimum of the smoothing parameter is defined as 0.85, it is difficult to obtain robust estimates of the noise power in non-stationary noisy environments in which the spectral characteristics change rapidly, such as babble noise. For this reason, we propose a modified IMCRA, which adaptively estimates and updates the noise power according to the noise type classified by a Gaussian mixture model (GMM). The performance of the proposed method is evaluated by perceptual evaluation of speech quality (PESQ) and a composite measure under various environments, and better results are obtained than with the conventional method.
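
The recursive averaging described in this abstract can be sketched as a per-bin update in which the smoothing parameter rises from its minimum (0.85) towards 1 as the signal presence probability increases. This is a common form of the recursion and is assumed here. The function name `update_noise_psd` is hypothetical, the signal presence probability is taken as a given input, and bias compensation and the GMM-based adaptation proposed in the paper are not shown.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha_d=0.85):
    """One frame of probability-controlled recursive averaging.

    noise_psd   : previous noise power estimate per frequency bin
    noisy_power : |Y|^2 of the current noisy frame per bin
    speech_prob : signal presence probability per bin, in [0, 1]
    alpha_d     : minimum smoothing parameter (0.85 in the abstract)
    """
    updated = []
    for lam, power, p in zip(noise_psd, noisy_power, speech_prob):
        # Smoothing parameter lies in [alpha_d, 1]; p = 1 freezes the estimate.
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * lam + (1.0 - alpha) * power)
    return updated
```

With `alpha_d` fixed at 0.85, even a bin judged to be pure noise moves only 15% of the way towards the new power value per frame, which illustrates why tracking fast-changing noise such as babble is difficult.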