• Title/Summary/Keyword: speech quality

Search Results: 807

Very Low Bit Rate Speech Coder of Analysis by Synthesis Structure Using ZINC Function Excitation (ZINC 함수 여기신호를 이용한 분석-합성 구조의 초 저속 음성 부호화기)

  • Seo, Sang-Won;Kim, Young-Jun;Kim, Jong-Hak;Kim, Young-Ju;Lee, In-Sung
    • Proceedings of the IEEK Conference / 2006.06a / pp.349-350 / 2006
  • This paper presents a very low bit rate speech coder, ZFE-CELP (ZINC Function Excitation-Code Excited Linear Prediction). The ZFE-CELP codec models the excitation signal with a ZINC function or with CELP, depending on whether the frame is classified as voiced or unvoiced speech. The paper also suggests strategies to improve the speech quality of the very low bit rate coder.

  • PDF
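The voiced/unvoiced frame decision that drives the choice between ZINC and CELP excitation can be illustrated with a generic short-time energy and zero-crossing heuristic (a minimal sketch only; the paper's actual classifier and the ZINC excitation model itself are not reproduced here):

```python
import numpy as np

def classify_frame(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Crude voiced/unvoiced decision: voiced speech tends to be
    high-energy with a low zero-crossing rate, unvoiced the opposite."""
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    if energy < energy_thresh:
        return "silence"
    return "voiced" if zcr < zcr_thresh else "unvoiced"

# A low-frequency sinusoid looks voiced; white noise looks unvoiced.
t = np.arange(160) / 8000.0                      # one 20 ms frame at 8 kHz
voiced = 0.5 * np.sin(2 * np.pi * 120 * t)
unvoiced = 0.3 * np.random.default_rng(0).standard_normal(160)
print(classify_frame(voiced), classify_frame(unvoiced))
```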

The Robot Speech Recognition using TMS320VC5510 DSK (TMS320VC5510 DSK를 이용한 음성인식 로봇)

  • Choi, Ji-Hyun;Chung, Ik-Joo
    • Journal of Industrial Technology / v.27 no.A / pp.211-218 / 2007
  • As demands for interaction between humans and robots increase, robots are expected to be equipped with human-like intelligence. In particular, for natural communication, hearing capabilities are essential, so speech recognition technology for robots is becoming more important. In this paper, we implement a speech recognizer suitable for robot applications. One of the major problems in robot speech recognition is the poor quality of speech captured when a speaker talks at a distance from the microphone mounted on the robot. To cope with this problem, we used wireless transmission of the commands recognized by the speech recognizer implemented on the TMS320VC5510 DSK. In addition, since the TMS320VC5510 DSP is a fixed-point device, we present an efficient realization of the HMM algorithm using fixed-point arithmetic.

  • PDF
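The fixed-point constraint can be illustrated with Q15 arithmetic, the usual format on 16-bit fixed-point DSPs such as the C55x family (an illustrative sketch, not the paper's HMM implementation):

```python
# Q15 fixed point: a value x in [-1, 1) is stored as round(x * 2**15).
Q = 15
ONE = 1 << Q            # scale factor; +1.0 itself is not representable

def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a Q15 integer."""
    return int(round(x * ONE))

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: 32-bit product, round, shift back to Q15."""
    return (a * b + (1 << (Q - 1))) >> Q

# Chaining observation probabilities, as in an HMM forward pass:
p = to_q15(0.9)
acc = ONE - 1                      # largest representable value, ~0.99997
for _ in range(3):
    acc = q15_mul(acc, p)
print(acc / ONE)                   # close to 0.9**3 = 0.729
```

In practice HMM recognizers usually work with log-probabilities to avoid exactly this kind of underflow as products get small; the Q15 multiply above shows the rounding mechanics.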

Implementation of Formant Speech Analysis/Synthesis System (포만트 분석/합성 시스템 구현)

  • Lee, Joon-Woo;Son, Ill-Kwon;Bae, Keuo-Sung
    • Speech Sciences / v.1 / pp.295-314 / 1997
  • In this study, we implement a flexible formant analysis and synthesis system. In the analysis part, a two-channel (speech and EGG signals) approach is investigated for accurate estimation of formant information. The EGG signal is used to extract the exact pitch information needed for pitch-synchronous LPC analysis and closed-phase LPC analysis. In the synthesis part, the Klatt formant synthesizer is modified so that the user can change synthesis parameters arbitrarily. Experimental results demonstrate the superiority of the two-channel analysis method over the one-channel (speech signal only) method in analysis as well as in synthesis. The implemented system is expected to be very helpful for studying the effects of synthesis parameters on the quality of synthetic speech and for developing a Korean text-to-speech (TTS) system based on formant synthesis.

  • PDF
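The core building block of a Klatt-style formant synthesizer is a second-order digital resonator derived from a formant frequency and bandwidth. A minimal sketch using the standard resonator equations (not the authors' modified system):

```python
import math

def klatt_resonator(x, f_hz, bw_hz, fs):
    """Second-order IIR formant resonator (Klatt-style): unity gain at DC,
    spectral peak at f_hz with -3 dB bandwidth bw_hz."""
    T = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * T)
    b = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * f_hz * T)
    a = 1.0 - b - c                      # normalizes the gain at DC to 1
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y1, y2 = yn, y1
    return out

# The impulse response rings at roughly the formant frequency (F1 = 500 Hz).
impulse = [1.0] + [0.0] * 399
y = klatt_resonator(impulse, f_hz=500, bw_hz=60, fs=8000)
```

A full synthesizer cascades several such resonators (F1-F5) over a glottal source; changing `f_hz`/`bw_hz` per frame is what makes the parameters user-controllable.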

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

  • Kim, Jin-Young;Eom, Ki-Wan
    • Speech Sciences / v.2 / pp.25-44 / 1997
  • When listening to the various speech synthesis systems developed and in use in Korea, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for text-to-speech systems that produce natural and varied synthetic speech. To find such rules from natural speech, the glottal source parameters and the frequency characteristics of the vocal tract were studied for several voice colors. In this paper voice colors were catalogued as: deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between the parameters and the voice colors.

  • PDF
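The multiple regression step — relating acoustic parameters to a voice-color rating — can be sketched with ordinary least squares. The data below are invented for illustration (the paper's actual parameters are LF-model and formant measurements, its ratings perceptual):

```python
import numpy as np

# Hypothetical data: rows = utterances, columns = acoustic parameters
# (e.g. an LF-model parameter, a formant bandwidth, spectral tilt);
# y = listener ratings of one voice-color attribute such as "soft".
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3))
true_w = np.array([0.8, -0.5, 0.3])
y = X @ true_w + 2.0 + 0.05 * rng.standard_normal(40)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [2.0, 0.8, -0.5, 0.3]
```

The fitted coefficients quantify how much each acoustic parameter moves the perceived voice color, which is exactly the "general relation" the regression analysis seeks.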

VoIP Planning and Evaluation through the Analysis of Speech Transmission Quality Based on the E-Model (E-모델 기반 통화 품질 분석을 통한 VoIP Planning 및 평가)

  • Bae, Seong-Yong;Kim, Kwang-Hoon
    • Journal of Internet Computing and Services / v.5 no.6 / pp.31-43 / 2004
  • Voice over Internet Protocol (VoIP) is currently a popular research topic as a method for real-time transmission of voice packets. However, the current Internet environment does not guarantee voice quality with respect to delay, jitter, and loss. Up to now, many voice-based evaluation algorithms have been used to measure the speech quality of VoIP systems. However, these algorithms have the defect that their results vary with the voice samples used, and some cannot take the network environment of the speech transmission path into account. The E-model can be used to solve these problems. In this paper, we introduce VoIP planning guidelines through various analyses of the E-model, which can systematically model the impairments of network quality as well as VoIP equipment quality. We also show a method for evaluating speech transmission quality and its results.

  • PDF
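The E-model combines impairments into a rating factor R, which maps to an estimated MOS. A simplified sketch after ITU-T G.107, using the default R0 − Is = 93.2 and the common Cole-Rosenbluth delay approximation (the paper's full planning analysis covers more impairment terms):

```python
def e_model_r(delay_ms, loss_pct, ie=0, bpl=4.3, a=0):
    """Simplified E-model rating: start from the default 93.2, subtract
    delay impairment Id and effective equipment impairment Ie_eff,
    add the advantage factor A."""
    i_d = 0.024 * delay_ms + max(0.0, 0.11 * (delay_ms - 177.3))
    ie_eff = ie + (95 - ie) * loss_pct / (loss_pct + bpl)  # packet loss
    return 93.2 - i_d - ie_eff + a

def r_to_mos(r):
    """ITU-T G.107 mapping from R (clamped to 0..100) to estimated MOS."""
    r = min(max(r, 0.0), 100.0)
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

# G.711-like codec (Ie = 0, Bpl = 4.3), 150 ms one-way delay, 1% loss:
r = e_model_r(delay_ms=150, loss_pct=1.0)
print(round(r, 1), round(r_to_mos(r), 2))
```

This is what makes the E-model suited to planning: R is computed from network parameters (delay, loss) rather than from voice samples, avoiding the sample-dependence the abstract criticizes.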

VoIP Receiver Structure for Enhancing Speech Quality Based on Telematics (텔레메틱스 기반의 VoIP 음성 통화품질 향상을 위한 수신단 구조)

  • Kim, Hyoung-Gook;Seo, Kwang-Duk
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.3 / pp.48-54 / 2012
  • The quality of real-time voice communication over Internet Protocol networks based on telematics is affected by network impairments such as delay, jitter, and packet loss. To resolve this issue, this paper proposes a receiver-based method for enhancing VoIP speech quality. The proposed method delivers high-quality voice using playout control and signal reconstruction, consisting of concealment of lost packets, adaptive playout-buffer scheduling using active jitter estimation, and smooth interpolation between the two signals in a transition region. The proposed algorithm achieves higher Perceptual Evaluation of Speech Quality (PESQ) scores and lower buffering delay than the reference algorithm.
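Adaptive playout-buffer scheduling from jitter estimates is commonly done with exponentially weighted running estimates of delay and delay variation (a classic Ramjee-style sketch; the paper's "active jitter estimation" may differ):

```python
class PlayoutEstimator:
    """Track smoothed network delay and its variation with exponential
    averaging; schedule playout at mean delay + 4x the variation, so the
    buffer adds just enough headroom to absorb jitter."""
    def __init__(self, alpha=0.998002):
        self.alpha = alpha
        self.d = None     # smoothed delay estimate (ms)
        self.v = 0.0      # smoothed delay variation (ms)

    def update(self, delay_ms):
        if self.d is None:
            self.d = float(delay_ms)
        else:
            a = self.alpha
            self.d = a * self.d + (1 - a) * delay_ms
            self.v = a * self.v + (1 - a) * abs(delay_ms - self.d)
        return self.d + 4.0 * self.v   # playout deadline offset (ms)

est = PlayoutEstimator()
for d in [100, 102, 98, 140, 101]:    # per-packet network delays (ms)
    deadline = est.update(d)
```

Higher jitter inflates the variation term and pushes the deadline out (fewer late losses); steady delay shrinks it (lower buffering delay), which is the trade-off the abstract targets.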

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.3 / pp.465-470 / 2021
  • Robust speech recognition technology is required that degrades neither recognition performance nor speech quality when recognition is performed in a real environment where the speech is mixed with noise. With the development of such technology, applications are needed that achieve a stable and high recognition rate even in noise whose spectrum is similar to that of human speech. Therefore, this paper proposes a speech enhancement algorithm that performs noise suppression based on the MMSE-STSA estimation algorithm, a short-time spectral amplitude method based on the minimum mean-square error. The algorithm is an effective nonlinear speech enhancement method operating on a single-channel input and has high noise-suppression performance. Moreover, it reduces the amount of speech distortion based on a statistical model of the speech. In the experiments, the effectiveness of the MMSE-STSA estimation algorithm is verified by comparing the input speech waveform with the output speech waveform.
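The decision-directed a priori SNR estimate at the heart of MMSE-STSA-style enhancement can be sketched per frequency bin as follows (shown here with a simple Wiener gain; the full MMSE-STSA gain uses the same SNR estimates but a more elaborate, Bessel-function-based formula):

```python
def dd_gain(noisy_power, noise_power, prev_clean_power, alpha=0.98):
    """Decision-directed a priori SNR estimate (Ephraim-Malah style)
    followed by a Wiener gain for one time-frequency bin."""
    gamma = noisy_power / noise_power                  # a posteriori SNR
    xi = (alpha * prev_clean_power / noise_power
          + (1 - alpha) * max(gamma - 1.0, 0.0))       # a priori SNR
    return xi / (1.0 + xi)                             # suppression gain

# One frequency bin over time: speech-dominated frames, then noise-only.
noise_power, prev = 1.0, 0.0
for noisy_power in [20.0, 20.0, 1.0, 1.0]:
    g = dd_gain(noisy_power, noise_power, prev)
    prev = (g ** 2) * noisy_power    # power of the enhanced estimate
```

The smoothing constant `alpha` is the lever on speech distortion: heavier smoothing suppresses musical noise but can smear speech onsets, which is the distortion trade-off the abstract addresses.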

A Personal Sound Amplification Product Compared to a Basic Hearing Aid for Speech Intelligibility in Adults with Mild-to-Moderate Sensorineural Hearing Loss

  • Choi, Ji Eun;Kim, Jinryoul;Yoon, Sung Hoon;Hong, Sung Hwa;Moon, Il Joon
    • Journal of Audiology & Otology / v.24 no.2 / pp.91-98 / 2020
  • Background and Objectives: This study aimed to compare functional hearing with the use of a personal sound amplification product (PSAP) or a basic hearing aid (HA) among listeners with sensorineural hearing loss. Subjects and Methods: Nineteen participants with mild-to-moderate sensorineural hearing loss (SNHL) (26-55 dB HL; pure-tone average, 0.5-4 kHz) were prospectively included. No participant had prior experience with HAs or PSAPs. Audiograms, speech intelligibility in both quiet and noisy environments, speech quality, and preference were assessed in three listening conditions: unaided, with the HA, and with the PSAP. Results: Use of the PSAP was associated with significant improvement in pure-tone thresholds at 1, 2, and 4 kHz compared to the unaided condition (all p<0.01). In the quiet environment, speech intelligibility was significantly improved with the PSAP compared to the unaided condition (p<0.001), and this improvement was better than that obtained with the HA. The PSAP also showed improvement in the most comfortable levels similar to that obtained with the HA (p<0.05). However, there was no significant improvement of speech intelligibility in a noisy environment with the PSAP (p=0.160). There was no significant difference in the reported speech quality produced by either device, or in participant preference for the PSAP or HA. Conclusions: The current results suggest that PSAPs provide considerable benefit to speech intelligibility in a quiet environment and can be a good alternative for compensating mild-to-moderate SNHL.

Performance Enhancement of Speech Declipping using Clipping Detector (클리핑 감지기를 이용한 음성 신호 클리핑 제거의 성능 향상)

  • Seo, Eunmi;Yu, Jeongchan;Lim, Yujin;Park, Hochong
    • Journal of Broadcast Engineering / v.28 no.1 / pp.132-140 / 2023
  • In this paper, we propose a method for enhancing the performance of speech declipping using a clipping detector. Clipping occurs when the input speech level exceeds the dynamic range of the microphone, and it significantly degrades speech quality. Recently, many high-performance machine-learning-based speech declipping methods have been developed. However, they often deteriorate the speech signal when the degree of clipping is not high, because of degradation introduced in the signal reconstruction process. To solve this problem, we propose a new approach that combines a declipping network with a clipping detector, which enables a selective declipping operation depending on the clipping level and provides high-quality speech at all clipping levels. We measured declipping performance using various metrics and confirmed that the proposed method improves the average performance over all clipping levels compared with conventional methods, and greatly improves performance when the clipping distortion is small.
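The detector-gated design can be sketched with a simple clipping measure: hard clipping leaves many samples stuck at exactly the signal's extreme values, so their fraction is a usable indicator (function names and the 5% threshold are illustrative, not the paper's detector):

```python
import numpy as np

def clipping_ratio(x):
    """Fraction of samples sitting exactly at the signal's extremes;
    hard clipping produces long runs of identical max/min samples."""
    x = np.asarray(x, dtype=float)
    return float(np.mean((x == x.max()) | (x == x.min())))

def declip_if_needed(x, declip_fn, ratio_thresh=0.05):
    """Gate the (possibly learned) declipper: run it only when the
    detector sees substantial clipping, else pass the signal through."""
    return declip_fn(x) if clipping_ratio(x) > ratio_thresh else x

t = np.arange(8000) / 8000.0
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
clipped = np.clip(2.0 * clean, -0.6, 0.6)   # heavily hard-clipped copy
print(clipping_ratio(clean) < 0.05, clipping_ratio(clipped) > 0.5)
```

Gating this way realizes the selective operation the abstract describes: lightly clipped or clean signals bypass the reconstruction network, avoiding the quality loss it can introduce at low clipping levels.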