• Title/Summary/Keyword: speech quality

Search Result 807

On the Frequency Dependency of Sound Quality Factors (음질 요소의 주파수 의존성에 대하여)

  • 류윤선;최재원;조희복
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 1997.10a
    • /
    • pp.286-292
    • /
    • 1997
  • Sound quality is becoming a major concern in passenger vehicles. Although it has begun to be studied in recent years, the existing work is not yet sufficient. To improve the sound quality of a passenger vehicle, many noise sources must be considered, and human perception of the noise must also be taken into account. In this paper, sound quality was analyzed through vehicle road tests carried out at varying traveling speeds. As basic factors for sound quality, only objective factors such as loudness, sharpness, speech intelligibility, and sound pressure level were considered. The relations between sound pressure level and the other factors are discussed from the viewpoint of traveling-speed dependency. The frequency dependency of the sound quality factors is also examined by frequency analysis.
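As a worked example of the simplest of the objective factors mentioned above, the sketch below computes the overall sound pressure level in dB from a pressure waveform. It is a generic illustration rather than code from the paper, and the 1 kHz test tone is a hypothetical input.

```python
import numpy as np

def sound_pressure_level_db(pressure_pa, p_ref=20e-6):
    """pressure_pa: pressure samples in pascals; returns the overall SPL in dB re 20 uPa."""
    p_rms = np.sqrt(np.mean(np.asarray(pressure_pa) ** 2))
    return 20.0 * np.log10(p_rms / p_ref)

# A 1 Pa RMS tone corresponds to roughly 94 dB SPL.
t = np.arange(48000) / 48000.0
print(sound_pressure_level_db(np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)))
```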


The Error Pattern Analysis of the HMM-Based Automatic Phoneme Segmentation (HMM기반 자동음소분할기의 음소분할 오류 유형 분석)

  • Kim Min-Je;Lee Jung-Chul;Kim Jong-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.5
    • /
    • pp.213-221
    • /
    • 2006
  • Phone segmentation of the speech waveform is especially important for concatenative text-to-speech synthesis, which builds its synthesis units from segmented corpora, because the quality of the synthesized speech depends critically on the accuracy of the segmentation. Initially, phone segmentation was performed manually, but this required enormous effort and introduced long delays. HMM-based approaches adopted from automatic speech recognition are the most widely used for automatic segmentation in speech synthesis, providing a consistent and accurate phone labeling scheme. Although the HMM-based approach has been successful, it may place a phone boundary at a position different from the expected one. In this paper, we categorized adjacent phoneme pairs, analyzed the mismatches between hand-labeled transcriptions and HMM-based labels, and described the dominant error patterns that must be improved for speech synthesis. For the experiment, the hand-labeled standard Korean speech DB from ETRI was used as the reference DB. A time difference larger than 20 ms between a hand-labeled phoneme boundary and an auto-aligned boundary was treated as an automatic segmentation error. The results for a female speaker revealed that plosive-vowel, affricate-vowel, and vowel-liquid pairs showed high accuracies of 99 %, 99.5 %, and 99 %, respectively, whereas stop-nasal, stop-liquid, and nasal-liquid pairs showed very low accuracies of 45 %, 50 %, and 55 %. The results for a male speaker showed a similar tendency.
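The 20 ms error criterion described above can be made concrete with a small sketch. The data format (pair category, hand-labeled time, auto-aligned time) and the example boundaries are hypothetical; only the threshold and the per-pair accuracy bookkeeping follow the abstract.

```python
from collections import defaultdict

TOLERANCE_S = 0.020  # 20 ms criterion from the abstract

def pair_accuracy(boundaries):
    """boundaries: iterable of (pair_category, hand_time_s, auto_time_s) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, hand_t, auto_t in boundaries:
        totals[category] += 1
        if abs(hand_t - auto_t) <= TOLERANCE_S:
            hits[category] += 1          # boundary within tolerance -> counted as correct
    return {c: hits[c] / totals[c] for c in totals}

example = [
    ("plosive-vowel", 0.412, 0.418),     # 6 ms off -> correct
    ("nasal-liquid", 1.230, 1.262),      # 32 ms off -> automatic segmentation error
]
print(pair_accuracy(example))
```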

Therapeutic Singing on Speech Production Parameters in Head and Neck Cancer Patients: Case Studies (치료적 노래부르기를 통한 두경부암 환자의 말산출 기능 향상 사례)

  • Kim, Ju Hee;Kim, Soo Ji
    • 재활복지
    • /
    • v.22 no.3
    • /
    • pp.189-208
    • /
    • 2018
  • This case study investigated changes in the speech intelligibility of patients with head and neck cancer who participated in a therapeutic singing-based intervention. Three patients received a total of twelve 30-minute individual sessions. The intervention consisted of three steps: movements for relaxing the breathing muscles, vocalization for increasing the range of articulatory movements, and therapeutic singing. To examine changes in speech intelligibility, voice quality parameters, diadochokinesis (DDK), and the quadrangle vowel space area (VSA) were measured at pre- and posttest. Recordings of each patient reading a written paragraph, transcribed by blinded assessors, were also analyzed. The results showed that all of the patients exhibited positive changes in voice quality, in the rate of repetitive syllable production measured by DDK, and in the articulatory working space measured by VSA. Along with these measured changes, the increases in positive mood and rehabilitation motivation reported by the patients support the view that the therapeutic singing-based intervention can induce meaningful changes in speech intelligibility for patients with head and neck cancer. Given the small sample size, suggestions for further investigation of the effects of the intervention are also presented.
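For readers unfamiliar with the VSA measure mentioned above, a quadrangle vowel space area is commonly computed by applying the shoelace formula to the (F1, F2) formants of the four corner vowels. The sketch below is a generic illustration with hypothetical formant values, not the analysis pipeline used in this study.

```python
def vowel_space_area(corners):
    """corners: (F1, F2) pairs in Hz for the corner vowels, ordered around the quadrilateral."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]
        area += f1_a * f2_b - f1_b * f2_a   # shoelace cross terms
    return abs(area) / 2.0                  # area in Hz^2

# Hypothetical corner-vowel formants for /i/, /ae/, /a/, /u/.
print(vowel_space_area([(300, 2300), (650, 1900), (800, 1200), (350, 800)]))
```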

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification (WSOLA 기반의 음성 시간축 변환을 위한 고속의 정규상호상관도 계산)

  • Lim, Sangjun;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.7
    • /
    • pp.427-434
    • /
    • 2012
  • The waveform similarity based overlap-add (WSOLA) method is known to be an efficient, high-quality algorithm for time-scale modification of speech signals. The computational load of WSOLA is concentrated in the repeated normalized cross-correlation (NCC) calculations used to evaluate the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which the NCC is obtained from pre-calculated sum tables that eliminate the redundancy of repeated NCC calculations in adjacent regions. While the denominator of the NCC contains a large amount of redundancy regardless of the time-scale factor, the numerator contains less redundancy, and its amount depends on both the time-scale factor and the optimal shift value, so the numerator requires a more sophisticated algorithm for fast computation. Simulation results show that the proposed method reduces the WSOLA execution time by about 40 %, 47 %, and 52 % for time-scale compression and for 2x and 3x time-scale expansion, respectively, while producing exactly the same speech quality as the conventional WSOLA.
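A minimal sketch of the sum-table idea for the NCC denominator is given below. It assumes a fixed template and a set of candidate shifts; the paper's full method additionally reuses partial numerator sums across adjacent frames, which is omitted here.

```python
import numpy as np

def ncc_with_sum_table(template, signal, shifts):
    """NCC of a fixed template against signal windows starting at each shift."""
    t_energy = np.dot(template, template)
    # Sum table: cumulative sum of squared samples, so each window energy in the
    # denominator costs a single subtraction instead of a full dot product.
    sq_cumsum = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    L = len(template)
    out = []
    for k in shifts:
        segment = signal[k:k + L]
        numerator = np.dot(template, segment)
        seg_energy = sq_cumsum[k + L] - sq_cumsum[k]
        out.append(numerator / np.sqrt(t_energy * seg_energy + 1e-12))
    return np.array(out)

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
print(ncc_with_sum_table(x[:400], x, range(100)))
```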

On a Pitch Alteration Method by Time-axis Scaling Compensated with the Spectrum for High Quality Speech Synthesis (고음질 합성용 스펙트럼 보상된 시간축조절 피치 변경법)

  • Bae, Myung-Jin;Lee, Won-Cheol;Im, Sung-Bin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.4
    • /
    • pp.89-95
    • /
    • 1995
  • Waveform coding is concerned simply with preserving the waveform shape of the speech signal through a redundancy reduction process. For speech synthesis, high-quality waveform coding is used mainly in synthesis-by-analysis. However, since the parameters of such coders are not separated into excitation and vocal tract parameters, it is difficult to apply waveform coding to synthesis by rule. To apply waveform coding to synthesis by rule, a pitch alteration technique is required for prosody control. In this paper, we propose a new pitch alteration method that changes the pitch period of waveform-coded speech by scaling the time axis and compensating the spectrum. This corresponds to a time-frequency domain method in which the phase components of the waveform are preserved, with a spectral distortion of 2.5 % or less for a 50 % pitch change.
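As a rough illustration of pitch alteration by time-axis scaling, the sketch below stretches a single pitch cycle by resampling its time axis. The spectrum compensation step that the paper adds is omitted, and the sinusoidal pitch cycle is a hypothetical input.

```python
import numpy as np

def rescale_pitch_period(period_samples, scale):
    """Stretch (scale > 1) or compress (scale < 1) one pitch cycle in time."""
    n_in = len(period_samples)
    n_out = int(round(n_in * scale))
    new_axis = np.linspace(0.0, n_in - 1, n_out)     # new time axis mapped onto the old one
    return np.interp(new_axis, np.arange(n_in), period_samples)

cycle = np.sin(2 * np.pi * np.arange(80) / 80)       # hypothetical 80-sample pitch cycle
longer = rescale_pitch_period(cycle, 1.5)            # 50 % longer period -> lower pitch
print(len(cycle), len(longer))
```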


Preprocessing method for enhancing digital audio quality in speech communication system (음성통신망에서 디지털 오디오 신호 음질개선을 위한 전처리방법)

  • Song Geun-Bae;Ahn Chul-Yong;Kim Jae-Bum;Park Ho-Chong;Kim Austin
    • Journal of Broadcast Engineering
    • /
    • v.11 no.2 s.31
    • /
    • pp.200-206
    • /
    • 2006
  • This paper presents a preprocessing method that modifies the input audio signal of a speech coder so that an enhanced signal is finally obtained at the decoder. For this purpose, we introduce a noise suppression (NS) scheme and adaptive gain control (AGC), where the audio input and its coding error are treated as a noisy signal and a noise, respectively. The coding error is suppressed from the input, and the suppressed input is then level-aligned to the original input by the subsequent AGC operation. Consequently, this preprocessing redistributes the spectral energy of a music input over the whole spectrum, so that the preprocessed music can be coded more effectively by the following coder. As a drawback, the procedure requires an additional encoding pass to calculate the coding error. However, it provides a generalized formulation applicable to many existing speech coders. Preference listening tests indicated that the proposed approach produces significant improvements in perceived music quality.
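A compact sketch of the described preprocessing chain is shown below. The encode_decode round-trip function is a hypothetical placeholder for the target speech coder, and the time-domain suppression step is a deliberate simplification of the paper's noise suppression scheme.

```python
import numpy as np

def preprocess(audio, encode_decode, suppression=0.5):
    """audio: 1-D float array; encode_decode: hypothetical coder round-trip function."""
    coded = encode_decode(audio)                     # the extra encoding pass noted above
    coding_error = audio - coded                     # coding error treated as the "noise"
    suppressed = audio - suppression * coding_error  # simplified time-domain suppression
    # AGC stage: align the RMS level of the suppressed signal to the original input.
    gain = np.sqrt(np.mean(audio ** 2) / (np.mean(suppressed ** 2) + 1e-12))
    return gain * suppressed

# Example with a trivial stand-in "coder" that just quantizes coarsely.
x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
print(preprocess(x, lambda s: np.round(s * 8) / 8)[:5])
```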

AMR-WB Algebraic Codebook Search Method Using the Re-examination of Pulses Position (펄스위치 재검색 방법을 이용한 AMR-WB 여기 코드북 검색)

  • Hur, Seok;Lee, In-Sung;Jee, Deock-Gu;Yoon, Byung-Sik;Choi, Song-In
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.40 no.4
    • /
    • pp.292-302
    • /
    • 2003
  • We propose a new method to reduce the complexity of the excitation codebook search. The excitation pulses preselected by a coarse search can be updated to pulses that yield a higher performance measure: pulses may be deleted from and inserted into the searched set until the overall performance criterion is satisfied. When this excitation pulse search method is used in AMR-WB, the complexity required for the excitation codebook search is reduced to half that of the original method, while the output speech maintains the same speech quality as the conventional method.
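The delete-and-insert idea can be sketched as a greedy re-examination loop over the preselected pulse positions, scored with the standard algebraic codebook criterion. The unit pulse amplitudes and the flat candidate position set are simplifying assumptions, not the AMR-WB track structure.

```python
import numpy as np

def score(positions, d, Phi):
    """Simplified ACELP criterion Q = (d^T c)^2 / (c^T Phi c) with unit-amplitude pulses."""
    c = np.zeros(len(d))
    c[list(positions)] = 1.0
    return (d @ c) ** 2 / (c @ Phi @ c + 1e-12)

def reexamine(positions, candidates, d, Phi):
    """Greedily delete each pulse in turn and re-insert it wherever the score improves."""
    positions = list(positions)
    best = score(positions, d, Phi)
    for i in range(len(positions)):
        for p in candidates:
            if p in positions:
                continue                              # keep pulse positions distinct
            trial = positions[:i] + [p] + positions[i + 1:]
            s = score(trial, d, Phi)
            if s > best:
                positions, best = trial, s
    return positions

d, Phi = np.random.randn(64), np.eye(64)              # toy target correlation and Phi matrix
print(reexamine([3, 17, 42, 55], range(0, 64, 4), d, Phi))
```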

Low delay window switching modified discrete cosine transform for speech and audio coder (음성 및 오디오 부호화기를 위한 저지연 윈도우 스위칭 modified discrete cosine transform)

  • Kim, Young-Joon;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.37 no.2
    • /
    • pp.110-117
    • /
    • 2018
  • In this paper, we propose a low delay window switching MDCT (Modified Discrete Cosine Transform) method for speech/audio coders. Window switching is used to reduce the degradation of sound quality in non-stationary transient regions, and the algorithmic delay is reduced by using low delay TDAC (Time Domain Aliasing Cancellation). Whereas conventional window switching algorithms use overlap-add with different lengths, the proposed method uses a fixed overlap-add length. This halves the algorithmic delay and, because only two window types are used, saves one bit of frame indication information. To evaluate the performance, we apply the proposed algorithm to the MDCT-based part of G.729.1. The proposed method halves the algorithmic delay while maintaining the same speech quality as the conventional method.
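For reference, the direct form of the MDCT used in such coders is sketched below. This is the textbook O(N^2) definition with a sine window; production coders use fast FFT-based implementations and the specific low delay windows discussed in the paper.

```python
import numpy as np

def mdct(frame, window):
    """frame, window: length-2N arrays; returns N MDCT coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N).reshape(-1, 1)
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ (frame * window)

N = 64
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # common sine window
print(mdct(np.random.randn(2 * N), win).shape)
```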

MPEG-D USAC: Unified Speech and Audio Coding Technology (MPEG-D USAC: 통합 음성 오디오 부호화 기술)

  • Lee, Tae-Jin;Kang, Kyeong-Ok;Kim, Whan-Woo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.7
    • /
    • pp.589-598
    • /
    • 2009
  • As mobile devices become multi-functional and converge onto a single platform, there is a strong need for a codec that can provide consistent quality for both speech and music content. MPEG-D USAC standardization activities started at the 82nd MPEG meeting with a Call for Proposals (CfP), and WD3 was approved at the 88th MPEG meeting. MPEG-D USAC is a convergence of AMR-WB+ and HE-AAC V2 technologies. Specifically, USAC uses three core codecs (AAC, ACELP, and TCX) for the low frequency region, SBR for the high frequency region, and the MPEG Surround tool for stereo information. USAC provides consistent sound quality for both speech and music content and can be applied to various applications such as multimedia download to mobile devices, digital radio, mobile TV, and audio books.
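A schematic, non-normative sketch of the routing described above is shown below; the classifier and the coder objects are hypothetical placeholders meant only to show how the core codecs, SBR, and MPEG Surround divide the work.

```python
def encode_frame(frame, classifier, cores, sbr, mpeg_surround):
    """Schematic per-frame routing; every object here is a hypothetical placeholder."""
    downmix, spatial_params = mpeg_surround.analyze(frame)   # stereo -> downmix + parameters
    low_band, high_band = sbr.split(downmix)                 # SBR covers the high band
    mode = classifier(low_band)                              # "AAC", "ACELP" or "TCX"
    core_payload = cores[mode].encode(low_band)              # selected core codec, low band
    sbr_payload = sbr.encode(high_band, low_band)            # parametric high-band data
    return mode, core_payload, sbr_payload, spatial_params
```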

Performance Improvement of Packet Loss Concealment Algorithm in G.711 Using Adaptive Signal Scale Estimation (적응적 신호 크기 예측을 이용한 G.711 패킷 손실 은닉 알고리즘의 성능향상)

  • Kim, Tae-Ha;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.5
    • /
    • pp.403-409
    • /
    • 2015
  • In this paper, we propose a Packet Loss Concealment (PLC) method using adaptive signal scale estimation to improve the performance of the G.711 PLC. The conventional method attenuates the gain with a fixed 20 % attenuation factor when consecutive losses occur. However, this leads to quality degradation because it does not take the variation of the signal into account. We therefore propose gain control based on adaptive signal scale estimation, using information from the preceding and following frames with a Least Mean Square (LMS) predictor. The performance of the proposed algorithm is evaluated with the Perceptual Evaluation of Speech Quality (PESQ) measure.
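A minimal sketch of scale prediction with an LMS filter is given below. The frame-scale history values are hypothetical, and the predictor order, step size, and training loop are illustrative choices rather than the configuration used in the paper.

```python
import numpy as np

def lms_predict_scale(history, order=4, mu=0.01, iters=200):
    """history: 1-D array of past frame scales (e.g., RMS); returns the predicted next scale."""
    w = np.zeros(order)
    for _ in range(iters):
        for n in range(order, len(history)):
            x = history[n - order:n][::-1]       # most recent scale first
            e = history[n] - w @ x               # prediction error
            w += mu * e * x                      # LMS weight update
    return w @ history[-order:][::-1]

past = np.array([0.90, 0.85, 0.80, 0.78, 0.74, 0.72, 0.70, 0.69])  # hypothetical decaying RMS
print(lms_predict_scale(past))   # predicted scale can drive the concealment gain
```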