• Title/Summary/Keyword: speech quality


A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System (화자 의존 환경의 AMR 7.4Kbit/s모드에 기반한 보코더)

  • Min, Byung-Jae;Park, Dong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.9C / pp.691-696 / 2008
  • This paper proposes a new Code Excited Linear Predictive (CELP) vocoder based on the Adaptive Multi-Rate (AMR) 7.4 kbit/s mode. The proposed vocoder achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and is well suited to systems such as OGM (outgoing message) and TTS (text-to-speech), which need only one person's speech. To enhance the compression rate of the coder, a new Line Spectral Pairs (LSP) codebook built with the Centroid Neural Network (CNN) algorithm is employed. Compared with the original AMR 7.4 kbit/s coder, the new coder shows a 27% higher compression rate while preserving synthesized speech quality in terms of Mean Opinion Score (MOS).

A Study on the Correlation between Body-Size and MDVP Parameters in the Normal Male and Female Korean Population (정상 한국인의 성별 체형정보와 MDVP 변수간의 상관관계 연구)

  • Kang, Jae-Hwan;Yoo, Jong-Hyang;Kim, Jong-Yeol
    • Speech Sciences / v.15 no.4 / pp.107-119 / 2008
  • This paper investigates the correlation of 12 MDVP measurements with the age, sex, and body size of sampled healthy subjects. To extract pitch and the 12 MDVP parameters efficiently and to display the correlation of each parameter easily, we developed a speech analysis program using C/C++ and the MFC development tool. The sample group consists of 205 males and 343 females aged 9 to 81. We collected recordings of the vowel /a/ and 8 body-size measurements from each subject; the body-size values were taken at 8 different torso positions. We analyzed the matched voice samples and body-size measurements with the developed speech analysis program and SPSS. The results show a typical age-F0 pattern: the F0 of male subjects decreases rapidly after the mutational period and then remains stable with age, whereas the F0 of female subjects changes slowly across the whole age range. Among the MDVP parameters, the age-STD relationship in males and the age-sPPQ relationship in females are especially similar to the age-F0 relationship. In the male group, sPPQ(0.316%), Jitt(0.04%), Shim(0.25%), and APQ(0.28%) increase with age after the mutational period. In the female group, Jitt(0.042%) and sPPQ(0.219%) also increase with age. Height, weight, and BMI show only a weak correlation with the MDVP parameters, with correlation coefficients below 0.25 for both the male and female groups. The survey of correlations between the 8 body-size measurements and the MDVP parameters shows a statistically insignificant result, with the correlation coefficient reaching its maximum only at M8-8 and F0(-0.394%) for males and at M8-6,7(-0.368%, -0.364%) for females.
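
The correlation coefficients reported above can be illustrated with a minimal sketch. It assumes the Pearson product-moment coefficient, which the abstract does not name explicitly, and the height and F0 values are hypothetical, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (not from the paper).
heights_cm = [158.0, 163.5, 170.2, 175.8, 181.0]
f0_hz = [132.0, 127.5, 121.0, 118.4, 110.9]
r = pearson_r(heights_cm, f0_hz)
print(f"r = {r:.3f}")
```

A magnitude below 0.25, as reported for height, weight, and BMI, indicates only a weak linear relationship.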


A Clinical Experience of Cleft Palate Repair Using Operative Microscope: Sommerlad's Method (Sommerlad씨 술식에 따른 미세수술 술기를 이용한 구개성형술의 경험)

  • Park, Myong Chul;Shin, Seung Jun;Lee, Il Jae
    • Archives of Plastic Surgery / v.33 no.1 / pp.61-66 / 2006
  • The purpose of this study is to introduce the method of palate repair presented by Sommerlad, which combines minimal hard palate dissection with radical retropositioning of the levator musculature. In this method, the additional use of the operating microscope enables atraumatic and radical dissection and might provide improved speech function to patients. A total of 17 patients with cleft palate underwent Sommerlad's method from December 2003 to August 2004. The mean follow-up period was 4.5 months. The microscope provided high-quality variable magnification and good illumination of the operating field. Repair was carried out through incisions at the margins of the cleft with mucoperiosteal flap elevation. Muscles were rearranged and repaired properly. Improvement in speech could not be evaluated because the patients were too young to have learned meaningful speech. The average operating time, including anesthetic induction, V-tube insertion, and recovery from anesthesia, was 2 hours 45 minutes, which did not differ much from the operating time of the conventional method. Oronasal fistula developed in 2 patients, one of whom healed spontaneously. Because meticulous and radical muscle dissection was possible with Sommerlad's method, we could minimize trauma to the muscular and neurovascular structures. In addition, we expect better speech as a result of this method, although a longer follow-up was not available.

Improved Harmonic-CELP Speech Coder with Dual Bit-Rates(2.4/4.0 kbps) (이중 전송률(2.4/4.0 kbps)을 갖는 개선된 하모닉-CELP 음성부호화기)

  • 김경민;윤성완;최용수;박영철;윤대희;강태익
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.3C / pp.239-247 / 2003
  • This paper presents a dual-rate (2.4/4.0 kbps) Improved Harmonic-CELP (IHC) speech coder based on the Efficient Harmonic-CELP (EHC) coder previously presented by the authors. The proposed IHC employs harmonic coding for voiced segments and CELP for unvoiced segments. In the IHC, an initial voiced/unvoiced (V/UV) estimate is obtained from the pitch gain and energy, and the final V/UV mode is then decided using the frame energy contour. A new harmonic estimation method combining peak picking and delta adjustment is more reliable than that of the EHC. In addition, a noise mixing scheme in conjunction with an improved band voicing measurement improves the naturalness of the synthesized speech. To demonstrate its performance, the proposed IHC coder was implemented and compared with the 2.0/4.0 kbps HVXC (Harmonic Vector eXcitation Coding) coder standardized in MPEG-4. Subjective evaluation showed that the proposed IHC coder produces better speech quality than the HVXC with only 40% of its complexity.

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

  • 장효종;최형일
    • Journal of KIISE:Software and Applications / v.30 no.11 / pp.1037-1043 / 2003
  • This paper describes a speech synthesis technique that uses the diphone as the unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and its major problem is discontinuity at the junction between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique that reflects not only formant trajectories but also the distribution characteristics of the spectrum and human acoustic characteristics. That is, the proposed technique decides the amount and extent of smoothing by considering human acoustic characteristics at the junction of unit phonemes, and then performs spectral smoothing using weights calculated along the time axis at the border of two diphones. The proposed technique reduces the discontinuity and minimizes the distortion caused by spectral smoothing. For performance evaluation, we tested five hundred diphones extracted from twenty sentences using ETRI Voice DB samples and individually self-recorded samples.

Suitable IP Currency Quality Measurement Model in Ubiquitous Environment (유비쿼터스 환경에 적합한 IP 통화품질 측정 모델)

  • Choi Seung-Kwon;Lee Byeong-Rok;Sin Byung-Gok;Kim Sun-Chul;Cho Young-Hwan
    • The Journal of the Korea Contents Association / v.6 no.8 / pp.20-29 / 2006
  • This paper proposes a quality measurement model for video phone service in an IP environment. The proposed model enhances the conventional E-Model through quality analysis and is suitable for a ubiquitous environment. The research measures video phone quality by applying burst packet loss and the recency effect. It uses delay and the recency effect, through NR and UR factors, to compensate for the gap between the actual quality and the quality perceived by the user. Simulation results show that, by considering the recency effect, this model provides more precise results than the conventional model for video phone service quality measurement.
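
For context, the conventional E-model (ITU-T G.107) summarizes transmission impairments in a rating factor R and maps it to an estimated MOS. The sketch below shows only that standard mapping. It does not implement the paper's extended model, and the function name `r_to_mos` is an assumption made for illustration.

```python
def r_to_mos(r: float) -> float:
    """Map an E-model rating factor R to an estimated MOS (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# Illustrative R values only; they are not results from the paper.
for r in (50.0, 70.0, 93.2):
    print(f"R = {r:5.1f} -> MOS = {r_to_mos(r):.2f}")
```

Any correction for burst packet loss or the recency effect, as the abstract describes, would adjust the inputs or the result of such a mapping. How the paper does so is not stated in the abstract.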


Multi frequency band noise suppression system using signal-to-noise ratio estimation (신호 대 잡음비 추정 방법을 이용한 다중 주파수 밴드 잡음 억제 시스템)

  • Oh, In Kyu;Lee, In Sung
    • The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.102-109 / 2016
  • This paper proposes a noise suppression method based on SNR (Signal-to-Noise Ratio) estimation for a closely spaced two-microphone array. The conventional method suppresses noise with a gain function obtained from a full-band, coherence-based SNR estimate. However, its performance degrades because noise damage affects every component of the feature vector. We therefore propose a noise suppression method that divides the frequency-domain signal into N fixed frequency bands, each of which obtains its own gain function through coherence-based SNR estimation. The performance of the proposed method is evaluated by comparing PESQ (Perceptual Evaluation of Speech Quality) values, an objective quality measure provided by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector).
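
The per-band gain computation described above can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' algorithm: the bands are taken to be equal-width, the speech is assumed fully coherent and the noise incoherent between the two microphones (so a band with mean coherence magnitude c has SNR c/(1-c)), and a Wiener-style gain SNR/(1+SNR) is used. The function name and input values are hypothetical.

```python
def band_gains(coherence, n_bands, eps=1e-6):
    """Return one (snr, gain) pair per band from per-bin coherence magnitudes in [0, 1]."""
    n_bins = len(coherence)
    if n_bands < 1 or n_bands > n_bins:
        raise ValueError("n_bands must be between 1 and the number of bins")
    results = []
    for b in range(n_bands):
        # Equal-width split of the frequency bins (assumption).
        start = b * n_bins // n_bands
        stop = (b + 1) * n_bins // n_bands
        band = coherence[start:stop]
        c = sum(band) / len(band)
        c = min(max(c, 0.0), 1.0 - eps)  # clamp so the SNR stays finite
        snr = c / (1.0 - c)              # coherent speech, incoherent noise (assumption)
        gain = snr / (1.0 + snr)         # Wiener-style suppression gain
        results.append((snr, gain))
    return results

# Hypothetical coherence magnitudes for 8 frequency bins, split into 4 bands.
example = [0.2, 0.3, 0.6, 0.7, 0.9, 0.95, 0.5, 0.4]
for i, (snr, gain) in enumerate(band_gains(example, 4)):
    print(f"band {i}: SNR = {snr:.2f}, gain = {gain:.2f}")
```

Under these particular assumptions the gain reduces to the mean coherence itself; the intermediate SNR is kept to mirror the description in the abstract.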

Tonal development and voice quality in the stops of Seoul Korean

  • Yu, Hye Jeong
    • Phonetics and Speech Sciences / v.10 no.4 / pp.91-99 / 2018
  • Korean stops are currently undergoing a tonogenetic sound change, as found in the Seoul dialect, in which the merged VOT of aspirated and lax stops makes F0 the primary cue for distinguishing the two stops, with lax stops having lower F0 than aspirated stops. In tonal languages, low tone is produced with a breathy voice. This study investigated whether there are changes in voice quality with respect to the tonogenetic sound change of Korean stops. Two age groups speaking the Seoul dialect participated in this study: five females and six males born in the 1940s and 1950s, and nine females and eight males born in the 1980s and 1990s. This study replicated previous findings on VOT and F0 and further examined H1-H2, H1-A1, and H1-A2 to see how they correlate with the sound change. In both the older and younger generations, H1-H2, H1-A1, and H1-A2 were significantly lower after the tense stops than after the aspirated and lax stops, but they did not differ significantly between the aspirated and lax stops. However, the younger females exhibited somewhat different results for H1-H2 and H1-A2 than the older generation. In the younger females, the H1-H2 mean was higher after the aspirated stops than after the lax stops at the vowel onset, and the H1-H2 difference increased at the vowel midpoint. Although there was inter-speaker variation in the results for H1-H2 and H1-A1, analyses of individual speakers showed that H1-H2 and H1-A1 were higher after the lax stops than after the aspirated stops in the younger female speakers. These results indicate that lax stops tend to be breathier than aspirated stops in the younger female speakers. They also indicate that changes in voice quality accompany the tonal sound change in Korean stops but are still developing.

Character Recognition Algorithm in Low-Quality Legacy Contents Based on Alternative End-to-End Learning (대안적 통째학습 기반 저품질 레거시 콘텐츠에서의 문자 인식 알고리즘)

  • Lee, Sung-Jin;Yun, Jun-Seok;Park, Seon-hoo;Yoo, Seok Bong
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1486-1494 / 2021
  • Character recognition is a technology required on various platforms, such as smart parking and text-to-speech, and many studies are being conducted to improve its performance. However, when low-quality images are used, the resolution of the training images differs from that of the test images, resulting in poor accuracy. To solve this problem, this paper designs an end-to-end learning neural network that combines image super-resolution and character recognition so that recognition performance is robust to data of varying quality, and implements an alternative end-to-end learning algorithm to train the whole network. Alternative end-to-end learning and recognition performance tests were conducted on license plate images, chosen from among various text images, and the tests verified the effectiveness of the proposed algorithm.

Comparison of Vowel and Text-Based Cepstral Analysis in Dysphonia Evaluation (발성장애 평가 시 /a/ 모음연장발성 및 문장검사의 켑스트럼 분석 비교)

  • Kim, Tae Hwan;Choi, Jeong Im;Lee, Sang Hyuk;Jin, Sung Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.26 no.2 / pp.117-121 / 2015
  • Background: Cepstral analysis, which is obtained from the Fourier transform of the spectrum, is known to be an effective indicator for analyzing voice disorders. Voice disorders have been evaluated using either sustained phonation of the vowel /a/ or continuous speech, but the former is limited in capturing hoarseness properly. This study aims to compare the effectiveness of cepstral analysis between sustained vowel /a/ phonation and continuous speech. Methods: From March 2012 to December 2014, a total of 72 patients were enrolled in this study, comprising 24 patients each with unilateral vocal cord palsy, vocal nodules, and vocal polyps. All patients rated their voice quality with the VHI (Voice Handicap Index) before and after treatment. Sustained vowel /a/ samples and continuous speech samples, using the first sentence of the autumn paragraph, were subjected to cepstral analysis, and pre-treatment and post-treatment values were compared. Results: The pre- and post-treatment values of CPP-a (cepstral peak prominence in the /a/ vowel sound) were 13.80 and 13.91 in vocal cord palsy, 16.62 and 17.99 in vocal cord nodule, and 14.19 and 18.50 in vocal cord polyp, respectively. The pre- and post-treatment values of CPP-s (cepstral peak prominence in text-based speech) were 11.11 and 12.09 in vocal cord palsy, 12.11 and 14.09 in vocal cord nodule, and 12.63 and 14.17 in vocal cord polyp. All 72 patients showed subjective improvement in VHI after treatment. CPP-a showed statistically significant improvement only in the vocal polyp group, whereas CPP-s showed statistically significant improvement in all three groups (p<0.05). Conclusion: In cepstral analysis, text-based analysis is more representative of voice disorders than sustained vowel analysis. Therefore, when performing acoustic analysis of voice by cepstrum, both sustained vowel /a/ phonation and text-based speech should be analyzed to obtain more accurate results.
