• Title/Summary/Keyword: Speech sound

A preliminary study of sound quality evaluation of cochlear implant users (인공와우 사용자의 심리음향적 음질평가 예비연구)

  • Bahng, Junghwa;Oh, Soo Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.1
    • /
    • pp.45-51
    • /
    • 2022
  • Sound quality evaluation is a psychoacoustic method for measuring subjective judgements of sound color. As a preliminary study, this work investigates the sound quality benefits of bimodal hearing by comparing sound quality scores between the bimodal hearing condition and the unilateral cochlear implant (CI) condition. Thirteen bimodal users and seven unilateral CI users participated. Audiologists performed pure-tone and speech audiometry and measured functional gain and real-ear insertion gain. Sound quality was then assessed subjectively with four sounds: a violin, male and female voices, and refrigerator noise. Participants judged the sound quality on six sound quality indices. Bimodal users showed a mean sound quality improvement 0.8 points greater in the bimodal condition than in the unilateral CI condition. A group comparison between bimodal and unilateral CI users showed no differences. Follow-up studies of sound quality tools and methods should be considered to evaluate the subjective bimodal benefits of cochlear implant users.

Low-band Extension of CELP Speech Coder by Recovery of Harmonics (고조파 복원에 의한 CELP 음성 부호화기의 저대역 확장)

  • Park Jin Soo;Choi Mu Yeol;Kim Hyung Soon
    • MALSORI
    • /
    • no.49
    • /
    • pp.63-75
    • /
    • 2004
  • Most telephone speech transmitted over current public networks is band-limited to 0.3-3.4 kHz. Compared with wideband speech (0-8 kHz), narrowband speech lacks the low-band (0-0.3 kHz) and high-band (3.4-8 kHz) components. As a result, it suffers from reduced intelligibility, a muffled quality, and degraded speaker identification. Bandwidth extension is a technique for providing wideband speech quality by reconstructing the low-band and high-band components without any additional transmitted information. Our new approach uses a harmonic synthesis method to reconstruct low-band speech from CELP-coded speech. A spectral distortion measurement and a listening test were used to assess the proposed method, and they verified the improvement in synthesized speech quality.
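
The band limitation described in this abstract can be illustrated with a small sketch. It does not reproduce the paper's harmonic synthesis method. It only shows, on a synthetic harmonic signal, which harmonics a 0.3-3.4 kHz channel removes. The function name `bandlimit`, the 120 Hz fundamental, and the 16 kHz sampling rate are assumptions chosen for illustration.

```python
import numpy as np

def bandlimit(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every FFT bin outside [low_hz, high_hz] (idealized telephone band)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask, n=len(signal))

sample_rate = 16000                       # assumed wideband sampling rate (0-8 kHz)
t = np.arange(sample_rate) / sample_rate  # one second, so FFT bins are 1 Hz apart
f0 = 120.0                                # hypothetical fundamental frequency
harmonics = np.arange(1, 67) * f0         # 120 Hz ... 7920 Hz

# Synthetic "wideband" signal: equal-amplitude harmonics of f0.
wideband = sum(np.sin(2 * np.pi * h * t) for h in harmonics)
narrowband = bandlimit(wideband, sample_rate)

# List the harmonics that no longer carry energy after band-limiting.
magnitude = np.abs(np.fft.rfft(narrowband))
lost = [h for h in harmonics if magnitude[int(h)] < 1.0]
lost_low = [h for h in lost if h < 300.0]
print("low-band harmonics removed:", lost_low)
print("total harmonics removed:", len(lost))
```

The harmonics listed in `lost_low` are the kind of components a low-band extension would have to regenerate at the receiver, since nothing about them is transmitted.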

Comparison of Vowel and Text-Based Cepstral Analysis in Dysphonia Evaluation (발성장애 평가 시 /a/ 모음연장발성 및 문장검사의 켑스트럼 분석 비교)

  • Kim, Tae Hwan;Choi, Jeong Im;Lee, Sang Hyuk;Jin, Sung Min
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.26 no.2
    • /
    • pp.117-121
    • /
    • 2015
  • Background: Cepstral analysis, which is obtained from the Fourier transform of the spectrum, is known to be an effective indicator for analyzing voice disorders. Either a sustained vowel /a/ or continuous speech has been used to evaluate voice disorders, but the former is limited in its ability to capture hoarseness. This study aims to compare the effectiveness of cepstral analysis between the sustained vowel /a/ and continuous speech. Methods: From March 2012 to December 2014, a total of 72 patients were enrolled in this study: 24 each with unilateral vocal cord palsy, vocal nodules, and vocal polyps. All patients evaluated their voice quality with the VHI (Voice Handicap Index) before and after treatment. Samples of the sustained vowel /a/ and of continuous speech (the first sentence of the "Autumn" paragraph) were subjected to cepstral analysis, and pre-treatment and post-treatment values were compared. Results: The pre- and post-treatment values of CPP-a (cepstral peak prominence in the /a/ vowel) were 13.80 and 13.91 in vocal cord palsy, 16.62 and 17.99 in vocal nodule, and 14.19 and 18.50 in vocal polyp, respectively. The pre- and post-treatment values of CPP-s (cepstral peak prominence in text-based speech) were 11.11 and 12.09 in vocal cord palsy, 12.11 and 14.09 in vocal nodule, and 12.63 and 14.17 in vocal polyp. All 72 patients showed subjective improvement in VHI after treatment. CPP-a showed statistically significant improvement only in the vocal polyp group, whereas CPP-s showed statistically significant improvement in all three groups (p<0.05). Conclusion: In cepstral analysis, text-based speech is more representative of voice disorders than the sustained vowel. Therefore, when voice is analyzed acoustically by cepstrum, both the sustained vowel /a/ and text-based speech should be used to obtain more accurate results.
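
A simplified sketch of how a cepstral peak prominence (CPP) value can be computed is given here for orientation. It is not the analysis software used in the study, and its output is not comparable to the CPP-a and CPP-s values reported above. The function name, the 80-300 Hz search range, the 1 ms regression start, and the synthetic test signals are all assumptions.

```python
import numpy as np

def cepstral_peak_prominence(frame, sample_rate, f0_min=80.0, f0_max=300.0):
    """Return (CPP in dB, estimated f0 in Hz) for one frame. Simplified sketch."""
    n = len(frame)
    windowed = frame * np.hanning(n)

    # Cepstrum: inverse Fourier transform of the log-magnitude spectrum.
    log_spectrum_db = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.real(np.fft.ifft(log_spectrum_db))
    cepstrum_db = 20.0 * np.log10(np.abs(cepstrum) + 1e-12)

    # Look for the cepstral peak in the quefrency range of plausible pitch periods.
    quefrency = np.arange(n) / sample_rate          # seconds
    lo = int(sample_rate / f0_max)
    hi = int(sample_rate / f0_min)
    peak = lo + int(np.argmax(cepstrum_db[lo:hi + 1]))

    # Regression line through the cepstrum (from 1 ms up to half the frame).
    fit = slice(int(0.001 * sample_rate), n // 2)
    slope, intercept = np.polyfit(quefrency[fit], cepstrum_db[fit], 1)

    # CPP: height of the peak above the regression line at the same quefrency.
    cpp = cepstrum_db[peak] - (slope * quefrency[peak] + intercept)
    return cpp, sample_rate / peak

# Synthetic check: a periodic pulse train (125 Hz) versus white noise.
rng = np.random.default_rng(0)
sample_rate, n = 16000, 2048
pulse_train = np.zeros(n)
pulse_train[::128] = 1.0                             # one pulse every 8 ms
voiced = pulse_train + 0.01 * rng.standard_normal(n)
noise = rng.standard_normal(n)

cpp_voiced, f0_voiced = cepstral_peak_prominence(voiced, sample_rate)
cpp_noise, _ = cepstral_peak_prominence(noise, sample_rate)
print(f"periodic frame: CPP = {cpp_voiced:.1f} dB, f0 = {f0_voiced:.1f} Hz")
print(f"noise frame:    CPP = {cpp_noise:.1f} dB")
```

A strongly periodic frame yields a prominent cepstral peak, while an aperiodic frame does not, which is why CPP is used as an indicator of dysphonia severity.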

L1-L2 Transfer in VOT and f0 Production by Korean English Learners: L1 Sound Change and L2 Stop Production

  • Kim, Mi-Ryoung
    • Phonetics and Speech Sciences
    • /
    • v.4 no.3
    • /
    • pp.31-41
    • /
    • 2012
  • Recent studies have shown that the stop system of Korean is undergoing a sound change in terms of two acoustic parameters, voice onset time (VOT) and fundamental frequency (f0). Because of a VOT merger of a consonantal opposition and an onset-f0 interaction, the relative importance of the two parameters has been changing in Korean, where f0 is now the primary cue and VOT the secondary cue in distinguishing lax from aspirated stops in speech production as well as perception. In English, however, VOT is the primary cue and f0 the secondary cue in contrasting voiced and voiceless stops. This study examines how Korean learners of English use the two L1 acoustic parameters in producing L2 English stops and whether the L1 sound change affects L2 speech production. The data were collected from six adult Korean learners of English. Results show that the learners use not only VOT but also f0 to contrast L2 voiced and voiceless stops. Moreover, unlike VOT, which varied among speakers, the magnitude of the effect of onset consonants on f0 in L2 English was steady and robust, indicating that f0 also plays an important role in signaling the [voice] contrast in L2 English. The results suggest that the important role of f0 in contrasting lax and aspirated stops in L1 Korean is transferred to the contrast between voiced and voiceless stops in L2 English. They also imply that, for Korean learners of English, f0 rather than VOT will serve as an important perceptual cue in contrasting voiced and voiceless stops in L2 English.

Effects of the Orthographic Representation on Speech Sound Segmentation in Children Aged 5-6 Years (5~6세 아동의 철자표상이 말소리분절 과제 수행에 미치는 영향)

  • Maeng, Hyeon-Su;Ha, Ji-Wan
    • Journal of Digital Convergence
    • /
    • v.14 no.6
    • /
    • pp.499-511
    • /
    • 2016
  • The aim of this study was to determine the effect of orthographic representation on speech sound segmentation performance. Children's performance on the orthographic representation task and the speech sound segmentation task was positively correlated for words with phoneme-grapheme correspondence and negatively correlated for words without phoneme-grapheme correspondence. For words with phoneme-grapheme correspondence, there was no difference in performance between the group with a high level of orthographic representation and the group with a low level, whereas for words without phoneme-grapheme correspondence, the low-level group performed significantly better than the high-level group. The most frequent errors in both groups were orthographic conversion errors, and such errors were significantly more frequent in the high-level group. This study suggests that, from the time they acquire orthographic knowledge, children use it when performing phonological awareness tasks.

Formant Measurements of Complex Waves and Vowels Produced by Students (복합음과 대학생이 발음한 모음 포먼트 측정)

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.39-51
    • /
    • 2008
  • Formant measurements are among the most important factors for objectively testing cross-linguistic differences among vowels produced by speakers of any given language. However, many speech analysis software packages produce erroneous estimates, and some researchers use them without any verification procedure. The purposes of this paper are to examine formant measurements of complex waves synthesized from the average formant values of five Korean vowels, using three default methods in Praat, and to verify the measured values of the five vowels produced by 20 students using one of the methods. Variances along the time axis are discussed after determining the sum of absolute differences from the 1/3 vowel duration point. Results show that the burg method yielded smaller measurement errors. Greater errors were observed with the sl or lpc methods, mostly caused by inappropriate formant settings. Formant measurement deviations were greater in the vowels produced by the female students than in those of the male students, which was mostly attributed to the settings for the vowels /o, u/. Formant settings can best be corrected by changing the number of formants to the number of visible dark bands on the spectrogram. These results suggest that researchers should check the validity of the estimates from speech analysis software. Further studies are recommended on perception tests comparing the original sound with the sound synthesized from the estimated formant values.

The Characteristics of the Korean Conversational Speech by Frequency (주파수분석에 의한 한글 연속음의 특성)

  • 신용철;최진태
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.9 no.1
    • /
    • pp.7-16
    • /
    • 1972
  • By analyzing the frequency content of test speech affected by co-articulation, we find that conversational speech is far from a simple concatenation of monotone sounds, and we also define the frequency ranges of the formants in Korean conversational speech.

Detection and Synthesis of Transition Parts of The Speech Signal

  • Kim, Moo-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.3C
    • /
    • pp.234-239
    • /
    • 2008
  • For efficient coding and transmission, the speech signal can be classified into three distinct classes: voiced, unvoiced, and transition. In low bit rate coding below 4 kbit/s, conventional sinusoidal transform coders synthesize high-quality speech for the purely voiced and unvoiced classes but not for the transition class. The transition class, which includes plosive sounds and abrupt voiced onsets, lacks periodicity and is therefore often classified and synthesized as the unvoiced class. This paper proposes an efficient algorithm for transition class detection that demonstrates superior detection performance not only for clean speech but also for noisy speech. For a detected transition frame, phase information is transmitted instead of magnitude information for speech synthesis. A listening test showed that the proposed algorithm produces better speech quality than the conventional one.

Acoustic Features of Phonatory Offset-Onset in the Connected Speech between a Female Stutterer and Non-Stutterers (연속구어 내 발성 종결-개시의 음향학적 특징 - 말더듬 화자와 비말더듬 화자 비교 -)

  • Han, Ji-Yeon;Lee, Ok-Bun
    • Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.19-33
    • /
    • 2006
  • The purpose of this paper was to examine the acoustic characteristics of the phonatory offset-onset mechanism in the connected speech of female adults with stuttering and with normal nonfluency. The phonatory offset-onset mechanism refers to the laryngeal articulatory gestures required to mark word boundaries in the phonetic contexts of connected speech. This mechanism comprised 7 patterns based on the speech spectrogram. The study examined the acoustic features of connected speech produced by female adults with stuttering (n=1) and with normal nonfluency (n=3). Speech tokens in V_V, V_H, and V_S contexts were selected for analysis. Speech samples were recorded with Sound Forge, and the spectrographic analysis was conducted using Praat. Results revealed that the female speaker who stuttered (with a block type) exhibited more laryngealization gestures in the V_V context. In both the V_H and V_S contexts, the laryngealization gesture was characterized more by a complete glottal stop or glottal fry. The results were discussed from theoretical and clinical perspectives.

PROSODY CONTROL BASED ON SYNTACTIC INFORMATION IN KOREAN TEXT-TO-SPEECH CONVERSION SYSTEM

  • Kim, Yeon-Jun;Oh, Yung-Hwan
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.937-942
    • /
    • 1994
  • A Text-to-Speech (TTS) conversion system can convert any words or sentences into speech. To synthesize speech as human beings produce it, careful prosody control, including intonation, duration, accent, and pause, is required. It helps listeners understand the speech clearly and makes the speech sound more natural. In this paper, a prosody control scheme that makes use of function word information is proposed. Among the many factors of prosody, intonation, duration, and pause are closely related to syntactic structure, and their relations have been formalized and embodied in the TTS system. To evaluate the speech synthesized with the proposed prosody control, a subjective evaluation method, the MOS (Mean Opinion Score) method, was used. The synthesized speech was tested on 10 listeners, and each listener scored the speech between 1 and 5. The evaluation experiments show that the proposed prosody control helps the TTS system synthesize more natural speech.
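
The MOS evaluation mentioned in this abstract amounts to averaging listener ratings on a 1-5 scale. A minimal sketch follows. The ten ratings are hypothetical placeholders and are not data from the paper, and the function name `mean_opinion_score` is an assumption.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average listener ratings after checking that each lies on the 1-5 scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)

# Hypothetical ratings from 10 listeners (illustrative only, not from the paper).
ratings = [4, 3, 5, 4, 4, 3, 4, 5, 3, 4]
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.1f}")
```

Comparing the MOS of speech synthesized with and without a given prosody control scheme, using the same listeners and sentences, is the usual way such a score is interpreted.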
