Search | Korea Science

EVALUATION OF THE SYNTHETIC SPEECH QUALITY BY THE TD-PCULI METHOD

Kang, Chan-Hee;Shin, Yong-Jo;Kim, Yun-Seok;Kwon, Ki-Hyung;Chin, Yong-Ohk
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.977-983 / 1994
In this paper we evaluate the quality of synthetic speech produced by the proposed TD-PCULI speech synthesis method. For synthesis, we extracted parameters from Korean monosyllables by analyzing speech waveforms in the time domain. We constructed a Korean data format dictionary for synthesis-by-rule based on syllable frequencies in a large-vocabulary Korean pronunciation dictionary; it contains 19 V-type, 80 CV-type, 30 VC-type, and 100 CVC-type syllables. Using these, we synthesized various Korean monosyllables, words, and sentences. We tested 10 syllables selected from each of the 4 Korean syllable types with the objective MOS (Mean Opinion Score) evaluation method on 4 items, i.e., intelligibility, clearness, loudness, and naturalness, using a randomly selected group with no prior knowledge of them. We also tested whether duration and F0 can be modified, by changing the duration (i.e., 150 msec, 300 msec, 500 msec, 700 msec, and 1 sec) and the central fundamental frequency (i.e., 80 Hz, 118 Hz, 140 Hz, 170 Hz, and 200 Hz). The experimental results show that the noise arising during rule-based synthesis is removed to a very clear level and that the prosodic elements can be controlled well.
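
As a rough illustration of how MOS ratings over the four items might be summarised, the sketch below averages listener scores per item. The 5-point scale, the ratings, and the function name `mean_opinion_scores` are assumptions made for illustration; none of them come from the paper.

```python
from statistics import mean

# The four items rated in the listening test described above.
ITEMS = ("intelligibility", "clearness", "loudness", "naturalness")


def mean_opinion_scores(ratings):
    """Average listener ratings per item.

    ratings: list of dicts, one per listener, mapping item name -> score.
    A 1..5 scale is assumed here; the abstract does not state the scale.
    """
    return {item: mean(r[item] for r in ratings) for item in ITEMS}


# Hypothetical ratings from three listeners (not data from the paper).
example = [
    {"intelligibility": 4, "clearness": 4, "loudness": 3, "naturalness": 3},
    {"intelligibility": 5, "clearness": 4, "loudness": 4, "naturalness": 3},
    {"intelligibility": 3, "clearness": 4, "loudness": 5, "naturalness": 3},
]
scores = mean_opinion_scores(example)
```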

Modification of pitch Algorithm and Its Application to Noise (피치 알고리즘의 수정 및 소음에의 적용)

Shin, Sung-Hwan;Ih, Jeong-Guon
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2002.11b / pp.511-516 / 2002
Pitch is a percept related to subjective frequency and is one of the psychological attributes of tones. Together with loudness and timbre, it is an important factor in determining sound quality. Although pitch has been studied actively in the field of speech communication, it has not yet been applied sufficiently to product sound quality. In this study, Zwicker's empirical data are used to modify a currently available pitch extraction model based on the place theory. Applying the modified model to various sound samples composed of tonal or banded components suggests that the model is applicable. As a demonstration, the algorithm is used for the sound quality analysis of a product noise that has a fundamental frequency and harmonics. The result shows that pitch should be regarded as an important subjective cue in sound quality analysis.

Word Accent of Cheju Dialects in Korean (제주 방언의 낱말 악센트)

Park, Soon-Bok
- MALSORI / v.55 / pp.33-43 / 2005
This paper investigates the word accent pattern of Cheju dialects in Korean and determines whether it varies with the speaker's age, with the word itself, and with where the speakers come from. Based on the theory of pitch accent suggested by Koo (1993) and Jung (1965) for the standard Korean accent, the fundamental frequency of each syllable is measured. The syllable with the highest frequency is labelled 2, and the rest are labelled 1. The results show that two-syllable words have a 21 accent pattern, three-syllable words a 121 pattern, and four-syllable words a 1211 pattern. In addition to this characteristic of the accent pattern in Cheju dialects, it is interesting that the older the speakers, the less their utterances show the accent patterns described above.
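
The labelling rule in this abstract (highest-F0 syllable gets 2, all others get 1) can be sketched as follows. The function name `accent_pattern`, the tie-breaking choice, and the F0 values are assumptions for illustration; the abstract does not say how ties are handled.

```python
def accent_pattern(f0_values):
    """Label the syllable with the highest F0 as 2 and all others as 1."""
    if not f0_values:
        raise ValueError("need at least one syllable")
    # Assumption: if two syllables tie for the highest F0, the first one wins.
    peak = f0_values.index(max(f0_values))
    return "".join("2" if i == peak else "1" for i in range(len(f0_values)))


# Hypothetical per-syllable F0 measurements in Hz (not data from the paper).
print(accent_pattern([210.0, 180.0]))                # 21
print(accent_pattern([170.0, 205.0, 160.0]))         # 121
print(accent_pattern([165.0, 200.0, 170.0, 150.0]))  # 1211
```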

Intonation Conversion using the Other Speaker's Excitation Signal (他話者의 勵起信號를 이용한 抑揚變換)

Lee, Ki-Young;Choi, Chang-Seok;Choi, Kap-Seok;Lee, Hyun-Soo
- The Journal of the Acoustical Society of Korea / v.14 no.4 / pp.21-28 / 1995
In this paper an intonation conversion method is presented as a basic study on converting original speech into artificially intoned speech. The method uses the other speaker's excitation signals as intonation information and, as vocal features, the original vocal tract spectra, which are warped to the other speaker's spectra using DTW. Intonation-converted speech signals are synthesized through the short-time inverse Fourier transform (STIFT) of their product. To evaluate the intonation-converted speech, we collected Korean single vowels and sentences spoken by 30 males and compared the original and converted speech in terms of fundamental frequency contours, spectrograms, distortion measures, and MOS tests. The result shows that this method can convert speech into speech with the other speaker's intonation.
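
For readers unfamiliar with DTW (dynamic time warping), the following is a minimal textbook sketch that aligns two sequences and returns the alignment path. It works on scalar sequences with an absolute-difference cost, which is an assumption for brevity; the paper warps vocal tract spectra, and its distance measure and path constraints are not given in the abstract. The name `dtw_path` is hypothetical.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Return (total_cost, path) aligning sequence a to sequence b.

    path is a list of (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy sequences: b repeats the middle value of a, so a[1] maps to two frames of b.
total, path = dtw_path([1, 2, 3], [1, 2, 2, 3])
```

Here `path` pairs each frame of one sequence with the frame of the other it is aligned to, which is the information needed to warp one sequence onto the other's time axis.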

Gender Analysis in Elderly Speech Signal Processing (노인음성신호처리에서의 젠더 분석)

Lee, JiYeoun
- Journal of Digital Convergence / v.16 no.10 / pp.351-356 / 2018
Changes in the vocal cords due to aging can change the frequency of speech, and the speech signals of the elderly can be automatically distinguished from normal speech signals through various analyses. The purpose of this study is to provide a tool that is easily accessible to elderly and disabled people, who may be excluded from a rapidly changing technological society, and to improve voice recognition performance. In the study, the sex of the subjects was reported as part of the sex analysis, and equal numbers of female and male voice samples were used. In addition, gender analysis was applied by restricting the samples to elderly voices rather than using voices of all ages. Finally, we applied a review methodology of standards and reference models to reduce gender differences. The voices of 10 Korean women and 10 Korean men aged 70 to 80 were used in this study. When F0 values extracted directly from the waveform were compared with F0 values extracted by the TF32 and WaveSurfer speech analysis programs, WaveSurfer analyzed the F0 of elderly voices better than TF32. However, a voice analysis program designed for elderly people is still needed. In conclusion, analyzing the voices of the elderly will improve the speech recognition and synthesis capabilities of existing smart medical systems.
https://doi.org/10.14400/JDC.2018.16.10.351
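
The abstract above compares F0 read directly from the waveform with F0 reported by analysis programs. One common way to estimate F0 from a waveform is to pick the lag with the largest autocorrelation; the sketch below shows that idea on a synthetic tone. This is a generic technique, not the procedure used in the study or by TF32 or WaveSurfer, and the function name, search range, and test signal are assumptions.

```python
import math


def estimate_f0(samples, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate F0 in Hz as sample_rate / (lag with maximum autocorrelation)."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_value = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic 200 Hz sine sampled at 8 kHz for 50 ms (not real speech).
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
f0 = estimate_f0(tone, rate)
```

Real voices, and elderly voices in particular, are far less periodic than this tone, so practical F0 trackers add windowing, normalisation, and voicing decisions on top of this basic step.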

Analysis of Pre and Post-Operative Speech In Combined Operation of Type I Thyroplasty and Arytenoid Adduction for Unilateral Vocal Cord Palsy (편측성대마비에 대한 제 1형 갑상성형술과 피열연골내전술의 동시수술시 술전 및 술후 음성언어분석비교)

최홍식;정유삼;김성국;김영호;김광문
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.9 no.1 / pp.66-70 / 1998
Background and Objectives: The management of unilateral vocal cord palsy includes type Ⅰ thyroplasty and arytenoid adduction. Either operation performed alone has not shown a satisfactory effect. We evaluated the preoperative and postoperative speech of patients with unilateral vocal cord palsy who received a combined operation of type Ⅰ thyroplasty and arytenoid adduction, to help plan the management of such patients. Materials and Methods: We retrospectively reviewed the postoperative results and complications of 17 patients surgically treated for unilateral vocal cord palsy at Severance hospital from Nov. 1996 to Dec. 1997. They received a combined operation of type Ⅰ thyroplasty and arytenoid adduction. Their pre- and post-operative speech was analyzed with MDVP (Multi-Dimension-Voice analysis Program) of CSL (Computerized Speech Lab). Results: After the operation, MPT (Maximal Phonation Time) increased and MFR (Mean Flow Rate) decreased in all patients. NHR (Noise to Harmonic Ratio) and VTI (Voice Turbulence Index) decreased; Jitter, RAP (Relative Average Perturbation Quotient), PPQ (Pitch Period Perturbation Quotient), sPPQ (Smoothed Pitch Period Perturbation Quotient), and vFo (Fundamental Frequency Variation) decreased; and Shimmer, APQ (Amplitude Perturbation Quotient), sAPQ (Smoothed Amplitude Perturbation Quotient), and vAm (Peak Amplitude Variation) decreased in all patients. Conclusions: In unilateral vocal cord palsy, a combined operation of type Ⅰ thyroplasty and arytenoid adduction could achieve a satisfactory postoperative voice. MDVP provides many parameters and is a good method for evaluating voice surgery.
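
To give a sense of what the perturbation measures in these results quantify, the sketch below computes a commonly used local definition of jitter and shimmer: the mean absolute difference between consecutive cycles, as a percentage of the mean. This is an illustrative assumption, not MDVP's implementation; RAP, PPQ, APQ and the smoothed variants use averaging windows that are not reproduced here. The function name and the cycle values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical consecutive glottal cycles (not patient data).
periods_ms = [5.0, 5.1, 4.9, 5.0]   # cycle durations -> jitter
amplitudes = [1.0, 0.9, 1.1, 1.0]   # cycle peak amplitudes -> shimmer

jitter_percent = relative_perturbation(periods_ms)
shimmer_percent = relative_perturbation(amplitudes)
```

Lower values mean more regular vocal fold vibration, which is why a postoperative decrease in these measures is read as improvement.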

L1-L2 Transfer in VOT and f0 Production by Korean English Learners: L1 Sound Change and L2 Stop Production

Kim, Mi-Ryoung
- Phonetics and Speech Sciences / v.4 no.3 / pp.31-41 / 2012
Recent studies have shown that the stop system of Korean is undergoing a sound change in terms of two acoustic parameters, voice onset time (VOT) and fundamental frequency (f0). Because of a VOT merger of a consonantal opposition and onset-f0 interaction, the relative importance of the two parameters has been changing in Korean, where f0 is now a primary cue and VOT a secondary cue in distinguishing lax from aspirated stops in speech production as well as perception. In English, however, VOT is a primary cue and f0 is a secondary cue in contrasting voiced and voiceless stops. This study examines how Korean English learners use the two acoustic parameters of L1 in producing L2 English stops and whether the sound change of acoustic parameters in L1 affects L2 speech production. The data were collected from six adult Korean English learners. Results show that Korean English learners use not only VOT but also f0 to contrast L2 voiced and voiceless stops. However, unlike VOT, which varied among speakers, the magnitude of the onset-consonant effect on f0 in L2 English was steady and robust, indicating that f0 also plays an important role in the [voice] contrast in L2 English. The results suggest that the important role of f0 in contrasting lax and aspirated stops in L1 Korean is transferred to the contrast of voiced and voiceless stops in L2 English. The results imply that, for Korean English learners, f0 rather than VOT will serve as an important perceptual cue in contrasting voiced and voiceless stops in L2 English.
https://doi.org/10.13064/KSSS.2012.4.3.031

Acoustic and Physiological Characteristics of Pre-term and Full-term Infants' Cries (미숙아와 만삭아 울음의 음향 및 생리학적 특성)

Lee, Hyun-Sook;Pae, Jae-Yeon;Ko, Do-Heung
- Phonetics and Speech Sciences / v.2 no.2 / pp.37-42 / 2010
The purpose of this study is, first, to identify infants who appear healthy but who could face risk factors in the future and, second, to identify infants who may have difficulties in their developmental stages. The subjects were 35 full-term infants (39-40 weeks) and 33 pre-term infants (34-35 weeks). The infants' cries were recorded for three minutes using an EDIROL by Roland and a stand-type microphone made by SONY. The recordings were used to measure the breath unit (B-unit) and the fundamental frequency (F0). There were significant differences in F0: the pre-term infants had a higher F0 than the full-term infants, at 460 Hz versus 436.4 Hz (p<.05). Mean shimmer was 4.01 for the full-term infants and 4.02 (SD=1.69) for the pre-term infants. NHR was .44 for the full-term infants and .50 for the pre-term infants, revealing no significant differences in these measures. This study shows that the crying of newborn babies is related to their physical condition and is a sensory response to that condition. Furthermore, through acoustic and physiological analyses, this study could help with the early detection and assessment of newborn babies who look clinically healthy but could be at risk.

The Effects of Vocal Relaxation Training on Voice Improvement of Children with Vocal Nodules (성대접촉이완훈련이 성대결절아동의 음성개선에 미치는 효과)

Han, Ji Eun;Seong, Cheol Jae
- Phonetics and Speech Sciences / v.4 no.4 / pp.147-154 / 2012
The purpose of this study is to examine how much the voice improves when vocal training that relaxes vocal contact is applied to children with vocal nodules. The subjects were 20 boys aged 5 to 12 with vocal nodules who were seen in otolaryngology and for whom voice therapy had been advised. The therapy was conducted for 40 minutes per week, for a total of eight sessions. Results were evaluated before and after therapy by videostroboscopy, auditory-perceptual evaluation on the GRBAS scale, an aerodynamic test, and acoustic analysis. First, the size of the vocal nodules was reduced and the unstable pattern of vocal contact improved. Glottic closure increased and phase symmetry decreased during vocal vibration. Mucosal wave increased and muscle tension of the larynx was reduced. Second, auditory-perceptual evaluation showed that the subjects' overall voice quality improved. GRBAS scale evaluation showed that the rough, breathy, and strained characteristics of the subjects' voices were reduced after therapy. Third, the measurements of acoustic parameters showed a statistically significant improvement. The fundamental frequency of the subjects' voices increased, and the values of jitter, shimmer, NHR, and [H1-H2] decreased. Fourth, the children's maximum phonation time increased. These results imply that the vocal relaxation training conducted in this study has a very positive effect on the voices of children with vocal nodules.
https://doi.org/10.13064/KSSS.2012.4.4.147

Korean-English bilingual children's production of stop contrasts

Oh, Eunhae
- Phonetics and Speech Sciences / v.11 no.3 / pp.1-7 / 2019
Korean (L1)-English (L2) bilingual adults' and children's production of Korean and English stops was examined to determine the effects of age and L2 experience on the development of L1 and L2 stop contrasts. Four groups of Seoul Korean speakers (experienced and inexperienced adult and child groups) and two groups of age-matched native English speakers participated. The overall results for voice onset time (VOT) and fundamental frequency (F0) of phrase-initial stops in Korean and word-initial stops in English showed a delay in the acquisition of L1 due to dominant exposure to L2. Significantly longer VOT and lower F0 for aspirated stops, as well as high temporal variability across repetitions of lenis stops, were interpreted as indicating a strong effect of English on Korean stop contrasts for bilingual children. That is, the heavy use of VOT for Korean stop contrasts shows bilingual children's attention to the acoustic cue that is primarily employed in the dominant L2. Furthermore, inexperienced children, but not adults, were shown to create new L2 categories distinct from the L1 within 6 months of L2 experience, suggesting greater independence between the two phonological systems. The implications of bilinguals' age at the time of testing for the degree and direction of L1-L2 interaction are further discussed.
https://doi.org/10.13064/KSSS.2019.11.3.001
