Search | Korea Science

Implementation of TTS Engine for Natural Voice (자연음 TTS(Text-To-Speech) 엔진 구현)

Cho Jung-Ho; Kim Tae-Eun; Lim Jae-Hwan
- Journal of Digital Contents Society
- v.4 no.2
- pp.233-242
- 2003
A TTS (Text-To-Speech) system is a computer-based system that should be able to read any text aloud. Producing a natural voice requires general knowledge of the language as well as a great deal of time and effort. Furthermore, the sound pattern of English is highly variable and depends on both phonemic and morphological analysis, which makes it very difficult to keep the patterns consistent. To handle these problems, we present a system based on phonemic analysis of vowels and consonants. By analyzing phonological variations frequently found in spoken English, we derived the phonemic contexts that trigger the multilevel application of the corresponding phonological processes, which consist of phonemic and allophonic rules. As a result, we obtained rule data consisting of phonemes and an engine that is economical in its use of system resources. The proposed system can be used not only in communication systems but also in office automation and similar applications.
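The multilevel rule application described above (phonemic rules first, then allophonic rules, each triggered by a phonemic context) can be sketched as ordered context-sensitive rewrites over a phoneme list. This is a minimal illustration under stated assumptions, not the authors' rule data: the segment classes, the `Rule` type, and the three example rules (plural devoicing, word-initial aspiration, intervocalic flapping with stress ignored) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical segment classes for illustration; not the paper's inventory.
VOWELS = frozenset({"æ", "ɑ", "ɔ", "ə", "ər", "ɪ", "i", "ʊ", "u", "e", "o", "ʌ"})
VOICELESS = frozenset({"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"})
BOUNDARY = "#"


@dataclass(frozen=True)
class Rule:
    target: str        # segment to rewrite
    output: str        # replacement segment
    left: frozenset    # allowed preceding segments ("#" = word boundary)
    right: frozenset   # allowed following segments


def apply_level(segments, rules):
    """Apply rules in order; each rule matches on its input and rewrites all hits at once."""
    out = list(segments)
    for rule in rules:
        padded = [BOUNDARY] + out + [BOUNDARY]
        for i in range(1, len(padded) - 1):
            if (padded[i] == rule.target
                    and padded[i - 1] in rule.left
                    and padded[i + 1] in rule.right):
                out[i - 1] = rule.output
    return out


# Level 1 (phonemic, hypothetical): word-final /z/ -> /s/ after a voiceless consonant.
PHONEMIC_RULES = [Rule("z", "s", VOICELESS, frozenset({BOUNDARY}))]

# Level 2 (allophonic, hypothetical): word-initial aspiration before a vowel,
# then intervocalic flapping of /t/ (stress conditions ignored for brevity).
ALLOPHONIC_RULES = [
    Rule(stop, stop + "ʰ", frozenset({BOUNDARY}), VOWELS) for stop in ("p", "t", "k")
] + [Rule("t", "ɾ", VOWELS, VOWELS)]


def to_phones(phonemes):
    """Multilevel application: phonemic rules first, then allophonic rules."""
    return apply_level(apply_level(phonemes, PHONEMIC_RULES), ALLOPHONIC_RULES)


print(to_phones(["k", "æ", "t", "z"]))   # "cats"  -> ['kʰ', 'æ', 't', 's']
print(to_phones(["w", "ɔ", "t", "ər"]))  # "water" -> ['w', 'ɔ', 'ɾ', 'ər']
```

Keeping rules as data rather than code is one way to get the compact "rule data plus small engine" split the abstract describes; adding a rule means appending a `Rule`, not changing `apply_level`.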

Sums-of-Products Models for Korean Segment Duration Prediction

Chung, Hyun-Song
- Speech Sciences
- v.10 no.4
- pp.7-21
- 2003
Sums-of-Products models were built to predict segment duration in spoken Korean. The modelling experiment was carried out so that its results could be applied to Korean text-to-speech synthesis systems. A corpus of 670 read sentences was analyzed, trained, and tested to construct the duration models. Traditional sequential rule systems were extended to simple additive, multiplicative, and additive-multiplicative models based on Sums-of-Products modelling. The parameters used in the modelling include the properties of the target segment and its neighbors and the target segment's position in the prosodic structure. Two optimisation strategies were used: the downhill simplex method and the simulated annealing method. Model performance was measured by the correlation coefficient and the root mean squared prediction error (RMSE) between actual and predicted durations in the test data. The best performance was obtained when the data were trained and tested with additive-multiplicative models. For vowel duration prediction, the correlation was 0.69 and the RMSE was 31.80 ms, while for consonant duration prediction the correlation was 0.54 and the RMSE was 29.02 ms. These results were not good enough to be applied to real-time text-to-speech systems. Further investigation of feature interactions is required to improve the performance of the Sums-of-Products models.
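A Sums-of-Products model predicts duration as a sum of terms, each term being a product of per-factor scale values, DUR = Σᵢ Πⱼ Sᵢⱼ(fⱼ). The sketch below shows how one function covers the additive (one factor per term), multiplicative (one term with all factors), and additive-multiplicative shapes, and how predictions are scored with the RMSE and correlation used in the abstract. The factor names, scale values, and "actual" durations are made up for illustration; fitting the scales with downhill simplex or simulated annealing is not shown.

```python
import math


def sop_predict(features, terms, scales):
    """Sum over terms i of the product over factors j in term i of S[(i, j)][level of factor j]."""
    total = 0.0
    for i, term in enumerate(terms):
        product = 1.0
        for j in term:
            product *= scales[(i, j)][features[j]]
        total += product
    return total


def rmse(actual, predicted):
    """Root mean squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Factor 0 = vowel identity, factor 1 = prosodic position (hypothetical levels and values).
# Additive-multiplicative shape: S00(vowel) * S01(position) + S11(position).
# Additive would be [(0,), (1,)]; multiplicative would be [(0, 1)].
TERMS = [(0, 1), (1,)]
SCALES = {
    (0, 0): {"a": 100.0, "i": 80.0},
    (0, 1): {"phrase_final": 1.5, "medial": 1.0},
    (1, 1): {"phrase_final": 10.0, "medial": 0.0},
}

tokens = [("a", "phrase_final"), ("a", "medial"), ("i", "phrase_final"), ("i", "medial")]
predicted = [sop_predict(t, TERMS, SCALES) for t in tokens]
actual = [165.0, 95.0, 128.0, 85.0]  # made-up durations in ms
print(predicted, rmse(actual, predicted), pearson(actual, predicted))
```

In a real model, an optimizer would adjust every entry of `SCALES` to minimize RMSE on the training set, and the two metrics would then be reported on held-out test data.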

Development of Parameters for Diagnosing Laryngeal Diseases

Kim, Yong-Ju; Wang, Soo-Geun; Kim, Gi-Ryun; Kwon, Soon-Bok; Jeon, Kye-Rok; Back, Moo-Jin; Yang, Byung-Gon; Jo, Cheol-Woo; Kim, Hyung-Soon
- Speech Sciences
- v.10 no.1
- pp.117-129
- 2003
Many people suffer from various laryngeal diseases. Because voice changes are easy to notice, acoustic analysis can help diagnose these diseases. Several attempts have been made to clarify the relation between acoustic parameters and the state of diseased vocal folds, but no decisive parameters have been found yet. The purpose of this study was to select and develop parameters useful for diagnosing and differentiating laryngeal diseases. We examined eight MDVP parameters and two additional parameter sets, MFCC and LPC, obtained from the production of an open vowel by 252 subjects with or without laryngeal diseases. Using artificial neural networks as the statistical procedure, we attempted to differentiate the laryngeal disease groups. The LPC parameters yielded the highest differentiation rate by the networks, followed by the MFCC and the MDVP parameters. In addition, among the MDVP parameters, Jita, Shim, and NHR proved to be better parameters for diagnosing laryngeal diseases.
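Two of the MDVP parameters singled out above are period- and amplitude-perturbation measures. The sketch below follows their commonly documented definitions: Jita as the mean absolute difference between consecutive pitch periods (in µs), and Shim as the mean absolute difference between consecutive peak amplitudes relative to the mean amplitude (in %). It assumes the period and amplitude sequences have already been extracted; MDVP's own pitch-marking and peak-picking are not reproduced, so values from this sketch need not match MDVP output exactly. NHR, MFCC, and LPC are not covered.

```python
def jitter_absolute_us(periods_s):
    """Jita-style absolute jitter: mean |T(i) - T(i+1)| over consecutive periods, in microseconds."""
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return 1e6 * sum(diffs) / len(diffs)


def shimmer_percent(amplitudes):
    """Shim-style shimmer: mean |A(i) - A(i+1)| divided by the mean amplitude, in percent."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(amplitudes) / len(amplitudes)
    return 100.0 * mean_diff / mean_amp


# Toy sequences (not patient data): three periods near 5 ms, three peak amplitudes.
print(jitter_absolute_us([0.005, 0.0051, 0.005]))  # about 100 µs
print(shimmer_percent([1.0, 0.9, 1.0]))            # about 10.3 %
```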

A Comparative Study of Glottal Data from Normal Adults Using Two Laryngographs

Yang, Byung-Gon; Wang, Soo-Geun; Kwon, Soon-Bok
- Speech Sciences
- v.10 no.1
- pp.15-25
- 2003
A laryngograph for measuring the opening and closing movements of the vocal folds was developed in our laboratory. This study evaluated its performance by comparing its glottal data with those of the original laryngograph. Ten normal Korean adults participated in the experiment. Each subject produced a sustained vowel /a/ for about five seconds. This study compared f0 values, contact quotients (the duration of vocal-fold closure relative to one glottal pulse), and area quotients (the ratio of the closed-phase to the open-phase area of the glottal wave), all derived from glottal waves recorded with both the original and the new laryngograph. Results showed that the means and standard deviations from the two laryngographs were nearly comparable, with a correlation coefficient of 0.662, although the new laryngograph showed a minor systematic shift below the original. The absolute mean difference converged to 1 Hz, which suggests that a threshold could be adopted to reject inappropriate pitch values that exceed it. The contact quotient of the normal subjects was slightly over 50% in citation speech. Finally, the area quotient converged to 1. We will pursue further studies of patients with voice disorders in the future.
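The two comparisons described above can be sketched as follows, under stated assumptions. The contact quotient uses a criterion-level method: the closed phase is taken as the part of one EGG (Lx) cycle lying above a level set at an assumed 35% of the cycle's peak-to-peak range, with higher Lx meaning more vocal-fold contact. The abstract does not say which method or criterion the authors used. The f0 filter illustrates the suggested threshold rejection, with a hypothetical 2 Hz tolerance.

```python
def contact_quotient(cycle, criterion=0.35):
    """Fraction of one Lx cycle above min + criterion * (max - min); criterion is an assumption."""
    lo, hi = min(cycle), max(cycle)
    level = lo + criterion * (hi - lo)
    return sum(1 for x in cycle if x > level) / len(cycle)


def agreeing_f0(pairs, max_diff_hz=2.0):
    """Keep (original, new) f0 pairs whose absolute difference is within a hypothetical tolerance."""
    return [(a, b) for a, b in pairs if abs(a - b) <= max_diff_hz]


# Idealized cycle: closed (high Lx) for half the period.
print(contact_quotient([0, 0, 0, 0, 0, 1, 1, 1, 1, 1]))    # 0.5
print(agreeing_f0([(120.0, 120.5), (130.0, 140.0)]))      # [(120.0, 120.5)]
```

The criterion level strongly affects the resulting quotient, so comparisons between two devices are only meaningful when the same criterion is applied to both signals.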

Voice quality distinctions of the three-way stop contrast under prosodic strengthening in Korean

Jiyoung Jang; Sahyang Kim; Taehong Cho
- Phonetics and Speech Sciences
- v.16 no.1
- pp.17-24
- 2024
The Korean three-way stop contrast (lenis, aspirated, fortis) is currently undergoing a sound change, such that the primary cue distinguishing lenis and aspirated stops is shifting from voice onset time (VOT) to F0. Despite recent discussions of this shift, research on voice quality, traditionally considered an additional cue signaling the contrast, remains sparse. This study investigated the extent to which the associated voice quality [as reflected in the acoustic measurements of H1*-H2*, H1*-A1*, and cepstral peak prominence (CPP)] contributes to the three-way stop contrast, and how the realization is conditioned by prominence- vs. boundary-induced prosodic strengthening amid the ongoing sound change. Results for 12 native Korean speakers indicate that there was a substantial distinction in voice quality among the three stop categories, with the breathiness of the vowel being the greatest after the lenis, intermediate after the aspirated, and least after the fortis stops, indicating the role of voice quality in the maintenance of the three-way stop contrast. Furthermore, prosodic strengthening has different effects on the contrast and contributes to the enhancement of the phonological contrast contingent on whether it is induced by prominence or boundary.
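H1-H2 is the level difference, in dB, between the first and second harmonics of the vowel spectrum; larger values are generally associated with breathier phonation. The sketch below computes an uncorrected H1-H2 by evaluating the DFT at f0 and 2·f0 directly. It assumes f0 is already known and that the analysis window spans a whole number of cycles. The asterisk in H1*-H2* conventionally marks a correction for the influence of nearby formants, which this sketch does not apply; H1*-A1* and CPP are not shown.

```python
import math


def harmonic_level_db(signal, fs, freq):
    """Level (dB) of the DFT of `signal` evaluated at `freq` Hz, rectangular window."""
    re = im = 0.0
    for n, x in enumerate(signal):
        phase = 2 * math.pi * freq * n / fs
        re += x * math.cos(phase)
        im -= x * math.sin(phase)
    return 20 * math.log10(math.hypot(re, im))


def h1_minus_h2(signal, fs, f0):
    """Uncorrected H1-H2 in dB (no formant correction)."""
    return harmonic_level_db(signal, fs, f0) - harmonic_level_db(signal, fs, 2 * f0)


# Synthetic vowel-like tone: H1 amplitude 1.0, H2 amplitude 0.5, exactly 10 cycles of 100 Hz.
fs, f0 = 8000, 100
sig = [math.sin(2 * math.pi * f0 * n / fs) + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
       for n in range(800)]
print(h1_minus_h2(sig, fs, f0))  # about 6.02 dB, i.e. 20*log10(2)
```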
https://doi.org/10.13064/KSSS.2024.16.1.017

The Comparative Study of Effect on Speech before and after Orthognathic Surgery of Patients (악교정 환자의 악교정 수술전후 발음양상에 대한 비교연구)

Kwon, Kyung-Hwan; Kim, Soo-Nam; Lee, Dong-Keun; Cho, Yong-Min; Lee, Suk-Hyang
- Maxillofacial Plastic and Reconstructive Surgery
- v.22 no.2
- pp.191-205
- 2000
The purpose of this study was to determine the effects of orthognathic surgery on speech. The hypothesis stated herein is that functional behaviors of the dentofacial complex, such as speech production, may be adversely affected by structural deviations (especially Class III malocclusion). Twenty adults with Class III malocclusion (13 female and 7 male) were studied using lateral cephalograms taken preoperatively, immediately postoperatively, and either 6 or 12 months postoperatively. All had mandibular prognathism and had undergone mandibular setback surgery. The positions of the tongue, soft palate (uvula), and hyoid bone, the respiratory tract width, and the pharyngeal depth were assessed on the lateral cephalograms with 23 cephalometric variables. ANOVA, paired t-tests, and Pearson's product-moment correlation tests were used to evaluate the operative changes in all cephalometric parameters. Experienced speech-language pathologists performed narrow phonetic transcriptions of tape-recorded words and sentences produced by each of the nine patients, and the recordings were analyzed with a phonetic computer program (Computerized Speech Lab (CSL) Model 4300BI, U.S.A.). The judges also rated each patient's overall consonants, hypernasality, hyponasality, and articulation proficiency. The results were as follows:
1. There were significant changes in the distance from the posterior pharyngeal wall to the tongue (TI-TW2, TS-TW3) at 6 months postoperatively (p<0.01 and p<0.05, respectively).
2. The posterior tongue points (TI, TS, PPT) moved posteriorly after surgery and remained in their changed positions at 6 months postoperatively (p<0.05). Tongue displacement was correlated with the amount of mandibular setback (p<0.05). The hyoid bone moved posteriorly and superiorly in the immediate postoperative period. This hyoid movement was significant immediately after surgery (p<0.05), but the hyoid returned to its original position during the follow-up period (p>0.05).
3. The soft palate was displaced posteriorly and superiorly in the immediate postoperative period and remained in its changed position at 6 months postoperatively (p<0.05). The ANS-PNS-SPT angle increased and the PPU-PPPo distance narrowed after surgery, and these changes persisted at 6 months postoperatively (p<0.05).
4. There were significant changes in formant values and in the vowel square diagram after orthognathic surgery and during the follow-up period. There were also significant changes in the /ㅅ/ (/s/) sound and in posterior tongue sounds.
5. The posterior movement of the tongue and the posterosuperior movement of the soft palate were correlated with the amount of mandibular setback after orthognathic surgery. On the vowel square diagram, the author found that the place of articulation moved downward, backward, and upward after the operation.
6. In assessing speech abnormalities, dental occlusion should be considered a contributing factor. The vast majority of subjects with preoperative misarticulations eliminated or reduced their errors following orthognathic surgery, and speech improved significantly between the pre- and postoperative assessments.
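Pre- versus postoperative comparisons of the same patients, such as the cephalometric changes listed above, are the typical use of a paired t-test: the test works on each patient's own difference rather than on two independent groups. The sketch below computes the paired t statistic and its degrees of freedom with the standard library; the measurements are made-up numbers, not data from this study, and the p-value lookup (which needs a t distribution, e.g. from SciPy) is omitted.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched pre/post measurements."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # mean difference / its standard error
    return t, n - 1


# Made-up pre/post values for four hypothetical patients (e.g. a distance in mm).
t, df = paired_t([1, 2, 3, 4], [2, 3, 5, 6])
print(t, df)  # t ≈ 5.196 with 3 degrees of freedom
```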

Hunting for the Hurt in Chaucer's Book of the Duchess

Vaughan, Miceal F.
- Lingua Humanitatis
- v.2 no.2
- pp.85-107
- 2002
The word play on h(e)art-hunting has become a virtual commonplace in criticism of Chaucer's Book of the Duchess. Less widely discussed is the third meaning of ME herte, "hurt." The "hart"/"heart" pun is, however, only implicit in the poem, while the rhyme of "heart" and "hurt" in lines 883-84 makes clear the close association of the terms for Chaucer. Earlier commentators insisted that this was in fact an instance of rime riche, or "identical rhyme," but if so, it is striking that it is the only instance of this rhyme in Chaucer, whose works are full of occasions for hurt hearts. The essay argues that this is, instead, an instance of near-rhyme, and that the confusion in scribal spellings of ME hurten (with 'u,' 'o,' 'i,' 'y,' and 'e') suggests uncertainties about its root vowel that modern linguistic study has not completely clarified. If the rhyme of herte ("hurt") with herte ("heart") is, however, established by these lines in BD, then it is probably reasonable to ask about all the occasions where characters in the poem are hurt by emotional or physical distress. In the cases of Alcyone and the Man in Blak, the hurt is revealed plainly as the death of a loved one, and Alcyone's death and the Man in Blak's return "homwarde" offer contrasting responses to the realization and acknowledgement of their loss. In the case of the Narrator, however, the exact nature of his "hurt" is nowhere made clear, and the questions this lack of clarity raises for the reader remain unanswered when the poem declares its "hert-huntyng" done. Further examination of the Narrator's character and his role in the poem may reveal him to be a physician himself in need of healing, and this reading of his character may identify him as an ancestor as much of Chaucer's Pardoner as of the Pilgrim Narrator of the Canterbury Tales.

The Movements of Vocal Folds during Voice Onset Time of Korean Stops

Hong, Ki-Hwan; Kim, Hyun-Ki; Yang, Yoon-Soo; Kim, Bum-Kyu; Lee, Sang-Heon
- Speech Sciences
- v.9 no.1
- pp.17-26
- 2002
Voice onset time (VOT) is defined as the time interval from the oral release of a stop consonant to the onset of glottal pulsing in the following vowel. VOT is a temporal characteristic of stop consonants that reflects the complex timing of glottal articulation relative to supraglottal articulation. Many studies, including acoustic, fiberscopic, aerodynamic, and electromyographic ones, have sought to clarify the acoustic and physiological properties that differentiate the three types of Korean stops. Acoustic and fiberscopic studies have shown that VOT and glottal width during stop production are longest and largest for the heavily aspirated type, followed by the slightly aspirated and unaspirated types. The thyroarytenoid and posterior cricoarytenoid muscles were physiologically inter-correlated in differentiating these types of stops. However, a review of the English literature shows that the fine movement of the mucosal edges of the vocal folds during stop production has not been well documented. In recent years, high-speed digital recording of laryngeal dynamics has made it possible to observe these movements with fine temporal resolution. In this study, the movements of the vocal-fold edges during stop production were documented using a fiberscopic system with high-speed digital imaging. Based on the glottal width and the visible vibratory movements of the vocal folds before voice onset, the heavily aspirated stop was characterized as more prominent and dynamic than the slightly aspirated and unaspirated stops.
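The VOT definition above translates directly into an acoustic measurement: VOT = (onset of periodic voicing) − (release time). The sketch below assumes the release time is already known and marks voicing onset at the first analysis frame whose normalized autocorrelation peak exceeds a threshold. The 30 ms frame, 5 ms hop, 75-400 Hz f0 range, and 0.5 threshold are illustrative assumptions; this study itself used high-speed imaging, not this acoustic method.

```python
import math
import random


def periodicity(frame, min_lag, max_lag):
    """Highest normalized autocorrelation of `frame` over the lag range (0.0 if silent)."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:-lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best = max(best, num / den)
    return best


def estimate_vot(signal, fs, release_s, frame_s=0.03, hop_s=0.005,
                 f0_min=75.0, f0_max=400.0, threshold=0.5):
    """Seconds from the given release to the centre of the first clearly periodic frame (None if none)."""
    frame_len, hop = int(frame_s * fs), int(hop_s * fs)
    min_lag, max_lag = int(fs / f0_max), int(fs / f0_min)
    start = int(release_s * fs)
    while start + frame_len <= len(signal):
        if periodicity(signal[start:start + frame_len], min_lag, max_lag) >= threshold:
            return (start + frame_len / 2) / fs - release_s
        start += hop
    return None


# Synthetic aspirated stop: silence, aspiration noise from 50 ms, a 160 Hz "vowel" from 120 ms.
fs, release_s, voicing_s = 8000, 0.05, 0.12
rng = random.Random(0)
signal = []
for n in range(int(0.3 * fs)):
    t = n / fs
    if t < release_s:
        signal.append(0.0)
    elif t < voicing_s:
        signal.append(rng.gauss(0.0, 0.3))
    else:
        signal.append(math.sin(2 * math.pi * 160 * t))
print(estimate_vot(signal, fs, release_s))  # close to the true 0.07 s
```

Using the frame centre rather than the frame start as the onset estimate compensates for the fact that a 30 ms frame becomes periodic before it lies entirely inside the vowel.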

Cognitive abilities and speakers' adaptation of a new acoustic form: A case of a /o/-raising in Seoul Korean

Kong, Eun Jong; Kang, Jieun
- Phonetics and Speech Sciences
- v.10 no.3
- pp.1-8
- 2018
The vowel /o/ in Seoul Korean has been undergoing a sound change that alters the acoustic weighting of F2 and F1. Previous studies have documented that this ongoing change redefines the /o/-/u/ contrast in terms of F2 differences rather than F1 differences. The current study examined two cognitive factors, executive function capacity (EF) and autistic traits, in terms of their roles in explaining who in the speech community adopts the new acoustic forms of the target vowels and who retains the old forms. The participants, 55 college students who speak Seoul Korean, produced /o/ and /u/ vowels in isolated words and completed three EF tasks (Digit N-Back, Stroop, and Trail-Making Task) and an autism screening questionnaire. The relationships between speakers' cognitive task scores and their use of F1 and F2 were analyzed with a series of correlation tests. Results revealed a meaningful relationship between participants' EF scores and their cue use that interacted with gender. Among females, speakers with higher EF scores were better at retaining F1, which is a less informative cue for females since they used F2 more than F1 in realizing /o/ and /u/. In contrast, better EF control among male speakers was associated with greater use of the new cue (F2), even though males still used F1 as much as F2 in producing /o/ and /u/. Taken together, individual differences in acoustic realization can be explained by individuals' cognitive abilities, and the speakers' progress in the sound change further predicts that cognitive ability influences the use of acoustic information that is non-primary to the speaker.
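One way to operationalize "utilization of F1 and F2" for such correlation tests is to measure, for each speaker, how well each formant separates /o/ from /u/, and then correlate that per-speaker value with cognitive scores across speakers. The sketch below uses Cohen's d (pooled SD) as the separation measure; this choice is an assumption for illustration, since the abstract does not name the measure the authors used, and the token values are made up.

```python
import math
from statistics import mean, stdev


def cue_separation(o_tokens, u_tokens):
    """Cohen's d (pooled SD) between one speaker's /o/ and /u/ tokens on one formant."""
    n1, n2 = len(o_tokens), len(u_tokens)
    pooled = math.sqrt(((n1 - 1) * stdev(o_tokens) ** 2 + (n2 - 1) * stdev(u_tokens) ** 2)
                       / (n1 + n2 - 2))
    return (mean(o_tokens) - mean(u_tokens)) / pooled


def pearson_r(xs, ys):
    """Pearson correlation across speakers, e.g. EF score vs. F1 separation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up F1 values (Hz) for one hypothetical speaker's /o/ and /u/ tokens.
print(cue_separation([400, 420, 440], [300, 320, 340]))  # 5.0
```

Running the correlation separately for female and male speakers is the simplest way to see a gender interaction of the kind reported above, before fitting a model with an explicit interaction term.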
https://doi.org/10.13064/KSSS.2018.10.3.001

Speaker Adapted Real-time Dialogue Speech Recognition Considering Korean Vocal Sound System (한국어 음운체계를 고려한 화자적응 실시간 단모음인식에 관한 연구)

Hwang, Seon-Min; Yun, Han-Kyung; Song, Bok-Hee
- The Journal of Korea Institute of Information, Electronics, and Communication Technology
- v.6 no.4
- pp.201-207
- 2013
Voice recognition techniques have been developed and actively applied to various information devices such as smartphones and car navigation systems. However, basic research on speech recognition is largely based on results for English. Lip-sync production generally requires tedious manual work by animators, which seriously affects animation production cost and the development period needed to obtain high-quality lip animation. In this research, a real-time automatic lip-sync algorithm for virtual characters in digital contents is studied in consideration of the Korean vocal sound system. The proposed algorithm contributes to producing natural lip animation at lower production cost and with a shorter development period.
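A speaker-adapted, real-time monophthong recognizer that drives mouth shapes can be sketched in a few lines. This illustration assumes F1/F2 are already tracked per frame. Speaker adaptation is approximated with Lobanov normalization, which converts each formant to z-scores using that speaker's calibration tokens. Each frame is then labeled with the nearest vowel centroid, and the label is mapped to a mouth shape. The vowel subset, centroid coordinates, and mouth-shape names are hypothetical, not the paper's algorithm.

```python
import math
from statistics import mean, pstdev

# Hypothetical targets for a subset of Korean monophthongs (ㅏ ㅔ ㅣ ㅗ ㅜ) in z-scored F1/F2 space.
CENTROIDS = {
    "a": (1.6, 0.0),
    "e": (0.2, 1.0),
    "i": (-1.0, 1.5),
    "o": (-0.3, -1.3),
    "u": (-1.0, -0.9),
}
# Hypothetical mouth shape for each recognized vowel.
MOUTH_SHAPE = {"a": "open", "e": "half-open", "i": "spread", "o": "rounded", "u": "rounded-narrow"}


class SpeakerNormalizer:
    """Lobanov normalization: per-speaker F1/F2 mean and SD from calibration tokens."""

    def __init__(self, calibration):
        f1s = [f1 for f1, _ in calibration]
        f2s = [f2 for _, f2 in calibration]
        self.f1_mean, self.f1_sd = mean(f1s), pstdev(f1s)
        self.f2_mean, self.f2_sd = mean(f2s), pstdev(f2s)

    def normalize(self, f1, f2):
        return ((f1 - self.f1_mean) / self.f1_sd, (f2 - self.f2_mean) / self.f2_sd)


def recognize(normalizer, f1, f2):
    """Nearest-centroid monophthong label for one analysis frame."""
    z = normalizer.normalize(f1, f2)
    return min(CENTROIDS, key=lambda v: math.dist(z, CENTROIDS[v]))


speaker = SpeakerNormalizer([(300, 900), (500, 1500), (700, 2100)])  # made-up calibration (Hz)
vowel = recognize(speaker, 800, 1500)
print(vowel, MOUTH_SHAPE[vowel])  # a open
```

Because each frame needs only a normalization and five distance computations, this kind of classifier is cheap enough to run per frame in real time; a real system would also smooth labels over time to avoid mouth-shape flicker.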
https://doi.org/10.17661/jkiiect.2013.6.4.201
