Integrated Search | Korea Science

A Study on Implementation of Emotional Speech Synthesis System using Variable Prosody Model

민소연;나덕수
한국산학기술학회논문지 / Vol. 14, No. 8 / pp. 3992-3998 / 2013
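This paper concerns a method for generating more varied synthesized speech by adding an emotional speech corpus to a high-quality, large-corpus-based speech synthesizer. The emotional speech corpus was built in a form usable by a waveform-concatenation synthesizer, so that synthesized speech can be generated through the same synthesis-unit selection process as for the existing general speech corpus. For emotional speech synthesis, text is input with tags; if matching data exists at the intonation-phrase level, the phrase is synthesized as emotional speech, and otherwise it is synthesized as general speech. Breaks are one of the elements that make up prosody in speech, and breaks in emotional speech are more irregular than those in general speech. This makes it difficult to use the break information generated by the synthesizer directly for emotional speech synthesis. To solve this problem, variable break [3] modeling was applied. The experiments used a Japanese synthesizer, and the results showed that natural emotional synthesized speech could be obtained while using the break prediction module for general speech unchanged.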
https://doi.org/10.5762/KAIS.2013.14.8.3992
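The fallback rule described in the abstract above (use emotional speech only when a tagged intonation phrase has matching data, otherwise use general speech) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the corpus layout, the emotion names, the romanized example phrases, and the function names `choose_corpus` and `plan_synthesis` are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical emotional corpus: emotion tag -> intonation phrases that were
# recorded with that emotion. The real corpus format is not given in the abstract.
EMOTION_CORPUS: Dict[str, Set[str]] = {
    "joy": {"arigatou", "omedetou"},
    "anger": {"yamete kudasai"},
}


def choose_corpus(phrase: str, emotion: Optional[str]) -> str:
    """Pick the corpus for one intonation phrase.

    Returns "emotional" only when the phrase carries an emotion tag AND the
    emotional corpus holds matching data; otherwise falls back to "general".
    """
    if emotion is not None and phrase in EMOTION_CORPUS.get(emotion, set()):
        return "emotional"
    return "general"


def plan_synthesis(
    tagged_phrases: List[Tuple[str, Optional[str]]]
) -> List[Tuple[str, str]]:
    """Map each (phrase, emotion tag) pair to the corpus it would be synthesized from."""
    return [(phrase, choose_corpus(phrase, emotion)) for phrase, emotion in tagged_phrases]


if __name__ == "__main__":
    text = [("arigatou", "joy"), ("kyou wa hare desu", "joy"), ("omedetou", None)]
    for phrase, corpus in plan_synthesis(text):
        print(f"{phrase!r} -> {corpus} corpus")
```

Exact string matching stands in here for whatever phrase-level matching the synthesizer actually performs; unit selection itself is outside the scope of the sketch.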

Characteristics of speech intelligibility and speech acceptability connected with mouth opening condition

송윤경
말소리와 음성과학 / Vol. 3, No. 3 / pp. 141-148 / 2011
Many factors affect speech intelligibility and speech acceptability. Structural anomalies and neuromotor pathologies are known causes of abnormal speech sounds, and there are also minor variations related to the oral mechanism. Speaking with a restricted mouth opening, whether because of a therapeutic procedure or a habitual speech pattern, might affect the quality of speech sounds. This study therefore compared the speech intelligibility and speech acceptability of 24 recorded words in two conditions (restricted mouth opening and normal mouth opening) with 30 normal-hearing adults. The results showed that speech intelligibility and speech acceptability were significantly lower in the restricted mouth opening condition. In that condition, speech acceptability was significantly lower than speech intelligibility, and it was especially low for open vowels. These findings indicate that the mouth opening condition can affect vowel shape and can adversely affect speech intelligibility and speech acceptability.

Speech Perception Ability of Schizophrenics - A Comparative Study with Depressives & Normal Control -

정영조;이순정;이승환
생물정신의학 / Vol. 9, No. 2 / pp. 112-119 / 2002
Objective: This study investigated differences in speech perception ability between schizophrenic patients and depression patients in order to explore the trait-dependent speech perception ability of each disorder. Methods: Speech perception ability was assessed with the masked speech tracking test (MST) in schizophrenic patients (N=31), depression patients (N=25), and normal controls (N=21). The continuous performance test (CPT) and sentence repetition test (SRT) were also used to assess attention and working memory. Results: The schizophrenic patients showed significantly impaired MST performance compared with depressive patients and normal controls. CPT and SRT performance was also more impaired in schizophrenic patients. The difference in MST performance between the two patient groups disappeared once differences in CPT and SRT performance were taken into account. Conclusions: These results imply that schizophrenic patients have impaired speech perception ability compared with depressive patients and normal controls. However, speech perception ability was significantly influenced by CPT and SRT performance. To evaluate pure speech perception ability, a more carefully controlled study that excludes factors such as attention, working memory, and intelligence is needed.

Low Frequency Perception of Rhythm and Intonation Speech Patterns by Normal Hearing Adults

Kim, Young-Sun;Asp, Carl-W.
음성과학 / Vol. 9, No. 1 / pp. 7-16 / 2002
This study tested normal hearing adults' auditory perception of rhythm and intonation patterns, with low-frequency speech energy. The results showed that the narrow-band low-frequency zones of 125, 250, or 500 Hz provided the same important rhythm and intonation cues as did the wide-band condition. This suggested that an auditory training strategy that uses low-frequency filters would be effective for structuring or re-structuring the perception of rhythm and intonation patterns. These filters force the client to focus on these patterns, because the speech intelligibility is drastically reduced. This strategy can be used with both normal-hearing and hearing impaired children and adults with poor listening skills, and possibly poor speech intelligibility.

Prosodic Properties in the Speech of Adults with Cerebral Palsy

이숙향;고현주;김수진
대한음성학회지:말소리 / No. 64 / pp. 39-51 / 2007
The purpose of this study is to investigate prosodic characteristics in the speech of adults with cerebral palsy through a comparison with the speech of normal speakers. Ten speakers with cerebral palsy (6 males, 4 females) and 6 normal speakers (3 males, 3 females) served as subjects. The results revealed that, compared to normal speakers, speakers with cerebral palsy showed a slower speech rate, a larger number of intonational phrases (IPs) and pauses, a larger number of accentual phrases (APs) per IP, a longer duration of pauses, and more gradual slopes of [L+H] in APs. However, the two groups showed similar tone patterns in their APs. The results also showed mild to moderate correlations between speech intelligibility and the prosodic properties that differed significantly between the two groups, suggesting that these properties could be important prosodic predictors of speech intelligibility in adults with cerebral palsy.

An Acoustic Analysis of Speech in Patients with Nonfluent Aphasia

김현기;강은영;김연희
음성과학 / Vol. 9, No. 3 / pp. 87-97 / 2002
The purpose of this study is to analyze speech duration in Korean-speaking aphasics. Five patients with nonfluent aphasia (2 with traumatic brain injury and 3 with strokes) and five normal adults participated in this experiment. The mean age was $45.8\pm2.3$ years for the patients with nonfluent aphasia and $47.4\pm2.3$ years for the normal adults. The Computerized Speech Lab was used to evaluate the acoustic characteristics of the subjects. Voice onset time, vowel duration, total duration, hold, and consonant duration were evaluated for monosyllabic and polysyllabic words. The patients with nonfluent aphasia did not show a voicing bar in the hold area, whereas it was seen in the normal adults in the intervocalic position. The explosion duration of glottalized stops in the intervocalic position was significantly prolonged in the nonfluent aphasics compared with the normal adults. This suggests that laryngeal adjustment is disturbed in these patients. Consonant duration, vowel duration, and total duration of the polysyllabic words were significantly longer in the patients with nonfluent aphasia than in the normal adults. These results demonstrate disturbances in controlling the articulatory muscles during sound production in patients with nonfluent aphasia. Objective and quantitative analysis based on the acoustic characteristics of nonfluent aphasics will be very useful in therapeutic planning and in assessing the effects of speech therapy.

The Effect of the Speech Enhancement Algorithm for Sensorineural Hearing Impaired Listeners

Kim, Dong-Wook;Lee, Young-Woo;Lee, Jong-Shill;Chee, Young-Joon;Lee, Sang-Min;Kim, In-Young;Kim, Sun-I.
대한의용생체공학회:의공학회지 / Vol. 28, No. 6 / pp. 732-743 / 2007
Background noise is one of the major complaints not only of hearing-impaired persons but also of normal listeners. This paper describes the results of two experiments in which speech recognition performance in noisy environments was determined for listeners with normal hearing and with sensorineural hearing loss. First, we compared speech enhancement algorithms by evaluating speech recognition ability at various speech-to-noise ratios and with various types of noise. Next, speech enhancement algorithms that reduce background noise were presented and evaluated for improving speech intelligibility for listeners with sensorineural hearing impairment. We tested three single-microphone noise reduction methods: spectrum subtraction and companding, the Wiener filter method, and maximum likelihood envelope estimation. Their responses in background noise were investigated and compared with those obtained with the speech enhancement algorithm presented in this paper. The methods improved speech recognition test scores for the sensorineural hearing-impaired listeners, but not for normal listeners. The results suggest that the speech enhancement algorithm with loudness compression can improve speech intelligibility for listeners with sensorineural hearing loss.
https://doi.org/10.9718/JBER.2007.28.6.732

Clinical Study of Velopharyngeal Closure after the Primary Palatorrhaphy in Cleft Palate Patients

고광희;신효근
Maxillofacial Plastic and Reconstructive Surgery / Vol. 14, No. 1-2 / pp. 1-21 / 1992
To identify the causes of velopharyngeal incompetency after primary palatorrhaphy in cleft patients, we analyzed the form and function of the velopharyngeal space in fifteen operated cleft palate patients and five normal subjects. Velopharyngeal function was evaluated by lateral cephalometric radiography, velopharyngography, and the hypernasality cul-de-sac test. The results were as follows.
1. The rate of velopharyngeal incompetency was twenty percent (three of the fifteen operated patients). Two of them had complete cleft palate and the other had incomplete cleft palate.
2. The soft palate and levator eminence were longer in the normal group than in the good speech group and the complete cleft palate group during phonation of /i/ (P<0.05). The lengthening rate of the soft palate was smaller in the good and poor speech groups than in the normal group (P<0.05), and decreased in the order normal group, complete cleft palate group, incomplete cleft palate group (P<0.05).
3. The nasopharyngeal distance did not differ significantly among the groups at rest, but during phonation of /i/ it was smaller in the normal group than in both cleft palate groups (P<0.05) and in the good and poor speech groups (P<0.05). The difference in nasopharyngeal distance between rest and /i/ phonation was greater in the normal group than in both cleft palate groups, the good speech group, and the poor speech group.
4. The moving distance of the soft palate decreased in the order normal group, incomplete cleft palate group, complete cleft palate group (P<0.05).
5. The distance between the lateral pharyngeal walls did not differ significantly among the groups at rest, but during phonation of /a/ it was smaller in the normal group than in the complete cleft palate group (P<0.01) and increased in the order normal group, good speech group, poor speech group (P<0.01). The mobility of the lateral wall decreased in the order normal group, good speech group, poor speech group (P<0.01).
6. There was a low correlation between the mobility of the lateral pharyngeal wall and that of the soft palate. This suggests that the lateral pharyngeal wall and the soft palate move independently.

Improved Acoustic Modeling Based on Selective Data-driven PMC

Kim, Woo-Il;Kang, Sun-Mee;Ko, Han-Seok
음성과학 / Vol. 9, No. 1 / pp. 39-47 / 2002
This paper proposes an effective method to remedy the acoustic modeling problem inherent in the usual log-normal Parallel Model Composition (PMC) intended for robust speech recognition. In particular, the Gaussian kernels under the prescribed log-normal PMC cannot sufficiently express the corrupted speech distributions. The proposed scheme corrects this deficiency by judiciously selecting the 'fairly' corrupted component and re-estimating it as a mixture of two distributions using data-driven PMC. As a result, some components are merged while an equal number of components are split. The decision to split or merge is made by measuring the similarity of the corrupted speech model to the clean model and the noise model. The experimental results indicate that the suggested algorithm is effective in representing the corrupted speech distributions and attains consistent improvement across various SNR and noise cases.
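For context, the baseline that this abstract calls "the usual log-normal PMC" can be sketched for a single log-spectral dimension as below. This is the standard textbook form of the log-normal approximation, assuming additive, independent speech and noise and a unit gain term; it is not the selective data-driven variant the paper proposes. The function name `lognormal_pmc` and the scalar (diagonal) treatment are assumptions made for the illustration.

```python
import math
from typing import Tuple


def lognormal_pmc(mu_s: float, var_s: float, mu_n: float, var_n: float) -> Tuple[float, float]:
    """Combine a clean-speech Gaussian and a noise Gaussian, both defined in the
    log-spectral domain, into one Gaussian for the corrupted speech.

    Assumes speech and noise are additive and independent in the linear
    spectral domain, and that the sum of two log-normals is again log-normal.
    """

    def to_linear(mu: float, var: float) -> Tuple[float, float]:
        # Mean and variance of a log-normal variable exp(X), X ~ N(mu, var).
        mean = math.exp(mu + var / 2.0)
        variance = mean * mean * (math.exp(var) - 1.0)
        return mean, variance

    mean_s, variance_s = to_linear(mu_s, var_s)
    mean_n, variance_n = to_linear(mu_n, var_n)

    # Independent additive sources: means and variances add in the linear domain.
    mean = mean_s + mean_n
    variance = variance_s + variance_n

    # Map the combined moments back to log-domain Gaussian parameters.
    var_log = math.log(variance / (mean * mean) + 1.0)
    mu_log = math.log(mean) - var_log / 2.0
    return mu_log, var_log


if __name__ == "__main__":
    mu, var = lognormal_pmc(mu_s=2.0, var_s=0.5, mu_n=1.0, var_n=0.2)
    print(f"corrupted-speech mean={mu:.4f}, variance={var:.4f}")
```

Because the result is forced back into a single Gaussian per component, it cannot capture a corrupted distribution that is not close to log-normal, which is the deficiency the abstract refers to.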

A Study on Intonation Patterns of Speech Produced by Cochlear Implanted Children

Park, Sang-Hee;Jang, Tae-Yeoub;Lee, Sang-Heun;Jeong, Ok-Ran;Seok, Dong-Il
음성과학 / Vol. 9, No. 1 / pp. 27-38 / 2002
The purpose of the study is to examine intonation patterns of cochlear implanted children compared with those of normal hearing children. The data tokens of three normal and five cochlear implanted children were collected and investigated. Their intonation patterns were analyzed using the speech analysis tool, Praat. The characteristics of the two utterance types, interrogative and declarative, were investigated. No significant difference in intonation patterns between the two subject groups was found. However, the general pitch of cochlear implanted children was higher than that of normal hearing children. In addition, cochlear implanted children showed frequent pitch breaks.

630 search results; processing time 0.027 s