Integrated Search | Korea Science

Analysis of Voice Color Similarity for the Development of HMM-Based Emotional Text-to-Speech Synthesis

민소연;나덕수
- 한국산학기술학회논문지 / Vol. 15, No. 9 / pp. 5763-5768 / 2014
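When a single synthesizer produces both neutral speech with no expressed emotion and several kinds of emotional speech, preserving voice color (timbre) becomes important. If the synthesizer is built from recordings in which emotion is expressed excessively, voice color is not preserved and the synthesized voices can sound as if they came from different speakers. This paper analyzes the change in voice color of the speech data built to implement an HMM-based speech synthesizer with adjustable emotion level. Building a speech synthesizer requires recording speech to construct a database, and for an emotional speech synthesizer the recording process is especially important. Because defining an emotion and holding its level constant is very difficult, the recording must be monitored carefully. The speech database consists of neutral speech and emotional speech for Happiness, Sadness, and Anger, with each emotion recorded at two levels, High and Low. To measure the voice color similarity between neutral and emotional speech, the spectra of each representative vowel were accumulated into an average spectrum, and F1 (the first formant) was measured from the average spectrum. Low-level emotional data were more similar in voice color to neutral speech than High-level data, and the results confirm that the proposed method can be used to monitor such voice color changes in emotional speech.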
https://doi.org/10.5762/KAIS.2014.15.9.5763
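
The measurement described in the abstract above (accumulate vowel spectra into an average spectrum, then read F1 from it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function names, the 200-1000 Hz search band, and the toy Gaussian spectra are all hypothetical, and real data would come from short-time spectra of recorded vowels.

```python
import math

def average_spectrum(spectra):
    """Bin-wise mean of equally long magnitude spectra."""
    n = len(spectra)
    return [sum(bins) / n for bins in zip(*spectra)]

def estimate_f1(freqs_hz, spectrum, band=(200.0, 1000.0)):
    """Frequency of the largest bin inside an assumed F1 search band."""
    candidates = [(mag, f) for f, mag in zip(freqs_hz, spectrum)
                  if band[0] <= f <= band[1]]
    return max(candidates)[1]

def toy_spectrum(freqs_hz, peak_hz, width_hz=80.0):
    """Hypothetical stand-in for a measured spectrum: one bump at peak_hz."""
    return [math.exp(-((f - peak_hz) / width_hz) ** 2) for f in freqs_hz]

# Common frequency grid, 0 to 4000 Hz in 50 Hz steps (assumed).
freqs = [50.0 * k for k in range(81)]

# Invented tokens of one vowel; the peak positions are not from the paper.
neutral = average_spectrum([toy_spectrum(freqs, p) for p in (690, 700, 710)])
emotional = average_spectrum([toy_spectrum(freqs, p) for p in (790, 800, 810)])

f1_neutral = estimate_f1(freqs, neutral)
f1_emotional = estimate_f1(freqs, emotional)

# A smaller F1 distance would be read as closer voice color.
f1_distance = abs(f1_emotional - f1_neutral)
print(f1_neutral, f1_emotional, f1_distance)
```

Averaging before peak picking smooths token-to-token variation, which is presumably why the abstract measures F1 on the averaged spectrum rather than on individual frames.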

Comparison of Speech Patterns According to the Degree of Surgical Setback in Mandibular Prognathic Patients

신기영;이동근;오승환;성헌모;이숙향
- Maxillofacial Plastic and Reconstructive Surgery / Vol. 23, No. 1 / pp. 48-58 / 2001
After mandibular setback surgery, we found some changes in speech patterns and in the organs of speech. This study investigated the nature and degree of changes in speech patterns according to the amount of surgical setback in mandibular prognathic patients. Thirteen patients with skeletal Class III malocclusion were studied preoperatively and postoperatively over 6 months. They had undergone mandibular setback surgery via bilateral sagittal split ramus osteotomy (BSSRO). We split the patients into two groups: Group 1 included patients whose mandibular setback was 6 mm or less, and Group 2 those with more than 6 mm. The control group consisted of two adults with normal speech patterns. A phonetician performed narrow phonetic transcriptions of tape-recorded words and sentences produced by each patient, and the acoustic characteristics of the plosives, fricatives, and flaps were analyzed with a phonetic computer program (Computerized Speech Lab (CSL) Model 4300B, USA). The results are as follows: 1. In general, patients showed a longer closure duration of plosives, a shorter VOT (voice onset time), and a higher ratio of closure duration to VOT. 2. Patients showed a diffuse distribution of frication noise energy in fricatives more often than the control group. 3. In fricatives, the compact form was more frequent in Group 1 than in Group 2. 4. In general, the short closure duration for /ㄹ/ was not realized in the patients' flaps. Instead, it was realized as a fricative, a sonorant with a vowel-like formant structure, or a trill-type consonant. 5. Abnormality in the patients' articulation was reduced, but adaptation of their articulation after surgery was not complete, and the degree of adaptation differed according to the degree of surgical setback.

The Comparative Study of Effect on Speech before and after Orthognathic Surgery of Patients

권경환;김수남;이동근;조용민;이숙향
- Maxillofacial Plastic and Reconstructive Surgery / Vol. 22, No. 2 / pp. 191-205 / 2000
The purpose of this study was to determine the effects of orthognathic surgery on speech. The hypothesis is that functional behaviors of the dentofacial complex, such as speech production, may be adversely affected by structural deviations (especially Class III malocclusion). Twenty adults with Class III malocclusion (13 female and 7 male) were studied using lateral cephalograms taken preoperatively, immediately postoperatively, and at either 6 or 12 months postoperatively. They had mandibular prognathism and had undergone mandibular setback surgery. The positions of the tongue, soft palate (uvula), and hyoid bone, the respiratory tract width, and the pharyngeal depth were assessed on lateral cephalograms with 23 cephalometric variables. ANOVA, paired t-tests, and Pearson's product-moment correlation coefficient were used to evaluate the operative changes in all cephalometric parameters. Experienced speech and language pathologists performed narrow phonetic transcriptions of tape-recorded words and sentences produced by each of the nine patients, and the recordings were analyzed with a phonetic computer program (Computerized Speech Lab (CSL) Model 4300BI, U.S.A.). These judges also recorded their ratings of each patient's overall consonants, hypernasality, hyponasality, and articulation proficiency. The results obtained are as follows: 1. There were significant changes in the distance from the posterior pharyngeal wall to the tongue (TI-TW2, TS-TW3) at 6 months after surgery (p<0.01 and p<0.05, respectively). 2. The posterior tongue points (TI, TS, PPT) moved posteriorly after surgery and remained in their changed position at 6 months postoperatively (p<0.05). The displacement of the tongue was correlated with the amount of mandibular setback (p<0.05). The hyoid bone moved posteriorly and superiorly in the immediate postoperative period. The change in hyoid bone position was significant in the immediate postoperative period (p<0.05), but the hyoid bone returned to its original position during the follow-up period (p>0.05). 3. The soft palate was displaced posteriorly and superiorly in the immediate postoperative period and remained in its changed position at 6 months postoperatively (p<0.05). An increase in the ANS-PNS-SPT angle and a narrowing of the PPU-PPPo distance were seen after surgery and persisted at 6 months postoperatively (p<0.05). 4. There were significant changes in formant values and in the vowel square diagram after orthognathic surgery and the follow-up period. There were significant changes in the /ㅅ/ sound and in posterior tongue sounds. 5. The posterior movement of the tongue and the posterosuperior movement of the soft palate were correlated with the amount of mandibular setback after orthognathic surgery. On the vowel square diagram, the authors found that the place of articulation after the operation moved downward, backward, upward. 6. In assessing speech abnormalities, dental occlusion should be considered as a contributing factor. The vast majority of subjects with preoperative misarticulations eliminated or reduced their errors following orthognathic surgery. There was a significant difference in speech between the preoperative and postoperative conditions, indicating improvement.

Perception of native Korean Speakers on English and German

Kang, Hyun-Sook;Koo, So-Ryeong;Lee, Sook-hyang
- 대한음성학회:학술대회논문집 / 대한음성학회 July 2000 Conference Proceedings / pp. 86-87 / 2000
In this paper, we discuss why two different surface forms appear in loanwords for English and German /ʃ/. In Korean, a vowel is inserted into loanwords if a consonant cannot be properly syllabified. Therefore, /ʃ/ in some positions of loanwords triggers vowel insertion. Interestingly, /ʃ/s in the onset cluster of English and German words were borrowed in Korean as [ʃu] with the inserted vowel [u], whereas /ʃ/s in the coda position of English and German words were borrowed as [ʃi] with the inserted vowel [i]. For example, 'shrimp' is adopted as [ʃurimphi] whereas 'rush' is adopted as [raʃi]. In this paper, we attempt to find the phonetic reason for the distribution of the surface forms of /ʃ/. We assume that since the formant frequency of [i] is higher than that of [u], the peak frequency of /ʃ/ with the surface form [ʃi] in loanwords may be higher than that of /ʃ/ with the surface form [ʃu]. We also assume that duration may be another factor in the distribution of [ʃi] and [ʃu]. Since /ʃ/ and /u/ use lip rounding whereas /i/ does not, the duration of [ʃi] might be longer than that of [ʃu]. German supports our assumption. /ʃ/ in the onset cluster is longer than /ʃ/ in the coda position. It also has a higher peak frequency than /ʃ/ in the coda position. In loanwords, /ʃ/ in the onset cluster is borrowed as [ʃu], as in Spiegel, whereas /ʃ/ in the coda position is borrowed as [ʃi], as in Bosch. English, however, does not support our assumption. The peak frequency of [ʃ] depends on the preceding vowel, not on its position in the syllable structure. If the preceding vowel is front, then the peak frequency of the following /ʃ/ is high, but if the preceding vowel is back, then the peak frequency of the following /ʃ/ is low. The peak frequency of /ʃ/ in the onset cluster seems to be in between. As we assumed, however, the duration of /ʃ/ in the coda position is longer than that of /ʃ/ in the onset cluster. Given these mixed results, we question whether Koreans really hear two different sounds for /ʃ/ in English words. As a future experiment, we would like to perform a perception test for /ʃ/ in English words.

Effects of Butorphanol on Behavior after Intestinal Anastomosis in Dogs

구자민;이희천;장홍희;성용증;이효종;연성찬
- 한국임상수의학회지 / Vol. 22, No. 1 / pp. 6-15 / 2005
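This study was conducted to assess pain in dogs after intestinal anastomosis through non-invasive behavioral observation and, on that basis, to investigate the analgesic effect of butorphanol. The control group was anesthetized but did not undergo intestinal anastomosis. Five dogs in the analgesic group underwent intestinal anastomosis and received butorphanol. Five dogs in the non-analgesic group underwent intestinal anastomosis without an analgesic. Dogs in the analgesic group received butorphanol (0.4 mg/kg, IM) before and immediately after surgery, whereas the control and non-analgesic groups received the same volume of sterile saline. Behavior was videotaped for 400 minutes after anesthesia, during which the experimenter interacted with each dog every 80 minutes. At each interaction, the experimenter recorded a pain score based on the observed behavior using the University of Melbourne pain scale. Continuous focal-animal sampling was applied so that a single observer could quantify interactive and non-interactive behaviors. Vocalizations were recorded for 400 minutes after anesthesia and analyzed for duration, intensity, pitch, and formants 1 to 4. Surgery increased the pain scores. During interactions with the experimenter, greeting behavior decreased after surgery. Differences between the surgical group given the analgesic and the group given placebo could be detected through the quantified behavioral measures and the vocalizations. A significant difference was observed between the surgical group given butorphanol and the group given placebo (p < 0.05).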

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal

정영규;한문성;이상조
- 한국정보과학회논문지:소프트웨어및응용 / Vol. 34, No. 3 / pp. 226-234 / 2007
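This paper develops an isolated-word (command) recognizer using a vocal cord microphone, which blocks environmental noise at the source. A vocal cord microphone has the advantage of minimizing the effect of environmental noise. However, because high-frequency content is absent and formant information is partially lost, ASR systems built with a vocal cord microphone perform worse than systems using a standard microphone. For this reason, in ASR system development the vocal cord microphone has mainly been used to supplement the information coming from a standard microphone. Through analysis of Korean phonological characteristics and of the signal, this paper shows that a high-performance ASR system can be developed using a vocal cord microphone alone. It identifies the problems that the MFCC algorithm, which uses the sum of energy within each frequency band, has in analyzing vocal cord signals, and proposes the conditions a feature extraction algorithm must meet to perform better on vocal cord signals. These conditions are (1) a sensitive band-pass filter and (2) the use of a feature vector suited to separating voiced and unvoiced sounds. In experiments, the ZCPA algorithm, which satisfies the proposed conditions, performed about 16% better than MFCC. In addition, applying channel normalization algorithms such as CMS and RASTA improved performance by about 2%.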
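
To make the contrast with band-energy features such as MFCC concrete, the core idea of ZCPA (zero-crossings with peak amplitudes) can be sketched for a single band as below. This is a simplified illustration, not the paper's implementation: it assumes the input is already band-pass filtered, skips the filterbank and framing entirely, and the bin edges, test tone, and function names are hypothetical.

```python
import math

def upward_zero_crossings(x):
    """Indices n where the signal goes from negative to non-negative."""
    return [n for n in range(1, len(x)) if x[n - 1] < 0.0 <= x[n]]

def zcpa_histogram(x, sample_rate, bin_edges_hz):
    """Frequency histogram built from zero-crossing intervals.

    Each interval between successive upward zero crossings gives a
    frequency estimate (inverse of the interval); the bin for that
    frequency is incremented by the log-compressed peak amplitude
    found inside the interval.
    """
    hist = [0.0] * (len(bin_edges_hz) - 1)
    crossings = upward_zero_crossings(x)
    for start, end in zip(crossings, crossings[1:]):
        freq = sample_rate / (end - start)
        peak = max(x[start:end])
        for i in range(len(hist)):
            if bin_edges_hz[i] <= freq < bin_edges_hz[i + 1]:
                hist[i] += math.log(1.0 + peak)
                break
    return hist

# Hypothetical test tone: 200 Hz sinusoid, 0.1 s at 8 kHz.
sample_rate = 8000
signal = [0.8 * math.sin(2 * math.pi * 200 * n / sample_rate + 0.3)
          for n in range(800)]
edges = [0, 100, 150, 250, 400, 1000]  # assumed bin edges in Hz
hist = zcpa_histogram(signal, sample_rate, edges)
print(hist)
```

Because the frequency estimate comes from timing rather than from summed band energy, all of the weight for the test tone lands in the single bin that contains 200 Hz.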

A Study on Acoustical Properties of Soprano's Singing

임동철;문소연;이행세
- 한국음향학회지 / Vol. 19, No. 5 / pp. 60-64 / 2000
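This paper studies how the formants of Korean monophthongs change with F0 (fundamental frequency) when a soprano sings them. Unlike other voice parts, a soprano's formants are known to be strongly affected by F0 during singing. Studying the singing voice therefore requires formant analysis of each vowel across the full range of F0 that a soprano can produce. Based on such analysis, the characteristics of the singing voice can be turned into patterns for building singing evaluation systems or singing synthesis systems. For five professional sopranos, the formants of the five sung vowels '아, 에, 이, 오, 우' (/a, e, i, o, u/) were analyzed at pitches from A3 (220.0 Hz) to A5 (880.0 Hz). The formants of the same five vowels in ordinary conversational speech were also analyzed and compared with the sung vowels. The results show that the F2/F1 graph for '아, 에, 이' is almost a straight line at F0 of B4 (493.8 Hz) and above. B4 is where the Changing Voice begins, which indicates that the change in a singer's timbre is closely related to the change in formant pattern. In addition, at A5 the F1 and F2 values of '아, 에, 이, 오, 우' nearly coincide; that is, vowels sung in the highest register become difficult to distinguish from one another. This paper proposes that, when building a singing evaluation or synthesis system, F1 and F2 from B4 to A5 for '아, 오, 우' be normalized as a slope with respect to F0. Such normalization should reduce the effort and cost of building singing-related systems.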
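
The pitch values quoted in the abstract are consistent with twelve-tone equal temperament tuned to A4 = 440 Hz. The short sketch below reproduces them; the tuning reference and the MIDI note numbering are assumptions made here, not something the abstract states. The formula gives about 493.88 Hz for B4, which the abstract lists as 493.8 Hz.

```python
def note_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12)

# MIDI note numbers for the pitches named in the abstract.
A3, B4, A5 = 57, 71, 81

for name, note in (("A3", A3), ("B4", B4), ("A5", A5)):
    print(name, round(note_to_hz(note), 2))
```

The analyzed range from A3 to A5 therefore spans exactly two octaves, and the B4 to A5 region singled out for normalization covers the top ten semitones of that range.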

Perceptual cues for /o/ and /u/ in Seoul Korean

변희경
- 말소리와 음성과학 / Vol. 12, No. 3 / pp. 1-14 / 2020
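Many studies have pointed out that /o/ (ㅗ) and /u/ (ㅜ) in Seoul Korean are undergoing a merger in the vowel space. Meanwhile, in the speech of women, where the overlap of /o/ and /u/ is pronounced, voice quality (H1-H2) rather than formants has been found to distinguish /o/ from /u/. The purpose of this paper is to determine whether the H1-H2 difference between /o/ and /u/ seen in production is also effective in perception. If H1-H2 also functions effectively in perception, women's /o/ and /u/, which are hard to explain by formants alone, could be explained coherently by H1-H2 in both production and perception. A perception experiment was conducted with 35 university students from Seoul and Gyeonggi, using '오' and '우' produced in isolation by native speakers of Seoul Korean. The stimuli were 182 tokens, all produced by women, that overlap considerably in the vowel space. To examine the relation between production and perception, the stimuli were also analyzed acoustically. The correct response rate (the rate of correct identification) averaged 89%: 86% for /o/ (n=88) and 91% for /u/ (n=94). The acoustic analysis of the stimuli confirmed that H1-H2 is significant for /o/ and /u/ tokens that are hard to distinguish by formants. In perception, however, the influence of H1-H2 on /o/ and /u/ was very slight, and formants played the decisive role in distinguishing /o/ and /u/ for both men and women. Distinguishing /o/ from /u/ by H1-H2 in production is a feature of women's speech; in men's speech the two are distinguished by formants, not by H1-H2. If the H1-H2 difference came to be used in men's speech as well, H1-H2 might become a main perceptual cue, but at least at present neither men nor women appear to have adopted H1-H2 in perceiving /o/ and /u/.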
https://doi.org/10.13064/KSSS.2020.12.3.001
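
H1-H2, the voice quality measure used in the abstract above, is the level difference in dB between the first harmonic (at F0) and the second harmonic (at 2·F0). The sketch below illustrates the arithmetic on a synthetic two-harmonic signal. It is a minimal illustration under stated assumptions: F0 is taken as known, the frame holds a whole number of periods, and the signal, sample rate, and function names are hypothetical. Measurements on real speech typically also need F0 tracking and peak picking near each harmonic.

```python
import math

def harmonic_amplitude(x, sample_rate, freq_hz):
    """Amplitude of one sinusoidal component via a single-frequency DFT."""
    re = sum(v * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def h1_minus_h2_db(x, sample_rate, f0_hz):
    """Level of the first harmonic minus level of the second, in dB."""
    h1 = harmonic_amplitude(x, sample_rate, f0_hz)
    h2 = harmonic_amplitude(x, sample_rate, 2 * f0_hz)
    return 20.0 * math.log10(h1 / h2)

# Synthetic frame: 0.1 s at 16 kHz, F0 = 200 Hz, second harmonic at half amplitude.
sample_rate, f0 = 16000, 200.0
frame = [1.0 * math.sin(2 * math.pi * f0 * n / sample_rate)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / sample_rate)
         for n in range(1600)]

value = h1_minus_h2_db(frame, sample_rate, f0)
print(round(value, 2))  # amplitude ratio of 2 corresponds to about 6.02 dB
```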

A comparison of Korean vowel formants in conditions of chanting and reading utterances

박지혜;성철재
- 말소리와 음성과학 / Vol. 12, No. 3 / pp. 85-94 / 2020
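Because vowels combine with consonants in speech and can affect the articulation of adjacent consonants, forming an accurate place of articulation and an adequate mouth opening is important, yet many client groups in speech-language pathology have difficulty with this. Songs (chants) that appropriately reflect the characteristics of language can therefore serve as an effective tool in their therapy. This study compared vowel characteristics in chanting and reading conditions to examine whether chanting is a suitable means of strengthening vowel production. The participants were 60 typical adults in their 20s and 30s (30 men, 30 women) who were native speakers of Korean. Chanting and reading tasks containing the vowels /이/, /아/, /우/ (/i/, /a/, /u/) were each produced four times, recorded, and analyzed. Analysis of the acoustic variables showed that, compared with reading, chanting yielded higher F₁ and F₂ values and moved the centroid of the vowel triangle forward and downward. Women's values were significantly higher than men's, and there were no differences across the four repetitions, which adds to the reliability of the results. Differences between word-level and phrase-level utterances were mostly absent, and among the musical elements of the chant condition, accent was found to have an effect. These results suggest that chanting can help speakers with mumbling speech achieve an adequate mouth opening and can be an effective way to shift the vowel centroid for those who misarticulate because of a backed place of articulation.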
https://doi.org/10.13064/KSSS.2020.12.3.085
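
The vowel triangle centroid mentioned in the abstract above is simply the mean of the (F1, F2) points for /i/, /a/, and /u/. The sketch below shows how a centroid shift between two conditions could be computed. All formant values are invented for illustration and are not the study's data; a rise in F1 is read as lowering (a more open mouth) and a rise in F2 as fronting, following the usual interpretation of the vowel space.

```python
def centroid(points):
    """Mean (F1, F2) of a set of vowel points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Hypothetical (F1, F2) values in Hz for /i/, /a/, /u/ in each condition.
reading = {"i": (300.0, 2300.0), "a": (750.0, 1300.0), "u": (350.0, 900.0)}
chant = {"i": (330.0, 2450.0), "a": (840.0, 1400.0), "u": (390.0, 950.0)}

c_read = centroid(list(reading.values()))
c_chant = centroid(list(chant.values()))

d_f1 = c_chant[0] - c_read[0]  # positive: centroid lowered
d_f2 = c_chant[1] - c_read[1]  # positive: centroid fronted
print(c_read, c_chant, d_f1, d_f2)
```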

A Study of the Influence on Phonation When Maxillary Anterior Teeth Are Missing

노창섭;최대균;우이형;최부병
- 대한치과보철학회지 / Vol. 30, No. 3 / pp. 338-360 / 1992
This study was performed to investigate the phonetic alterations that occur when upper anterior teeth are missing. To compare phonation before and after insertion of a temporary prosthesis, six subjects who had lost their upper anterior teeth were selected (2 male, 4 female). The tested sounds (/ga(가), na(나), da(다), ra(라), sa(사), ja(자), cha(차), ta(타), pa(파), ha(하), gi(기), ni(니), di(디), ri(리), si(시), ji(지), chi(치), ti(티), pi(피), hi(히), seu(스), se(세), so(소), su(수)/) were programmed into an IBM AT with and without the temporary prosthesis. The recordings were analyzed for formants, consonant durations, and energy level changes with an LSI speech workstation program. During pronunciation of the tested sounds (with and without the temporary prosthesis), mandibular movements were recorded with a Mandibular Kinesiogram and analyzed. The findings led to the following conclusions: 1. Objective differences could not be found. However, in every informant, subjective improvement could be noticed. 2. There were no persistent correlations in the formant changes, and phonetic changes varied in every informant. 3. Consonant durations changed in various ways in every informant. By and large, those of /si(시), ji(지), chi(치), pi(피), hi(히)/ were longer than those of the other tested sounds. After insertion of the prosthesis, durations were shorter. Consonants with /i(ㅣ)/ were longer than those with /a(ㅏ)/, with or without the prosthesis. 4. With and without the temporary prosthesis, mandibular movements varied in the frontal view. Mandibular movements showed lateral deviations, and mandibular positions with /si(시), ji(지), ti(티), seu(스), hi(히)/ were nearer to the mandibular rest position. 5. The kind of temporary prosthesis and the condition of the missing teeth influenced each informant differently, so there was no correlation between informants. 6. Energy levels increased in all tested sounds with a fixed temporary prosthesis. There were no differences between before and after insertion of a removable temporary prosthesis. However, sibilant sounds and consonants with /i(ㅣ)/ showed a slightly increased energy level.

414 search results (processing time: 0.019 s)
