Integrated Search | Korea Science

A Study on Voice Color Control Rules for Speech Synthesis System

김진영;엄기완
- 음성과학 / Vol. 2 / pp. 25-44 / 1997
When listening to the various speech synthesis systems developed and used in our country, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. To find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used as the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between the parameters and voice colors.
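As a rough illustration of the multiple regression step mentioned in this abstract, the sketch below fits a least-squares model with two predictors using the normal equations. The predictor values and ratings are made-up numbers, and the idea of regressing a numeric "voice color rating" on acoustic parameters is an assumption for illustration; the abstract does not say how voice colors were quantified or which software was used.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_multiple_regression(rows, ratings):
    """Least-squares fit of rating = b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ratings)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: two acoustic parameters per utterance (arbitrary units)
# and a hypothetical numeric rating of one voice color.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ratings = [2.0, 4.5, 5.0, 7.5, 8.0]

coefficients = fit_multiple_regression(rows, ratings)  # [intercept, b1, b2]
```

The fitted coefficients indicate how strongly each parameter is associated with the rating under a linear model; with real measurements a statistics package would also report significance, which this sketch omits.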

Voice conversion using low dimensional vector mapping

이기승;도원;윤대희
- 전자공학회논문지S / Vol. 35S, No. 4 / pp. 118-127 / 1998
In this paper, we propose a voice personality transformation method that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method obtains high-quality transformed speech with low computational complexity. Conversion between the vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, the mean LPC cepstrum coefficients and the mean-removed LPC cepstrum, modeled by a low dimensional vector, are used as transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male speakers and one female speaker. These rules are then applied to 9 sentences uttered by the same speakers, and objective evaluation and subjective listening tests are performed on the transformed speech.
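The phrase "linear mapping based on soft clustering" can be pictured with a small sketch: each cluster has its own affine map, and the converted vector is the membership-weighted sum of the per-cluster outputs. The softmax-over-distance membership function, the `sharpness` parameter, the centroids, and the matrices below are all assumptions chosen for illustration; the abstract does not specify how the paper computes memberships or estimates the maps.

```python
import math


def soft_weights(x, centroids, sharpness=1.0):
    """Soft membership of x in each cluster: softmax over negative squared distances."""
    scores = [-sharpness * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(matrix, bias, x):
    """Apply y = matrix @ x + bias for plain Python lists."""
    return [sum(m * xi for m, xi in zip(row, x)) + b for row, b in zip(matrix, bias)]


def convert(x, centroids, mappings, sharpness=1.0):
    """Blend the per-cluster affine maps using the soft memberships of x."""
    weights = soft_weights(x, centroids, sharpness)
    out = [0.0] * len(mappings[0][1])
    for w, (matrix, bias) in zip(weights, mappings):
        y = affine(matrix, bias, x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Hypothetical two-cluster setup in a 2-D feature space.
centroids = [[0.0, 0.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mappings = [(identity, [1.0, 1.0]), (identity, [3.0, 3.0])]

midpoint_result = convert([1.0, 1.0], centroids, mappings)
```

A vector halfway between the two centroids receives equal weight from both maps, so the output varies smoothly as the input moves across cluster boundaries, unlike a hard codebook lookup.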

Voice Personality Transformation Using an Optimum Classification and Transformation

이기승
- 한국음향학회지 / Vol. 23, No. 5 / pp. 400-409 / 2004
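In this paper, we propose a voice conversion algorithm that transforms speech uttered by an arbitrary speaker so that it sounds as if it were uttered by another speaker. To transform the individual characteristics of a voice, the vocal tract transfer function is used as the conversion parameter, and a new method is proposed to obtain converted speech that is both subjectively and objectively closer to the target speaker's voice than existing techniques. The vocal tract transfer function is converted by partitioning the entire feature vector space and applying a linear transformation to each partition. The LPC cepstrum is used as the feature parameter, and a new classification-transformation algorithm is proposed that simultaneously optimizes the partitioning of the vector space and the estimation of the linear transformations. To evaluate the proposed method, conversion rules were generated from about 150 sentences collected from three male speakers and one female speaker. The rules were then applied to a different set of 150 sentences uttered by the same speakers, and objective performance evaluation and subjective listening tests were carried out.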

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree

이기승
- 한국음향학회지 / Vol. 23, No. 3 / pp. 253-261 / 2004
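In this paper, a new voice personality transformation method is proposed that transforms the speaker-dependent feature parameters of a speech signal. The proposed method takes as conversion targets the cepstrum vectors, which reflect the characteristics of the vocal tract transfer function, and the pitch values, which reflect the characteristics of the excitation signal, and uses a multiple response classification and regression tree to convert them. A multiple response classification and regression tree is a multidimensional extension of the conventional classification and regression tree in which the response takes the form of a vector. The performance of the proposed method is evaluated against the conventional codebook mapping method, and the complexity of the tree and the conversion performance are analyzed quantitatively while varying the observations fed to the tree. In voice personality transformation experiments with four speakers, the proposed method showed objectively better performance than conventional codebook mapping, and listening tests also showed that the converted speech was similar to the voice of the target speaker.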
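To make the idea of a regression tree with vector-valued responses concrete, the sketch below searches for a single split on one scalar predictor that minimizes the summed squared error of the response vectors on each side. Using total squared deviation from the mean vector as the split criterion is an assumption (a common choice for multi-response trees); the abstract does not state the paper's criterion, and the data are made up.

```python
def sse(vectors):
    """Sum of squared deviations of the vectors from their mean vector."""
    if not vectors:
        return 0.0
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return sum((v[d] - means[d]) ** 2 for v in vectors for d in range(dim))


def best_split(xs, ys):
    """Return (threshold, cost) for the split of xs that minimizes the total SSE of ys."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # candidate midway between neighboring values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Hypothetical observations: a scalar predictor and a 2-D response vector.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]

threshold, cost = best_split(xs, ys)
```

A full tree would apply this search recursively over all predictors and store the mean response vector at each leaf; only the single-split step is shown here.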

Voice personality transformation using an orthogonal vector space conversion

이기승;박군종;윤대희
- 전자공학회논문지B / Vol. 33B, No. 1 / pp. 96-107 / 1996
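In this paper, we propose a new voice personality transformation algorithm using an orthogonal vector space conversion. Voice personality transformation is a technique that converts several feature parameters of an arbitrary (source) speaker into those of another (target) speaker. Here it is implemented by converting the LPC cepstrum coefficients, the spectrum of the excitation signal, and the pitch contour. An orthogonal vector space conversion technique is proposed for the LPC cepstrum coefficients. It converts the LPC cepstrum by separating the principal components with the KL (Karhunen-Loeve) transform and then applying a linear coordinate transformation with minimum squared error. In addition, a pitch contour conversion technique is proposed to convert prosodic characteristics between speakers. For pitch contour conversion, reference pitch patterns are first built for the two speakers and the correspondence between the reference patterns is estimated; this correspondence is then used to convert the source speaker's pitch pattern into the target pitch pattern. In computer simulations, the proposed algorithm performed well in both objective and subjective evaluations.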

Speech sound and personality impression

이은영;유혜옥
- 말소리와 음성과학 / Vol. 9, No. 4 / pp. 59-67 / 2017
Regardless of their intention, listeners tend to assess speakers' personalities based on the sounds of the speech they hear. The assessment criteria, however, have not been fully investigated to show whether there is any relationship between the acoustic cues of produced speech sounds and the perceived personality impression. If properly investigated, the potential relationship between the two will provide crucial insights into human communication and, further, into human-computer interaction. Since human communication is characterized by simultaneity and complexity, this investigation amounts to identifying the minimum essential factors linking the sounds of speech and the perceived personality impression. The purpose of this study, therefore, is to identify significant associations between speech sounds and the personality impression of the speaker as perceived by listeners. Twenty-eight subjects participated in the experiment, and eight acoustic parameters were extracted from the recorded speech sounds using Praat. The subjects also completed the NEO Five-Factor Inventory test so that their personality traits could be measured. The results of the experiment show that four major factors (duration average, pitch difference value, pitch average, and intensity average) play crucial roles in defining the significant relationship.
https://doi.org/10.13064/KSSS.2017.9.4.059

A Phenomenological Study of Music Therapist's Experiences of Using Voice

신진희;소혜진
- 한국엔터테인먼트산업학회논문지 / Vol. 13, No. 2 / pp. 155-167 / 2019
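The purpose of this study is to explore in depth music therapists' experiences of using their voice. The researchers conducted in-depth interviews with seven music therapists who could describe their related experiences in detail, and analyzed the interviews with Giorgi's phenomenological method. The analysis yielded six constituents: 'facilitation of various emotions through the use of voice in clinical practice', 'use of voice according to the therapist's personal disposition', 'use of voice for therapeutic purposes', 'positive musical experiences with clients through the use of voice', 'difficulties in using the voice as a music therapy tool', and 'attempts to change an unsatisfactory voice'. The music therapists reported that they could build positive relationships with clients through singing or improvised song within music therapy. However, they also explained that they experienced difficulties in using their voice for personal reasons, and stated that they sought to expand their use of voice through various kinds of self-work and, further, to realize their own change and growth.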
https://doi.org/10.21184/jkeia.2019.2.13.2.155

Voice Phishing Occurrence and Counterplan

조호대
- 한국콘텐츠학회논문지 / Vol. 12, No. 7 / pp. 176-182 / 2012
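Voice phishing is a fraud technique in which personal information is illegally obtained by telephone and then used to withdraw deposits; as cases of damage have multiplied, it has emerged as a new social problem. It targets ordinary law-abiding citizens indiscriminately and is a crime committed mainly by foreigners, such as Chinese and Taiwanese nationals. It can be regarded as a new type of crime in that it is initiated outside the borders of our country. This study therefore analyzes the current state of occurrence and cases of voice phishing and seeks effective countermeasures. Despite continuous publicity and crackdowns, voice phishing crimes have not been eradicated; instead, the methods are becoming more diverse and specialized. To eradicate voice phishing, countermeasures to the problems in the financial, telecommunications, and investigative sectors need to be prepared. In addition, police enforcement should be strengthened through the prompt initiation of investigations and the development of investigative techniques, and, because the crime is international in nature, cooperation with Interpol and other relevant agencies and international cooperation should be strengthened.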
https://doi.org/10.5392/JKCA.2012.12.07.176

Voice Personality Transformation Using a Probabilistic Method

이기승
- 한국음향학회지 / Vol. 24, No. 3 / pp. 150-159 / 2005
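In this paper, we study a voice personality transformation algorithm that converts arbitrary speech so that it sounds as if it were uttered by a specific speaker. The proposed method represents a speaker's voice by the LPC cepstrum, pitch, and speaking rate, and performs the conversion by generating conversion rules for each. The LPC cepstrum is modeled probabilistically with a Gaussian mixture model, and the correspondence between the two speakers is expressed as a conditional probability. The maximum likelihood method is used to obtain the parameters needed for the probabilistic modeling, and the converted LPC cepstrum is obtained on the basis of the minimum squared error criterion. Pitch and speaking rate are used as the parameters for prosody conversion, which is performed using the ratio of the mean values between the two voices. Compared with the conventional vector quantization based method, the proposed method showed better performance in the reduction rate of the average cepstral distance and the increase rate of the likelihood, which were used as objective measures. In subjective tests it achieved a recognition rate similar to that of the conventional method, and in particular the high sound quality resulting from smoothly varying spectral trajectories was confirmed.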
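The prosody step described in this abstract, conversion by the ratio of mean values between the two voices, can be sketched in a few lines. The pitch values below are made up, and treating `0.0` as a marker for unvoiced frames is an assumption of this sketch rather than something stated in the abstract.

```python
from statistics import mean


def mean_ratio(source_values, target_values):
    """Ratio of the target speaker's mean value to the source speaker's mean value."""
    return mean(target_values) / mean(source_values)


def convert_pitch(source_contour, ratio):
    """Scale voiced frames by the ratio; 0.0 marks an unvoiced frame and is kept as is."""
    return [f0 * ratio if f0 > 0 else 0.0 for f0 in source_contour]


# Hypothetical training pitch values in Hz (voiced frames only).
source_train = [100.0, 110.0, 120.0]
target_train = [200.0, 220.0, 240.0]
ratio = mean_ratio(source_train, target_train)

# Convert a new source contour that contains one unvoiced frame.
converted = convert_pitch([105.0, 0.0, 115.0], ratio)
```

The same ratio idea would apply to speaking rate, for example by scaling segment durations by the ratio of the two speakers' mean rates.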

Emotion Recognition using Short-Term Multi-Physiological Signals

Kang, Tae-Koo
- KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 3 / pp. 1076-1094 / 2022
Technology for emotion recognition is an essential part of human personality analysis. To define human personality characteristics, existing methods have relied on surveys. However, there are many cases where communication cannot take place without considering emotions. Hence, emotion recognition technology is an essential element of communication and has also been adopted in many other fields. A person's emotions are revealed in various ways, typically including facial, speech, and biometric responses. Emotions can therefore be recognized from various sources, e.g., images, voice signals, and physiological signals. Physiological signals are measured with biological sensors and analyzed to identify emotions. This study employed two sensor types. First, the existing binary arousal-valence method was subdivided into four levels to classify emotions in more detail. Then, starting from the current techniques that classify each dimension as High/Low, the model was further subdivided into multiple levels. Finally, signal characteristics were extracted using a 1-D Convolution Neural Network (CNN) and used to classify sixteen feelings. Although CNNs are typically used to learn 2-D images, 1-D sensor data was used as the input in this paper. The proposed emotion recognition system was evaluated with measurements from actual sensors.
https://doi.org/10.3837/tiis.2022.03.018
