Search | Korea Science

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

Kim, Jin-Young; Eom, Ki-Wan
- Speech Sciences / v.2 / pp.25-44 / 1997
When listening to the various speech synthesis systems developed and in use in Korea, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to the single recorded speech DB it was built from, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that can produce natural and varied voice types in synthetic speech. To derive such rules from natural speech, the glottal source parameters and vocal tract frequency characteristics of several voice colors were studied. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used to describe the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to determine the general relationship between the parameters and the voice colors.
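The multiple regression step can be illustrated with an ordinary least-squares sketch. It assumes each row of `params` holds one utterance's acoustic parameters (for example LF-model values and formant frequencies, bandwidths, and amplitudes) and `ratings` holds a listener score for one voice color such as "deep"; the function names and the synthetic data are hypothetical, not taken from the paper.

```python
import numpy as np

def fit_voice_color_regression(params: np.ndarray, ratings: np.ndarray) -> np.ndarray:
    """Ordinary least-squares multiple regression of one voice-color rating.

    params:  (n_samples, n_features) acoustic parameters (hypothetical layout)
    ratings: (n_samples,) perceptual scores for a single voice color
    Returns the coefficient vector [intercept, b_1, ..., b_p].
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(params)), params])
    coef, *_ = np.linalg.lstsq(design, ratings, rcond=None)
    return coef

def r_squared(coef: np.ndarray, params: np.ndarray, ratings: np.ndarray) -> float:
    """Coefficient of determination of the fitted model."""
    predicted = coef[0] + params @ coef[1:]
    ss_res = np.sum((ratings - predicted) ** 2)
    ss_tot = np.sum((ratings - ratings.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic, noise-free example: three parameters, the third irrelevant.
rng = np.random.default_rng(0)
params = rng.normal(size=(50, 3))
ratings = 2.0 + params @ np.array([0.5, -1.2, 0.0])
coef = fit_voice_color_regression(params, ratings)
print(coef, r_squared(coef, params, ratings))
```

One such model would be fitted per voice color; with real listener data, the size and sign of each coefficient suggest which parameters matter, subject to the usual caveats about correlated predictors.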

Voice conversion using low dimensional vector mapping (낮은 차원의 벡터 변환을 통한 음성 변환)

Lee, Kee-Seung; Doh, Won; Youn, Dae-Hee
- Journal of the Korean Institute of Telematics and Electronics S / v.35S no.4 / pp.118-127 / 1998
In this paper, we propose a voice personality transformation method that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method obtains high-quality transformed speech with low computational complexity. Conversion between vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, the mean LPC cepstrum coefficients and the mean-removed LPC cepstrum, modeled by a low-dimensional vector, are used as the transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male speakers and one female speaker. These rules are then applied to 9 sentences uttered by the same speakers, and objective evaluations and subjective listening tests are performed on the transformed speech.
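The LPC cepstrum used as the transformation parameter can be computed from the LPC coefficients with the standard recursion for an all-pole model. The sketch below assumes the predictor convention A(z) = 1 - Σ α_k z^-k (toolkits that use A(z) = 1 + Σ a_k z^-k need α = -a), omits the gain term c_0, and then splits a set of frames into a mean vector and mean-removed residuals, as the abstract describes. The low-dimensional modeling of the residuals is not reproduced, and the helper names are hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(alpha, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1 / (1 - sum_k alpha_k z^-k).

    Uses c_n = alpha_n + sum_{k=max(1,n-p)}^{n-1} (k/n) c_k alpha_{n-k},
    with alpha_n = 0 for n > p. The gain term c_0 is omitted.
    """
    p = len(alpha)
    c = np.zeros(n_ceps + 1)  # c[0] left at zero (gain term not modeled)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]

def split_mean(ceps_frames):
    """Split (n_frames, n_ceps) cepstra into a mean vector and mean-removed frames."""
    mean = ceps_frames.mean(axis=0)
    return mean, ceps_frames - mean

# Two real poles at 0.5 and 0.25 give alpha_1 = 0.75, alpha_2 = -0.125;
# the exact cepstrum is then c_n = (0.5**n + 0.25**n) / n.
ceps = lpc_to_cepstrum([0.75, -0.125], 4)
frames = np.stack([ceps, 2 * ceps, 3 * ceps])
mean, residual = split_mean(frames)
```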

Voice Personality Transformation Using an Optimum Classification and Transformation (최적 분류 변환을 이용한 음성 개성 변환)

Lee, Ki-Seung (이기승)
- The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.400-409 / 2004
In this paper, a voice personality transformation method is proposed that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method makes the transformed speech closer to the target speaker's voice from both subjective and objective points of view. Conversion between vocal tract transfer functions is implemented by classifying the entire vector space and then applying a linear transformation to each cluster. The LPC cepstrum is used as the feature parameter. A joint classification and transformation method is proposed in which the optimum clusters and transformation matrices are estimated simultaneously under a minimum mean square error criterion. To evaluate the performance of the proposed method, transformation rules are generated from 150 sentences uttered by three male speakers and one female speaker. These rules are then applied to another 150 sentences uttered by the same speakers, and objective evaluations and subjective listening tests are performed.
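The joint classification and transformation idea can be sketched as an alternating minimization in the spirit of k-means, where each cluster owns an affine map: every source/target pair is assigned to the map that predicts it with the smallest squared error, and each map is then refitted by least squares on its own pairs. This assumes time-aligned source/target cepstrum pairs and random initialization; it illustrates the MMSE criterion rather than reproducing the paper's exact estimation procedure, and the function name is hypothetical.

```python
import numpy as np

def joint_classify_transform(X, Y, n_clusters, n_iter=20, seed=0):
    """Alternately refit per-cluster affine maps and reassign pairs (MMSE).

    X: (n, d_in) source vectors; Y: (n, d_out) time-aligned target vectors.
    Returns (W, labels, history): W[k] maps [x, 1] to y, history holds the
    mean squared error after each reassignment.
    """
    n, d_in = X.shape
    d_out = Y.shape[1]
    Xa = np.hstack([X, np.ones((n, 1))])          # append a bias input
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)     # random initial classification
    W = np.zeros((n_clusters, d_in + 1, d_out))
    history = []
    for _ in range(n_iter):
        # Transformation step: least-squares map for each non-empty cluster.
        for k in range(n_clusters):
            idx = labels == k
            if idx.any():                         # empty clusters keep their old map
                W[k], *_ = np.linalg.lstsq(Xa[idx], Y[idx], rcond=None)
        # Classification step: assign each pair to the map with the lowest error.
        errors = np.stack([((Xa @ W[k] - Y) ** 2).sum(axis=1)
                           for k in range(n_clusters)], axis=1)
        labels = errors.argmin(axis=1)
        history.append(float(errors.min(axis=1).mean()))
    return W, labels, history

# Synthetic data: two different affine maps depending on the sign of x[0].
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Y = np.where(X[:, :1] > 0,
             X @ np.array([[1.0, 0.5], [0.0, 1.0]]) + 1.0,
             X @ np.array([[-1.0, 0.0], [0.3, 2.0]]))
W, labels, history = joint_classify_transform(X, Y, n_clusters=2)
print(history[0], history[-1])
```

Because each step can only keep or lower the total squared error, the error history is non-increasing, but the result can still depend on the random initialization.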

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree (다중 응답 분류회귀트리를 이용한 음성 개성 변환)

Lee, Ki-Seung (이기승)
- The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.253-261 / 2004
In this paper, a new voice personality transformation method is proposed that modifies speaker-dependent feature variables in the speech signal. The proposed method takes cepstrum vectors and pitch as the transformation parameters, representing the vocal tract transfer function and the excitation signal, respectively. To transform these parameters, a multiple response classification and regression tree (MR-CART) is employed. MR-CART is a vector extension of the conventional CART in which the response takes vector form. We evaluated the performance of the proposed method by comparing it with a previously proposed codebook mapping method. We also quantitatively analyzed the transformation performance and the complexity according to various observations. In experiments with 4 speakers, the proposed method objectively outperformed the conventional codebook mapping method, and we also observed that the transformed speech sounds closer to the target speech.
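The core operation of a multiple-response regression tree is choosing a split whose impurity is the squared error summed over all components of the vector response. A minimal sketch of that split search for a single node follows; growing the full tree would apply it recursively and store the mean target vector in each leaf. Treating source cepstra as predictors and target cepstra as responses is an assumption for illustration, and the helper names are hypothetical.

```python
import numpy as np

def node_sse(responses):
    """Squared error of a node, summed over every component of the response vectors."""
    if len(responses) == 0:
        return 0.0
    return float(((responses - responses.mean(axis=0)) ** 2).sum())

def best_split(X, Y):
    """Exhaustive search for the split minimizing the children's total squared error.

    X: (n, p) predictors (e.g. source cepstra); Y: (n, q) vector responses.
    Returns (feature_index, threshold, cost), or None if no split exists.
    """
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for lo, hi in zip(values[:-1], values[1:]):
            threshold = (lo + hi) / 2           # midpoint between neighboring values
            left = X[:, j] <= threshold
            cost = node_sse(Y[left]) + node_sse(Y[~left])
            if best is None or cost < best[2]:
                best = (j, float(threshold), cost)
    return best

X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
print(best_split(X, Y))  # (0, 1.5, 0.0)
```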

Voice personality transformation using an orthogonal vector space conversion (직교 벡터 공간 변환을 이용한 음성 개성 변환)

Lee, Ki-Seung; Park, Kun-Jong; Youn, Dae-Hee
- Journal of the Korean Institute of Telematics and Electronics B / v.33B no.1 / pp.96-107 / 1996
A voice personality transformation algorithm using orthogonal vector space conversion is proposed in this paper. Voice personality transformation is the process of changing one person's acoustic features (source) to those of another person (target). Here, personality transformation is achieved by changing the LPC cepstrum coefficients, the excitation spectrum, and the pitch contour. An orthogonal vector space conversion technique is proposed to transform the LPC cepstrum coefficients. The LPC cepstrum transformation is implemented by principal component decomposition using the Karhunen-Loeve transformation and a minimum mean-square error coordinate transformation (MSECT). Additionally, we propose a pitch contour modification method to transform the prosodic characteristics of any speaker. To do this, reference pitch patterns for the source and target speakers are first built up, and speaker's one. The experimental results show the effectiveness of the proposed algorithm in both subjective and objective evaluations.
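The Karhunen-Loeve step can be sketched as an eigendecomposition of the covariance of mean-removed LPC cepstra, which yields orthogonal coordinates whose covariance is diagonal. The sketch covers only the KLT; the subsequent MSECT stage is not reproduced, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def klt(frames):
    """Karhunen-Loeve transform of (n_frames, dim) vectors.

    Returns (coords, basis, mean, variances): coords = (frames - mean) @ basis,
    where the columns of basis are orthonormal eigenvectors of the covariance
    matrix, ordered by decreasing eigenvalue (variance).
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest variance first
    basis = eigvecs[:, order]
    return centered @ basis, basis, mean, eigvals[order]

# Correlated synthetic vectors standing in for LPC cepstra.
rng = np.random.default_rng(2)
frames = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
coords, basis, mean, variances = klt(frames)
```

Since the basis is orthonormal, the transform is exactly invertible (`coords @ basis.T + mean`), so a mapping defined on the decorrelated coordinates can be carried back to the cepstral domain.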

Speech sound and personality impression (말소리와 성격 이미지)

Lee, Eunyung; Yuh, Heaok
- Phonetics and Speech Sciences / v.9 no.4 / pp.59-67 / 2017
Regardless of their intention, listeners tend to assess speakers' personalities based on the sounds of the speech they hear. However, the assessment criteria have not been fully investigated, so it remains unclear whether there is any relationship between the acoustic cues of produced speech and the perceived personality impression. If properly investigated, the potential relationship between the two would provide crucial insights into human communication and, further, into human-computer interaction. Since human communication is characterized by simultaneity and complexity, this investigation seeks to identify the minimum essential factors linking speech sounds and perceived personality impression. The purpose of this study, therefore, is to identify significant associations between speech sounds and the personality impression that listeners perceive in a speaker. Twenty-eight subjects participated in the experiment, and eight acoustic parameters were extracted from their recorded speech using Praat. The subjects also completed the NEO Five-Factor Inventory so that their personality traits could be measured. The results show that four major factors (duration average, pitch difference value, pitch average, and intensity average) play crucial roles in defining the significant relationship.
https://doi.org/10.13064/KSSS.2017.9.4.059
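The abstract does not name the statistic used to relate acoustic parameters to personality scores; a Pearson correlation coefficient is one common choice and is assumed here purely for illustration. The sketch pairs one acoustic parameter per subject (for example, mean pitch as it might be extracted with Praat) with one personality-trait score; the function name and the values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-subject values: mean pitch (Hz) and a trait score.
mean_pitch = [110.0, 125.0, 180.0, 210.0, 230.0]
trait_score = [22, 25, 31, 30, 36]
print(round(pearson_r(mean_pitch, trait_score), 3))
```

A real analysis would also test each coefficient for significance before calling an association meaningful.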

A Phenomenological Study of Music Therapist's Experiences of Using Voice (음악치료사의 목소리 사용 경험에 대한 현상학적 연구)

Shin, JinHee; So, HyeJin
- Journal of Korea Entertainment Industry Association / v.13 no.2 / pp.155-167 / 2019
The aim of this study was to examine music therapists' experiences of using their voice clinically. The researcher conducted in-depth interviews with seven music therapists who were able to explain their experiences of using voice. Each interview was analyzed using Amedeo Giorgi's phenomenological method. The data analysis yielded 9 sub-categories and 6 components: "promotion of various feelings due to clinical use of voice", "voice use depending on the therapist's personality", "voice use for therapy", "positive musical experiences with clients in using voice", "difficulty in using voice as a tool for music therapy", and "attempt to change unsatisfactory voice". The results showed that the music therapists had both positive and difficult experiences with their clients in using their voice. Instances in which they perceived their voice as unsatisfactory prompted them to develop personally and professionally. This study is intended to provide a general understanding of voice use by music therapists and to offer a solid basis for music therapists to study voice in the future.
https://doi.org/10.21184/jkeia.2019.2.13.2.155

Voice Phishing Occurrence and Counterplan (보이스피싱 발생 및 대응방안)

Cho, Ho-Dae
- The Journal of the Korea Contents Association / v.12 no.7 / pp.176-182 / 2012
Voice phishing is a confidence trick in which personal information is obtained illegally through telecommunications and then used to withdraw the victim's deposits. It has emerged as a new social problem as the number of cases has increased rapidly. It targets ordinary citizens indiscriminately and is committed mostly by foreigners, chiefly Chinese and Taiwanese nationals. Voice phishing can be regarded as a new type of crime in that the criminal acts are carried out in foreign countries. This study therefore analyzes the current state of its occurrence and representative cases, and seeks effective countermeasures. Voice phishing offenses continue to grow, as the crime has not been eradicated despite ongoing public awareness campaigns and enforcement, and its techniques are becoming increasingly diverse and specialized. To eradicate voice phishing, countermeasures will need to be prepared in the areas of banking, telecommunications, and criminal investigation. Police enforcement should also be strengthened through rapid investigation and the development of investigative techniques, and because the crime is international in character, cooperation among the relevant government ministries and international mutual assistance through bodies such as Interpol should be reinforced.
https://doi.org/10.5392/JKCA.2012.12.07.176

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

Lee, Ki-Seung
- The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.150-159 / 2005
This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's voice. In the proposed method, a person's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and conditional probability is used to model the relationship between the two speakers' LPC cepstra. A Maximum Likelihood (ML) estimation method is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method on objective measures, including the average cepstrum distance reduction ratio and the likelihood increase ratio. In subjective tests, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained, owing to the smoothly evolving spectral contours over time.
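The MMSE mapping can be sketched with a common GMM-based conversion function: the posterior probability of each mixture component given the source vector weights a per-component linear regression, y = Σ_k p(k|x) [μ_y,k + Σ_yx,k Σ_xx,k^-1 (x - μ_x,k)]. The sketch assumes the GMM parameters (weights, source means and covariances, target means, and target-source cross-covariances) have already been estimated, for example by ML/EM; it may differ from the paper's exact formulation, and the function names are hypothetical. Prosody is handled by the ratio of average values, as the abstract describes.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal distribution at x."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))

def gmm_mmse_convert(x, weights, means_x, covs_xx, means_y, covs_yx):
    """Convert one source vector x to the target space (MMSE estimate)."""
    log_post = np.array([np.log(w) + log_gaussian(x, m, s)
                         for w, m, s in zip(weights, means_x, covs_xx)])
    post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    post /= post.sum()                         # posterior p(k | x)
    y = np.zeros(len(means_y[0]))
    for p, mx, sxx, my, syx in zip(post, means_x, covs_xx, means_y, covs_yx):
        y += p * (my + syx @ np.linalg.solve(sxx, x - mx))
    return y

def scale_by_mean_ratio(values, source_mean, target_mean):
    """Prosody mapping by the ratio of averages (e.g. pitch period, speaking rate)."""
    return [v * target_mean / source_mean for v in values]

# One-component, one-dimensional check: y = 1 + (1 / 2) * (4 - 0) = 3.
y = gmm_mmse_convert(np.array([4.0]), [1.0],
                     [np.array([0.0])], [np.array([[2.0]])],
                     [np.array([1.0])], [np.array([[1.0]])])
```

Because the posteriors change smoothly with x, consecutive frames are mapped smoothly as well, which is consistent with the abstract's remark about smoothly evolving spectral contours.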

Emotion Recognition using Short-Term Multi-Physiological Signals

Kang, Tae-Koo
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.1076-1094 / 2022
Technology for emotion recognition is an essential part of human personality analysis. Existing methods have relied on surveys to define human personality characteristics. However, in many cases communication cannot take place without considering emotions. Emotion recognition technology is therefore an essential element of communication, and it has also been adopted in many other fields. A person's emotions are revealed in various ways, typically through facial, speech, and biometric responses, so emotions can be recognized by various methods, e.g., from images, voice signals, and physiological signals. Physiological signals are measured with biological sensors and analyzed to identify emotions. This study employed two sensor types. First, the existing binary arousal-valence method was subdivided into four levels to classify emotions in more detail. Then, starting from current techniques that classify emotions as High/Low, the model was further subdivided into multiple levels. Next, signal features were extracted using a 1-D Convolutional Neural Network (CNN) and classified into sixteen emotions. Although CNNs are typically used to learn from 2-D images, 1-D sensor data were used as the input in this paper. Finally, the proposed emotion recognition system was evaluated using measurements from actual sensors.
https://doi.org/10.3837/tiis.2022.03.018
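Using 1-D sensor data as CNN input can be illustrated with a single "valid" convolution layer applied to a multi-channel physiological signal window, followed by ReLU and global max pooling. The abstract reports sixteen classes after subdividing the arousal-valence scheme; the `emotion_class` helper assumes, purely as an illustration, that this means four arousal levels times four valence levels. All names and shapes are hypothetical, and training is not shown.

```python
import numpy as np

def conv1d(signal, kernels, bias):
    """'Valid' 1-D convolution layer (cross-correlation, as in CNN libraries).

    signal:  (channels_in, length) sensor window
    kernels: (channels_out, channels_in, width)
    bias:    (channels_out,)
    Returns an array of shape (channels_out, length - width + 1).
    """
    c_out, _, width = kernels.shape
    out_len = signal.shape[1] - width + 1
    out = np.empty((c_out, out_len))
    for t in range(out_len):
        window = signal[:, t:t + width]                  # (channels_in, width)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)

def global_max_pool(features):
    """Collapse the time axis, keeping the strongest response per channel."""
    return features.max(axis=1)

def emotion_class(arousal_level, valence_level, levels=4):
    """Map (arousal, valence) levels in 0..levels-1 to one index (assumed 4 x 4 = 16)."""
    return arousal_level * levels + valence_level

# Two sensor channels, eight samples, three filters of width 3 (random weights).
rng = np.random.default_rng(3)
x = rng.normal(size=(2, 8))
features = global_max_pool(relu(conv1d(x, rng.normal(size=(3, 2, 3)), np.zeros(3))))
```

In a full model, several such layers would be stacked and the pooled features passed to a classifier over the sixteen classes.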

