• Title/Summary/Keyword: voice color

Search Results: 60

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

  • Kim, Jin-Young;Eom, Ki-Wan
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.25-44
    • /
    • 1997
  • Listening to the various speech synthesis systems developed and used in Korea, we find that although their quality has improved, they still lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes the personality of a voice, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that can produce natural and varied voice types in its synthetic speech. To find such rules in natural speech, the glottal source parameters and the frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high-tone, shrill, and weak. The LF model was used as the voice source model, and the formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between them and the voice colors.

  • PDF
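
The LF model named in this abstract parameterizes the glottal flow derivative with a few timing and amplitude values. As a rough illustration only: the parameter names `tp`, `te`, `ta` follow common LF notation, but the growth constant `alpha` is fixed by hand here rather than solved from the zero-net-flow (area balance) constraint as a full implementation would, so this is a sketch of the shape, not the paper's model fit.

```python
import numpy as np

def lf_pulse(fs=16000, f0=100.0, tp=0.45, te=0.6, ta=0.03, ee=1.0, alpha=4.0):
    """One period of an LF-style glottal flow derivative (simplified sketch).

    tp/te/ta are timing parameters normalized by the pitch period:
    tp = instant of maximum glottal flow, te = instant of maximum
    excitation (amplitude -ee), ta = return-phase time constant.
    alpha is the open-phase growth constant in normalized time; a full
    LF implementation would solve it from the area-balance constraint.
    """
    n = int(round(fs / f0))
    t = np.arange(n) / n                 # normalized time in [0, 1)
    wg = np.pi / tp                      # flow peaks (E crosses zero) at tp
    # force the open-phase branch to reach exactly -ee at t = te
    e0 = -ee / (np.exp(alpha * te) * np.sin(wg * te))
    eps = 1.0 / ta                       # approximate return-phase constant
    return np.where(
        t <= te,
        e0 * np.exp(alpha * t) * np.sin(wg * t),
        -(ee / (eps * ta)) * (np.exp(-eps * (t - te)) - np.exp(-eps * (1 - te))),
    )

pulse = lf_pulse()
```

Varying the timing parameters (and hence the spectral tilt of the source) is the kind of glottal-source manipulation the study relates to perceived voice colors.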

Analysis of Voice Color Similarity for the development of HMM Based Emotional Text to Speech Synthesis (HMM 기반 감정 음성 합성기 개발을 위한 감정 음성 데이터의 음색 유사도 분석)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.9
    • /
    • pp.5763-5768
    • /
    • 2014
  • Maintaining a consistent voice color is important when a single synthesizer must combine a normal voice with various emotional voices. When a synthesizer is built from recordings in which the emotions are too strongly expressed, the voice color cannot be maintained and the synthetic speech can sound as if it came from different speakers. In this paper, speech data were recorded and the change in voice color was analyzed in order to develop an emotional HMM-based speech synthesizer. To realize a speech synthesizer, a voice is recorded and a database is built; this recording process is particularly important for an emotional synthesizer, and monitoring is needed because it is quite difficult to define an emotion and maintain it at a particular level. The realized synthesizer uses a normal voice and three emotional voices (happiness, sadness, anger), each emotional voice having two levels, high and low. To analyze the voice color of the normal and emotional voices, the average spectrum, measured as the accumulated spectrum of the vowels, was used, and F1 (the first formant) calculated from the average spectrum was compared. The voice similarity of the low-level emotional data was higher than that of the high-level emotional data, and the proposed method allows the recording process to be monitored through the change in voice similarity.
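
The accumulated-spectrum comparison described above can be sketched numerically. This is not the paper's code: the band limits and the peak-picking shortcut (in place of proper LPC-based formant estimation) are illustrative assumptions.

```python
import numpy as np

def average_spectrum(frames, nfft=1024):
    """Accumulate and average the magnitude spectra of vowel frames,
    a rough stand-in for the paper's accumulated-vowel spectrum."""
    win = np.hanning(frames.shape[1])
    spec = np.abs(np.fft.rfft(frames * win, nfft, axis=1))
    return spec.mean(axis=0)

def first_peak_hz(avg_spec, fs=16000, lo=200.0, hi=1200.0):
    """Crude F1 estimate: the highest point of the average spectrum in a
    typical first-formant band (a real system would use LPC roots)."""
    freqs = np.fft.rfftfreq(2 * (len(avg_spec) - 1), 1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return float(freqs[band][np.argmax(avg_spec[band])])

# Demo on a synthetic "vowel": a 700 Hz component should dominate the band.
fs = 16000
t = np.arange(512) / fs
frames = np.tile(np.sin(2 * np.pi * 700 * t), (8, 1))
f1 = first_peak_hz(average_spectrum(frames), fs)
```

Comparing such F1 values between the normal and emotional recordings is the similarity measure the abstract proposes for monitoring.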

Voice Color Conversion Based on the Formants and Spectrum Tilt Modification (포먼트 이동과 스펙트럼 기울기의 변환을 이용한 음색 변환)

  • Son Song-Young;Hahn Min-Soo
    • MALSORI
    • /
    • no.45
    • /
    • pp.63-77
    • /
    • 2003
  • The purpose of voice color conversion is to change the speaker identity perceived in a speech signal. In this paper, we propose a new voice color conversion algorithm based on formant shifting and spectrum-tilt modification in the frequency domain. The basic idea is to move the source speaker's formant positions to those of the target speaker through interpolation and decimation, and to modify the spectrum tilt using information from both speakers' spectral envelopes. The LPC spectrum is adopted to estimate the formant positions and the spectrum tilt. Because it modifies the speech waveform directly in the frequency domain, our algorithm converts the speaker identity rather successfully while maintaining good speech quality.

  • PDF
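
The interpolation/decimation idea in the abstract amounts to resampling the spectral envelope along the frequency axis, plus a tilt correction. A minimal numpy sketch, where the shift ratio, the tilt value, and the reference bin `f0_bin` are illustrative assumptions rather than the quantities the paper estimates from the two speakers:

```python
import numpy as np

def shift_formants(envelope, ratio):
    """Warp a magnitude-spectrum envelope along the frequency axis by
    `ratio` (>1 raises formants, <1 lowers them) via interpolation,
    mimicking the interpolation/decimation step of the paper."""
    n = len(envelope)
    src_bins = np.arange(n) / ratio   # where each output bin samples the source
    return np.interp(src_bins, np.arange(n), envelope, right=envelope[-1])

def modify_tilt(envelope, db_per_octave, f0_bin=1):
    """Apply an extra spectral tilt of db_per_octave (negative = darker)
    relative to bin f0_bin; a real system would derive the tilt from
    both speakers' spectral envelopes."""
    n = len(envelope)
    octaves = np.log2(np.maximum(np.arange(n), f0_bin) / f0_bin)
    return envelope * 10.0 ** (db_per_octave * octaves / 20.0)
```

A peak at bin 50 moves to bin 100 under `ratio=2.0`, i.e. the formant frequency doubles, while the tilt function attenuates each higher octave by a fixed number of dB.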

Identification of Voice Features for Recently Voice Fishing by Voice Analysis (음성 분석을 통한 최근 보이스피싱의 음성 특징 규명)

  • Lee, Bum Joo;Cho, Dong Uk;Jeong, Yeon Man
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.41 no.10
    • /
    • pp.1276-1283
    • /
    • 2016
  • The scale of financial damage from voice phishing has not decreased despite national and social efforts to reduce it. One reason is the offenders' sophisticated and vernacular speech style, which makes them difficult to recognize. Nowadays even young people are being deceived, not only by speech styles that imitate real public officials but also through the use of previously obtained personal information, which leads directly to financial damage among younger people, who are usually considered better judges than the elderly. We therefore compared and analyzed the voices of voice-phishing criminals and young people of the same generation in order to identify distinguishing voice features. The experiment, covering cases since 2011, was based on pitch, pitch bandwidth, energy, speech speed, and voice color. The results show a significant difference in energy and speech speed between the voice-phishing criminals and young people of the same generation.

Standardization of Voice Training Methods for Professional Voice Users Based on the Traditional Bel Canto (전통적 벨칸토 발성훈련법에 기초한 음성전문직업인 발성훈련의 표준화)

  • Kim, Chul Jun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.28 no.1
    • /
    • pp.17-19
    • /
    • 2017
  • Opera singers train their vocal organs to obtain a good timbre of voice. They train again and again to achieve strong resonance, a large vocal range, a homogeneous voice color, a voice that carries far, and to avoid vocal disorders. This article analyzes the traditional Bel Canto training from a scientific and medical perspective, approaching the secret of this great art with its 400-year history. Furthermore, standardizing a voice training method based on Bel Canto will facilitate the training, therapy, and care of professional voice users and of people with voice disorders.

  • PDF

Development of a Monitoring and Forecasting System for the Delivery of Pregnant Sow (임신돈의 분만 감시 및 예측 시스템 개발)

  • 임영일
    • Journal of Animal Environmental Science
    • /
    • v.6 no.1
    • /
    • pp.15-22
    • /
    • 2000
  • A monitoring and forecasting system for swine delivery was developed using a CCD camera, a multi-function board, a microphone, and a data recorder attached to a personal computer. Four factors were selected for monitoring and forecasting the delivery: the shape and color of the genitalia, the sow's body shape, the breast color, and sound. Images of the physical variation of the body shape, the shape and color of the genitalia area, and the breast color of the pregnant sow were grabbed with the CCD color camera and multi-function board, and the variation of the sow's voice was acquired with the microphone and data recorder. The acquired image and voice information was analyzed with a custom-developed algorithm and program. The forecasting efficiency of swine delivery was 89%, 71%, and 100% using the variation of the genitalia area, the body shape, and the voice of the pregnant sow, respectively. The efficiency of the image processing was 100% for delivery detection once a piglet had emerged halfway from the genitalia of the pregnant sow. The system immediately informed the farm manager of the estimated delivery time whenever the estimated time matched or preceded the time set by the manager, and it also detected the delivery itself.

  • PDF

Pitch Modification based on a Voice Source Model (음원 모델에 기초한 합성음의 피치 조절)

  • Choi, Yong-Jin;Yeo, Su-Jin;Kim, Jin-Young;Sung, Koeng-Mo
    • Speech Sciences
    • /
    • v.3
    • /
    • pp.132-147
    • /
    • 1998
  • Previously developed pitch-modification methods have not been based on a voice source model, so the synthesized speech often sounds unnatural even though it may be highly intelligible. The purpose of this paper is to analyze how the voice source signal changes with the pitch period and to establish a pitch-modification rule based on that analysis. Using the excitation waveform, we examine how the intervals of the closing phase, closed phase, and open phase change as the pitch increases. Compared with previous methods that operate directly on the speech signal, pitch modification based on a voice source model shows high intelligibility and naturalness. This work can also benefit applications such as speaker identification and voice color conversion, and the proposed method provides high-quality synthetic speech.

  • PDF

Traffic Signal Recognition System Based on Color and Time for Visually Impaired

  • P. Kamakshi
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.4
    • /
    • pp.48-54
    • /
    • 2023
  • Blind people find it very difficult to cross roads and must be vigilant with every step they take. Convolutional neural networks (CNNs) are well suited to analyzing such data and automating the task without human intervention. In this work, a traffic signal recognition system for the visually impaired is designed using a CNN: to provide a safe walking environment, a voice message is generated according to the state of the light and the timer at each instant. The model consists of two phases. In the first phase, a CNN is trained to classify images captured from traffic signals, using the Common Objects in Context (COCO) labelled dataset, which includes classes such as traffic lights, bicycles, and cars; the traffic light is located with an object detection model, and the CNN then detects the color of the light and the countdown timer displayed on it. In the second phase, a text message is generated from the detected light color and timer value and sent to a text-to-speech model to produce voice guidance for the blind person. The system thus recognizes both the traffic light color and the countdown timer for safe crossing; the countdown timer, which is very useful, was not considered in existing models. The proposed model gave accurate results in various scenarios compared with other models.
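
The second phase maps the detected light color and timer value to a spoken message. A trivial sketch of that mapping; the threshold and the wording are assumptions, not taken from the paper:

```python
def guidance_message(light_color, timer_seconds):
    """Generate the guidance text that would be sent to the
    text-to-speech model.  The 10-second threshold for discouraging a
    late start is an assumed safety margin."""
    if light_color == "green" and timer_seconds >= 10:
        return f"Green light, {timer_seconds} seconds left. You may cross."
    if light_color == "green":
        return f"Only {timer_seconds} seconds left. Do not start crossing."
    return "Red light. Please wait."
```

In the full system this string would be fed to the text-to-speech stage rather than returned to a caller.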

A Study On Male-To-Female Voice Conversion (남녀 음성 변환 기술연구)

  • Choi Jung-Kyu;Kim Jae-Min;Han Min-Su
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.115-118
    • /
    • 2000
  • Voice conversion technology is essential for TTS systems because constructing a speech database takes much effort. In this paper, male-to-female voice conversion in a Korean LPC-based TTS system is studied. In general, the parameters for voice color conversion are categorized into acoustic and prosodic parameters; this paper adopts LSFs (line spectral frequencies) as the acoustic parameter and pitch period and duration as the prosodic parameters. For the conversion, the pitch period is halved, the duration is shortened by 25%, and the LSFs are shifted linearly; the synthesized speech is then post-filtered with a bandpass filter. The proposed algorithm is simpler than other methods such as VQ- and neural-network-based approaches, and it does not even require estimating formant information. The MOS (mean opinion score) test gave 2.25 for naturalness and 3.2 for closeness to a female voice. In conclusion, with the proposed algorithm a male-to-female voice conversion system can be implemented simply and with relatively successful results.

  • PDF
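
The conversion rules in this abstract are simple enough to state directly: pitch period halved, duration shortened by 25%, LSFs shifted linearly. A per-frame sketch, where the multiplicative shift factor 1.15 is an illustrative assumption since the paper only says the shift is linear:

```python
import numpy as np

def male_to_female_params(pitch_period, duration, lsf, lsf_shift=1.15):
    """Apply the male-to-female rules to one analysis frame: halve the
    pitch period (doubling F0), shorten duration by 25%, and shift the
    LSFs linearly upward.  Multiplying by a constant keeps the LSFs
    ordered; they are clipped to stay below pi, as LSFs must."""
    new_lsf = np.minimum(np.asarray(lsf, dtype=float) * lsf_shift,
                         np.pi - 1e-3)
    return pitch_period / 2.0, duration * 0.75, new_lsf
```

The converted LSFs would then drive the LPC synthesis filter, followed by the bandpass post-filter the abstract mentions.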

Mean Value of Aerodynamic Study in Normal Korean (음성검사 중 공기역학적 검사에서 한국인의 정상 평균치)

  • 서장수;송시연;권오철;김준우;이희경;정옥란
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.8 no.1
    • /
    • pp.27-32
    • /
    • 1997
  • Recently, many people suffering from a change in voice color have been visiting the otolaryngologist, yet there are no reference data with which such a change can be evaluated objectively in Koreans. In this aerodynamic study, the maximum phonation time, mean airflow rate, phonatory flow volume, and subglottal pressure of Koreans were measured with an Aerophone II voice function analyzer. 112 males and 122 females aged 10 to 69 years were randomly selected. Maximum phonation time was 20.8±6.4 s in males and 17.2±4.1 s in females. Mean airflow rate was 167.1±61.4 ml/s in males and 129.6±49.3 ml/s in females. Phonatory flow volume was 3184.5±646.0 ml in males and 2122.1±670.5 ml in females. Subglottal pressure was 4.1±1.8 cmH2O in males and 3.5±1.4 cmH2O in females. There was no statistically significant difference among the age groups in any of these results.

  • PDF