Search | Korea Science

A Study on Voice Color Control Rules for Speech Synthesis System (음성합성시스템을 위한 음색제어규칙 연구)

Kim, Jin-Young;Eom, Ki-Wan
- Speech Sciences / v.2 / pp.25-44 / 1997
Listening to the various speech synthesis systems developed and used in our country, we find that although their quality has improved, they still lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. In order to find such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract for several voice colors have been studied. In this paper voice colors were catalogued as: deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. For the voice source model, the LF-model was used, and for the frequency characteristics of the vocal tract, the formant frequencies, bandwidths, and amplitudes were used. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between these parameters and voice colors.

A Single Channel Adaptive Noise Cancellation for Speech Signals (음성신호의 단일입력 적응잡음제거)

Gahng, Hae-Dong;Bae, Keun-Sung
- The Journal of the Acoustical Society of Korea / v.13 no.3 / pp.16-24 / 1994
A single-channel adaptive noise canceling (ANC) technique is presented for removing the effects of additive noise on the speech signal. The conventional method obtains a reference signal using the pitch estimated on a frame basis from the input speech. The proposed method, however, obtains the reference signal using the delay estimated recursively on a sample-by-sample basis. To estimate the delay, we derive recursive formulas for the autocorrelation function and the average magnitude difference function. The performance of the proposed method is evaluated for speech signals distorted by additive white Gaussian noise. Experimental results with the normalized least mean square (NLMS) adaptive algorithm demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio.
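As a rough illustration of the idea in this abstract, the sketch below runs an NLMS filter whose reference is the noisy input delayed by roughly one pitch period. It is not the authors' method: the function name `nlms_enhance`, the fixed `delay`, the filter `order`, and the step size `mu` are all assumptions made for this example, whereas the paper estimates the delay recursively for every sample.

```python
import numpy as np

def nlms_enhance(noisy, delay, order=16, mu=0.1, eps=1e-8):
    """Predict noisy[n] from a block of samples delayed by `delay` (NLMS).

    Hypothetical helper: `delay` is a fixed, known lag in samples here,
    standing in for the recursively estimated delay described in the abstract.
    """
    noisy = np.asarray(noisy, dtype=float)
    w = np.zeros(order)              # adaptive filter weights
    out = np.zeros_like(noisy)       # enhanced (predicted) signal
    for n in range(delay + order - 1, len(noisy)):
        # Reference vector: noisy[n-delay], noisy[n-delay-1], ...
        ref = noisy[n - delay - order + 1 : n - delay + 1][::-1]
        y = w @ ref                  # filter output: estimate of the periodic part
        e = noisy[n] - y             # prediction error
        w += mu * e * ref / (ref @ ref + eps)   # normalized LMS update
        out[n] = y
    return out
```

Because white noise at time n is uncorrelated with the delayed samples, the filter output tends to keep the periodic (voiced) component and suppress the noise.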

Speech processing strategy and executive function: Korean children's stop perception

Kong, Eun Jong;Yoo, Jeewon
- Phonetics and Speech Sciences / v.9 no.3 / pp.57-65 / 2017
The current study explored how Korean-speaking children processed the multiple acoustic cues (VOT and f0) for the stop laryngeal contrast (/t'/, /t/, and /tʰ/) and examined whether individual perceptual strategies could be related to a general cognitive ability to perform executive functions (EF). Fifteen children (aged 7 to 8) participated in a speech perception task, identifying the three Korean laryngeal stops (3AFC) while listening to auditory stimuli of C-/a/ with synthetically varying VOT and f0. They also completed a series of EF tasks measuring working memory, inhibition, and cognitive shifting ability. The findings showed that children used the two cues in a highly correlated manner. While children utilized VOT consistently for the three laryngeal categories, their use of f0 was either reduced or enhanced depending on the phonetic category. Importantly, the children's strategy of suppressing f0 for the tense-aspirated contrast was meaningfully associated with better cognitive abilities such as working memory, inhibition, and attentional shifting. As a preliminary experimental investigation, the current research demonstrated that listeners with inefficient processing strategies performed poorly on the EF skills, suggesting that cognitive skills might be responsible for developmental variation in processing sub-phonemic information for the linguistic contrast.
https://doi.org/10.13064/KSSS.2017.9.3.057

Speech Outcome and Timing of Furlow Palatoplasty in Cleft Palate (Furlow 구개성형술을 시행한 구개열에서 언어발달과 적절한 수술시기)

Jin, Ung Sik;Kim, Suk Wha;Lee, Soung Joo
- Archives of Plastic Surgery / v.33 no.1 / pp.67-74 / 2006
Palatoplasty using Furlow's double-opposing Z-plasty was performed from June 1995 to September 1999 at Seoul National University Children's Hospital. The goal of this study is to determine the optimal timing of repair and the effect of cleft severity on velopharyngeal function. This is a retrospective study of patients operated on by the second author. The mean age of the patients was 10.53 months. The patients were divided into three groups: isolated cleft palate (n=70), unilateral cleft lip and palate (n=88), and bilateral cleft lip and palate (n=42). To evaluate velopharyngeal function, we used two parameters: speech evaluation and cineofluorography using DSR (digital subtraction radiography). Also, to determine the relationship between cleft severity and speech development, we measured the distance between the maxillary tuberosities and the cleft margins. Among the 200 patients, about 96% had no or minimal hypernasality and 87% had no or mild nasal emission. The cleft width and the length of the soft palate did not appear to be related to speech development. Among the parameters assessed at the age of 7 years, palatoplasty before the age of 12 months resulted in less 'nasal emission' and better 'articulation'. It can be concluded that Furlow palatoplasty shows satisfactory results and that it seems better to perform the operation before the age of 12 months.

Delay characteristics of speech packets in virtual cellular network(VCN) (가상 셀룰라 망(VCN)에서의 음성 패킷 지연 특성)

정명순;김화종
- The Journal of Korean Institute of Communications and Information Sciences / v.23 no.9A / pp.2305-2312 / 1998
This paper analyzes the delay characteristics of speech packets in a virtual cellular network (VCN). The probability distribution of packet delay is obtained using a Markov chain model for the case where periodic speech packets are transmitted by the slotted-ALOHA protocol. The effects of the probability of capture and the retransmission policy on the performance are also analyzed. First, the cumulative probability function of packet delay is calculated from the probability of capture as a function of the location of the mobile terminal. In order to investigate the effects of backoff delay, we define a parameter NPr, where N is the period (frame size) of the speech packets and Pr is the retransmission probability for each speech packet. We also obtain the 1% outage delay for various frame sizes N.

Real-time Implementation of CS-ACELP Speech Coder for IMT-2000 Test-bed (IMT-2000 Test-bed 상에서 CS-ACELP 음성부호화기 실시간 구현)

김형중;최송인;김재원;윤병식
- Journal of the Korea Institute of Information and Communication Engineering / v.2 no.3 / pp.335-341 / 1998
In this paper, we present a real-time implementation of the CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction) speech coder. ITU-T has standardized the CS-ACELP algorithm as G.729. A real-time implementation of the CS-ACELP speech coder algorithm is achieved using a 16-bit fixed-point DSP chip. To implement it on the fixed-point DSP chip, an integer simulation of the CS-ACELP algorithm is used. Furthermore, the input/output function and the communication function included in the CS-ACELP speech coder are described. We develop the CS-ACELP speech coder on a DSP evaluation board and evaluate it on the IMT-2000 Test-bed.

Speaker-Adaptive Speech Synthesis based on Fuzzy Vector Quantizer Mapping and Neural Networks (퍼지 벡터 양자화기 사상화와 신경망에 의한 화자적응 음성합성)

Lee, Jin-Yi;Lee, Gwang-Hyeong
- The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.149-160 / 1997
This paper is concerned with a speaker-adaptive speech synthesis method using a mapped codebook designed by fuzzy mapping on FLVQ (Fuzzy Learning Vector Quantization). The FLVQ is used to design both the input and the reference speaker's codebooks. This algorithm incorporates a fuzzy membership function into LVQ (learning vector quantization) networks. Unlike the LVQ algorithm, this algorithm minimizes the network output errors, which are the differences between the class membership targets and the actual membership values, and as a result minimizes the distances between training patterns and competing neurons. Speaker adaptation in speech synthesis is performed as follows: the input speaker's codebook is mapped to a reference speaker's codebook in fuzzy concepts. The Fuzzy VQ mapping replaces a codevector while preserving its fuzzy membership function. The codevector correspondence histogram is obtained by accumulating the vector correspondence along the DTW optimal path. We use the Fuzzy VQ mapping to design a mapped codebook. The mapped codebook is defined as a linear combination of the reference speaker's vectors, using each fuzzy histogram as a weighting function with membership values. In the adaptive speech synthesis stage, input speech is fuzzy vector-quantized by the mapped codebook, and then FCM arithmetic is used to synthesize speech adapted to the input speaker. The speaker adaptation experiments are carried out using speech of males in their thirties as the input speaker's speech and speech of a female in her twenties as the reference speaker's speech. The utterances used in the experiments are the sentences /anyoung hasim nika/ and /good morning/. As a result of the experiments, we obtained synthesized speech adapted to the input speaker.

Recognition for Noisy Speech by a Nonstationary AR HMM with Gain Adaptation Under Unknown Noise (잡음하에서 이득 적응을 가지는 비정상상태 자기회귀 은닉 마코프 모델에 의한 오염된 음성을 위한 인식)

이기용;서창우;이주헌
- The Journal of the Acoustical Society of Korea / v.21 no.1 / pp.11-18 / 2002
In this paper, a gain-adapted speech recognition method for noisy conditions is developed in the time domain. Noise is assumed to be colored. To cope with the notably nonstationary nature of speech signals such as fricatives, glides, liquids, and transition regions between phones, the nonstationary autoregressive (NAR) hidden Markov model (HMM) is used. The nonstationary AR process is represented by polynomial functions formed as a linear combination of M known basis functions. When only noisy signals are available, the problem of estimating the noise inevitably arises. The noise model and the gain contour of speech are estimated using multiple Kalman filters. The noise estimation of the proposed method can remove noise from noisy speech to obtain an enhanced speech signal. Compared to the conventional ARHMM with noise estimation, our proposed NAR-HMM with noise estimation improves the recognition performance by about 2-3%.

A Study on the Aerodynamic and Acoustic Characteristics in Dysarthria Speakers' Diadochokinesis by Articulation Valves in Vocal Tract (마비성구어장애 화자의 조음밸브 교호운동에 관한 공기역학 및 음향학적 특징)

Park, Hee-June;Kwon, Soon-Bok;Wang, Soo-Geun;Jeong, Ok-Ran
- Speech Sciences / v.15 no.2 / pp.177-189 / 2008
This study investigated the diadochokinetic (DDK) rate, regularity, and mean flow rate of articulation valves in dysarthria. The DDK rate, mean airflow rate (MFR), and regularity of DDK syllable repetitions for vocal function /ihi/, tongue function /ta/, velopharyngeal function /bm/, and labial function /pa/ were measured in 24 normal and dysarthric speakers. Aerophone Ⅱ and Motor Speech Profile were used for data recording and analysis. The findings were as follows. First, there were significant differences between the dysarthria and the normal group in DDK rate. DDK rates in ataxic dysarthria were the lowest, followed in sequence by spastic, flaccid, and hypokinetic dysarthria. Second, there was a significant difference between the dysarthria and the normal group in DDK regularity. Third, there was a significant difference between the dysarthria groups and the normal group in DDK MFR. Finally, there was a significant difference between the 4 groups of dysarthria and the normal group in DDK air flow tracking. The results of this study can serve as guidelines for normal DDK rate, regularity, and flow rate in dysarthria groups. In addition, their differential diagnoses and descriptions are important for deciding on the medical and behavioral management of individuals with disorders according to DDK characteristics.

Model adaptation employing DNN-based estimation of noise corruption function for noise-robust speech recognition (잡음 환경 음성 인식을 위한 심층 신경망 기반의 잡음 오염 함수 예측을 통한 음향 모델 적응 기법)

Yoon, Ki-mu;Kim, Wooil
- The Journal of the Acoustical Society of Korea / v.38 no.1 / pp.47-50 / 2019
This paper proposes an acoustic model adaptation method for effective speech recognition in noisy environments. In the proposed algorithm, the noise corruption function is estimated using a DNN (Deep Neural Network), and the function is applied to the model parameter estimation. The experimental results using the Aurora 2.0 framework and database demonstrate that the proposed model adaptation method is more effective than the conventional methods in both known and unknown noisy environments. In particular, the experiments in the unknown environments show a 15.87% relative improvement in average WER (Word Error Rate).
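For readers unfamiliar with the metric, a relative WER improvement is usually computed as the WER reduction divided by the baseline WER. The sketch below shows that arithmetic. The function name and the sample numbers are hypothetical and are not taken from the paper, which reports only the 15.87% figure.

```python
def relative_wer_improvement(baseline_wer, adapted_wer):
    """Relative WER improvement in percent: 100 * (baseline - adapted) / baseline."""
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer

# Hypothetical values for illustration only (not from the paper):
# a drop from 20.0% WER to 15.0% WER is a 25% relative improvement.
example = relative_wer_improvement(20.0, 15.0)
```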
https://doi.org/10.7776/ASK.2019.38.1.047

