Search | Korea Science

Performance Comparison and Duration Model Improvement of Speaker Adaptation Methods in HMM-based Korean Speech Synthesis (HMM 기반 한국어 음성합성에서의 화자적응 방식 성능비교 및 지속시간 모델 개선)

Lee, Hea-Min;Kim, Hyung-Soon
- Phonetics and Speech Sciences / v.4 no.3 / pp.111-117 / 2012
In this paper, we compare the performance of several speaker adaptation methods for an HMM-based Korean speech synthesis system with small amounts of adaptation data. According to objective and subjective evaluations, a hybrid method of constrained structural maximum a posteriori linear regression (CSMAPLR) and maximum a posteriori (MAP) adaptation performs better than the other methods when only five minutes of adaptation data are available for the target speaker. During the objective evaluation, we find that the duration models are not adapted to the target speaker as well as the spectral envelope and pitch models are. To alleviate this problem, we propose a duration rectification method and a duration interpolation method. Both the objective and subjective evaluations reveal that incorporating the two proposed methods into the conventional speaker adaptation method improves the performance of duration model adaptation.
https://doi.org/10.13064/KSSS.2012.4.3.111

Adaptation in Families of Children with Down Syndrome: A Mixed-methods Design (다운증후군 자녀를 둔 가족의 적응력: 혼합적 연구 방법 적용)

Choi, Hyunkyung
- Journal of Korean Academy of Nursing / v.45 no.4 / pp.501-512 / 2015
Purpose: The purpose of this study, which was guided by the Resiliency Model of Family Stress, Adjustment, and Adaptation, was twofold: (a) to explore family and parental adaptation and the factors influencing family adaptation in Korean families of children with Down syndrome (DS) through a quantitative methodology and (b) to understand life with a Korean child with DS through a qualitative method. Methods: A mixed-methods design was adopted. A total of 147 parents of children with DS completed a package of questionnaires, and 19 parents participated in in-depth interviews. Quantitative and qualitative data were analyzed using stepwise multiple regression and content analysis, respectively. Results: According to the quantitative data, the overall family adaptation scores indicated average family functioning. Financial status was an important variable in understanding both family and parental adaptation. Family adaptation was best explained by family problem solving and coping communication, condition management ability, and family hardiness. Family strains and family hardiness were the family factors with the most influence on parental adaptation. Qualitative data analysis showed that family life with a child with DS encompassed both positive and negative aspects and was expressed in 5 themes, 10 categories, and 16 sub-categories. Conclusion: The results of this study expand our limited knowledge and understanding of families of children with DS in Korea and can be used to develop effective interventions to improve the adaptation of the family as a unit as well as parental adaptation.
https://doi.org/10.4040/jkan.2015.45.4.501 KSCI

Fast speaker adaptation using extended diagonal linear transformation for deep neural networks

Kim, Donghyun;Kim, Sanghun
- ETRI Journal / v.41 no.1 / pp.109-116 / 2019
This paper explores new techniques based on a hidden-layer linear transformation for fast speaker adaptation in deep neural networks (DNNs). Conventional methods using affine transformations are ineffective because they require a relatively large number of parameters. Meanwhile, methods that employ singular-value decomposition (SVD) are used because they are effective at reducing the number of adaptation parameters. However, a matrix decomposition is computationally expensive in online services. We propose an extended diagonal linear transformation method that minimizes the number of adaptation parameters without SVD and increases performance for tasks that require smaller degrees of adaptation. In Korean large vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements, with error-reduction rates of 8.4% and 17.1% when adapting on five and 50 conversational sentences, respectively. Compared with adaptation methods using SVD, the proposed method achieves higher recognition performance with fewer parameters.
https://doi.org/10.4218/etrij.2017-0087 KSCI

Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation

Jeon, Hyung-Bae;Lee, Soo-Young
- ETRI Journal / v.38 no.3 / pp.487-493 / 2016
Two new methods are proposed for the unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. In the training phase, training documents are clustered using latent Dirichlet allocation (LDA), and a domain-specific LM is then trained for each cluster. In the test phase, the adapted LM is formed as a linear mixture of the trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize the trained LDA model to estimate the weight values assigned to the trained domain-specific LMs; therefore, the clustering and weight-estimation algorithms of the trained LDA model are reliable. In continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting.
https://doi.org/10.4218/etrij.16.0115.0499 KSCI
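
The linear mixture described in this abstract can be illustrated with a minimal sketch. It assumes toy unigram models stored as dictionaries and hand-picked topic weights; the names `mix_lm_prob`, `sports_lm`, `politics_lm`, and `topic_weights` are hypothetical and are not taken from the paper, which estimates the weights from a trained LDA model.

```python
def mix_lm_prob(word, domain_lms, weights):
    """Return P(word) under a linear mixture: sum_k weights[k] * P_k(word)."""
    return sum(w * lm.get(word, 0.0) for lm, w in zip(domain_lms, weights))


# Toy unigram "domain LMs" (hypothetical numbers, one model per document cluster).
sports_lm = {"goal": 0.6, "vote": 0.1, "the": 0.3}
politics_lm = {"goal": 0.1, "vote": 0.6, "the": 0.3}

# Hypothetical topic probabilities for one input sentence; they sum to 1.
topic_weights = [0.8, 0.2]

# Adapted LM: each word's probability is the weighted sum over domain LMs.
adapted_lm = {
    word: mix_lm_prob(word, [sports_lm, politics_lm], topic_weights)
    for word in sports_lm
}
print(adapted_lm)
```

Because the weights sum to 1 and each domain model is a proper distribution, the mixture is also a proper distribution over the same vocabulary.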

The Comparison of Characteristics in various Speaker Adaptation Methods (여러 화자 적응 방법들의 특성 비교)

황영수
- Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.339-342 / 1998
In this paper, we proposed various speaker adaptation methods and studied their performance. The methods studied are MAPE (Maximum A Posteriori Probability Estimation) and ARTMAP. To evaluate the performance of these methods, we used Korean isolated digits as the experimental data. The hybrid speaker adaptation method, which unified MAPE, linear spectral estimation, and the output probability of the SCHMM, showed better recognition results than the other methods. The method using ARTMAP showed results similar to those of the hybrid method.

On Speaker Adaptations with Sparse Training Data for Improved Speaker Verification

Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
- Speech Sciences / v.7 no.1 / pp.31-37 / 2000
This paper concerns effective speaker adaptation methods for solving the over-training problem in speaker verification, which frequently occurs when a speaker is modeled with sparse training data. While various speaker adaptation methods have already been applied to speech recognition, they have not yet been formally considered in speaker verification. This paper proposes speaker adaptation methods that combine MAP and MLLR adaptation, both of which are used successfully in speech recognition, and applies them to speaker verification. Experimental results show that the speaker verification system using weighted MAP and MLLR adaptation outperforms conventional speaker models without adaptation by a factor of up to 5. From these results, we show that the speaker adaptation method achieves significantly better performance even when only a small amount of training data is available for speaker verification.

Improving Adversarial Domain Adaptation with Mixup Regularization

Bayarchimeg Kalina;Youngbok Cho
- Journal of information and communication convergence engineering / v.21 no.2 / pp.139-144 / 2023
Engineers prefer deep neural networks (DNNs) for solving computer vision problems. However, DNNs pose two major problems. First, neural networks require large amounts of well-labeled data for training. Second, the covariate shift problem is common in computer vision problems. Domain adaptation has been proposed to mitigate this problem. Recent work on adversarial-learning-based unsupervised domain adaptation (UDA) has explained transferability and enabled the model to learn robust features. Despite this advantage, current methods do not guarantee the distinguishability of the latent space unless they consider class-aware information of the target domain. Furthermore, source and target examples alone cannot efficiently extract domain-invariant features from the encoded spaces. To alleviate these problems of existing UDA methods, we propose applying mixup regularization in the adversarial discriminative domain adaptation (ADDA) method. We validated the effectiveness and generality of the proposed method by performing experiments under three adaptation scenarios: MNIST to USPS, SVHN to MNIST, and MNIST to MNIST-M.
https://doi.org/10.56977/jicce.2023.21.2.139
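
For readers unfamiliar with mixup, the following is a minimal sketch of the generic technique only: two examples and their labels are blended with a coefficient drawn from a Beta distribution. The abstract does not say how mixup is integrated into ADDA, so the function name `mixup`, the choice of `alpha=0.2`, and the use of domain labels 1.0 (source) and 0.0 (target) are assumptions made for illustration.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=0.2, rng=random):
    """Blend two examples and their labels with a Beta(alpha, alpha) coefficient."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix, lam


# Hypothetical feature vectors for one source and one target example.
rng = random.Random(0)  # seeded generator so the run is repeatable
source_x, target_x = [1.0, 1.0], [0.0, 0.0]

# Assumed domain labels: 1.0 for source, 0.0 for target.
x_mix, y_mix, lam = mixup(source_x, target_x, 1.0, 0.0, rng=rng)
print(lam, x_mix, y_mix)
```

With these toy inputs, every blended feature and the blended label equal the mixing coefficient, which makes the interpolation easy to check by eye.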

Performance Improvement of Rapid Speaker Adaptation Using Bias Compensation and Mean of Dimensional Eigenvoice Models (바이어스 보상과 차원별 Eigenvoice 모델 평균을 이용한 고속화자적응의 성능향상)

박종세;김형순;송화전
- The Journal of the Acoustical Society of Korea / v.23 no.5 / pp.383-389 / 2004
In this paper, we propose bias compensation methods and an eigenvoice method using the mean of dimensional eigenvoice models to improve the performance of eigenvoice-based rapid speaker adaptation under a mismatch between the training and test environments. Experimental results for a vocabulary-independent word recognition task (using PBW 452 DB) show that the proposed methods yield improvements for small amounts of adaptation data. We obtained a relative improvement of about 22% to 30% with the bias compensation methods as the amount of adaptation data varied from 1 to 50, and a 41% relative improvement in error rate with the eigenvoice method using the mean of dimensional eigenvoice models with only a single adaptation word.
KSCI

Adaptation to Menopause and Use of Yangsaeng in Middle-aged Korean Women (중년여성의 폐경기 적응과 양생실천 정도)

Park, Hye-Sook;Kim, Ae-Jung
- Women's Health Nursing / v.16 no.1 / pp.1-9 / 2010
Purpose: The study addressed the adaptation of middle-aged Korean women to menopause, including the use of Yangsaeng, a traditional health care regimen that incorporates specific principles and methods to promote health and prevent illness, with the aim of improving health and longevity. Methods: Participants were middle-aged women (40~59 years, n=171) residing in Seoul and Gyeong-Gi Province. Data were collected using a self-reported questionnaire. Menopausal period adaptation was measured by 29 items in four categories (physical, self-concept, role function, and inter-dependent). Yangsaeng was measured by 31 questionnaire items in eight categories (morality, mind, diet, activity and rest, exercise, sleep, seasonal, and sexuality). Results: Significant differences in menopausal adaptation were evident on the basis of participant education and income. There were significant differences in Yangsaeng in terms of participant education, nature of employment, and income. Menopausal adaptation correlated positively with the use of Yangsaeng. Physical adaptation, self-concept adaptation, role function adaptation, and inter-dependent adaptation correlated positively with morality Yangsaeng, mind Yangsaeng, and activity and rest Yangsaeng. Conclusion: Middle-aged Korean women who practice Yangsaeng may be better positioned to adapt to menopause. Yangsaeng may be an advantageous nursing intervention in this population.
https://doi.org/10.4069/kjwhn.2010.16.1.1 KSCI

A Comparative Study of Speaker Adaptation Methods for HMM-Based Speech Recognition (HMM 음성인식 시스템을 위한 화자적응 방법들의 성능비교)

Koo, Myoung-Wan;Un, Chong-Kwan;Lee, Hwang-Soo
- The Journal of the Acoustical Society of Korea / v.10 no.3 / pp.37-43 / 1991
In this paper, we compare the performance of speaker adaptation methods that consist of two stages of processing for an HMM-based speech recognition system. We compare three kinds of VQ adaptation methods that may be used in the first stage to reduce the distortion error for a new speaker: label prototype adaptation, adaptation with a codebook from the adaptation speech itself, and adaptation with a mapped codebook. We then compare the performance of four kinds of HMM parameter adaptation methods that may be used in the second stage to transform HMM parameters for a new speaker: adaptation by the Viterbi algorithm, by the DTW algorithm, by the iterative alignment algorithm, and by the fuzzy histogram algorithm. The results show that adaptation based on the fuzzy histogram algorithm yields the highest accuracy in an HMM-based speech recognition system.
