• Title/Summary/Keyword: small-data speech recognition

Search Result 48, Processing Time 0.021 seconds

Rapid Speaker Adaptation Based on Eigenvoice Using Weight Distribution Characteristics (가중치 분포 특성을 이용한 Eigenvoice 기반 고속화자적응)

  • 박종세;김형순;송화전
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.5
    • /
    • pp.403-407
    • /
    • 2003
  • Recently, eigenvoice approach has been widely used for rapid speaker adaptation. However, even in the eigenvoice approach, Performance improvement using very small amount of adaptation data is relatively small in comparison with that using somewhat large adaptation data because the reliable estimation of weights of eigenvoice is difficult. In this paper, we propose a rapid speaker adaptation method based on eigenvoice using the weight distribution characteristics to improve the performance on a small adaptation data. In the Experimental results on vocabulary-independent word recognition task (using PBW 452 database), the weight threshold method alleviates the problem of relatively low performance for a tiny small adaptation data. When single adaptation word is used, word error rate is reduced about 9-18% by the weight threshold method.

Speaker Identification Using Greedy Kernel PCA (Greedy Kernel PCA를 이용한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI
    • /
    • no.66
    • /
    • pp.105-116
    • /
    • 2008
  • In this research, we propose a speaker identification system using a kernel method which is expected to model the non-linearity of speech features well. We have been using principal component analysis (PCA) successfully, and extended to kernel PCA, which is used for many pattern recognition tasks such as face recognition. However, we cannot use kernel PCA for speaker identification directly because the storage required for the kernel matrix grows quadratically, and the computational cost grows linearly (computing eigenvector of $l{\times}l$ matrix) with the number of training vectors I. Therefore, we use greedy kernel PCA which can approximate kernel PCA with small representation error. In the experiments, we compare the accuracy of the greedy kernel PCA with the baseline Gaussian mixture models using MFCCs and PCA. As the results with limited enrollment data show, the greedy kernel PCA outperforms conventional methods.

  • PDF

Personalized Speech Classification Scheme for the Smart Speaker Accessibility Improvement of the Speech-Impaired people (언어장애인의 스마트스피커 접근성 향상을 위한 개인화된 음성 분류 기법)

  • SeungKwon Lee;U-Jin Choe;Gwangil Jeon
    • Smart Media Journal
    • /
    • v.11 no.11
    • /
    • pp.17-24
    • /
    • 2022
  • With the spread of smart speakers based on voice recognition technology and deep learning technology, not only non-disabled people, but also the blind or physically handicapped can easily control home appliances such as lights and TVs through voice by linking home network services. This has greatly improved the quality of life. However, in the case of speech-impaired people, it is impossible to use the useful services of the smart speaker because they have inaccurate pronunciation due to articulation or speech disorders. In this paper, we propose a personalized voice classification technique for the speech-impaired to use for some of the functions provided by the smart speaker. The goal of this paper is to increase the recognition rate and accuracy of sentences spoken by speech-impaired people even with a small amount of data and a short learning time so that the service provided by the smart speaker can be actually used. In this paper, data augmentation and one cycle learning rate optimization technique were applied while fine-tuning ResNet18 model. Through an experiment, after recording 10 times for each 30 smart speaker commands, and learning within 3 minutes, the speech classification recognition rate was about 95.2%.

Performance Improvement of Voting-based Speaker Identification System by using the Observation Confidence (관측신뢰도 적용에 의한 투표기법 기반의 화자인식시스템의 성능향상)

  • Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.79-88
    • /
    • 2008
  • Recently demands for the speech technology-based products targeted for the mobile terminals such as cellular phones and PDA are rapidly increasing. And voting-based speaker identification algorithm is known to have a good performance in the mobile environment, since it works well with small amount of speaker training data. In this paper, we proposed a method to improve the performance of this voting based speaker identification system by using the observation confidence value which is derived from the function of SNR each frame. The proposed method is evaluated with ETRI cellular phone DB which is made for the speaker recognition task. The experimental results show that the proposed method has better performance of 2-3% identification rate than the conventional GMM method.

  • PDF

Speaker Adaptation using ICA-based Feature Transformation (ICA 기반의 특징변환을 이용한 화자적응)

  • Park ManSoo;Kim Hoi-Rin
    • MALSORI
    • /
    • no.43
    • /
    • pp.127-136
    • /
    • 2002
  • The speaker adaptation technique is generally used to reduce the speaker difference in speech recognition. In this work, we focus on the features fitted to a linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the transformation matrix is learned from a speaker independent training data. When the amount of data is small, however, it is necessary to adjust the ICA-based transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method: through a linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. We observed that the proposed technique is effective to adaptation performance.

  • PDF

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Son Jongmok;Kwon Hongseok;Kim Siho;Bae Keunsung
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.391-394
    • /
    • 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle several hundred words of vocabulary size as well as speaker independency. First, we develop an HMM-based speech recognition system on the PC that operates on the frame basis with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Many techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied mixture model are employed for reduction of computational burden and memory size. The system is then properly optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486kbytes of memory for data and acoustic models, and 24.5kbytes for program code. Maximum required time of 29.2ms for processing a frame of 32ms of speech validates real-time operation of the implemented system.

  • PDF

L1-norm Regularization for State Vector Adaptation of Subspace Gaussian Mixture Model (L1-norm regularization을 통한 SGMM의 state vector 적응)

  • Goo, Jahyun;Kim, Younggwan;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.131-138
    • /
    • 2015
  • In this paper, we propose L1-norm regularization for state vector adaptation of subspace Gaussian mixture model (SGMM). When you design a speaker adaptation system with GMM-HMM acoustic model, MAP is the most typical technique to be considered. However, in MAP adaptation procedure, large number of parameters should be updated simultaneously. We can adopt sparse adaptation such as L1-norm regularization or sparse MAP to cope with that, but the performance of sparse adaptation is not good as MAP adaptation. However, SGMM does not suffer a lot from sparse adaptation as GMM-HMM because each Gaussian mean vector in SGMM is defined as a weighted sum of basis vectors, which is much robust to the fluctuation of parameters. Since there are only a few adaptation techniques appropriate for SGMM, our proposed method could be powerful especially when the number of adaptation data is limited. Experimental results show that error reduction rate of the proposed method is better than the result of MAP adaptation of SGMM, even with small adaptation data.

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.675-681
    • /
    • 2021
  • With the recent growth of the AI speaker market, the demand for speech synthesis technology that enables natural conversation with users is increasing. Therefore, there is a need for a multi-speaker speech synthesis system that can generate voices of various tones. In order to synthesize natural speech, it is required to train with a large-capacity. high-quality speech DB. However, it is very difficult in terms of recording time and cost to collect a high-quality, large-capacity speech database uttered by many speakers. Therefore, it is necessary to train the speech synthesis system using the speech DB of a very large number of speakers with a small amount of training data for each speaker, and a technique for naturally expressing the tone and rhyme of multiple speakers is required. In this paper, we propose a technology for constructing a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition technology, and synthesizing a new speaker's tone with a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module for synthesizing mel-spectrogram from input text is composed of Tacotron2, and the vocoder generating synthesized speech consists of WaveNet with mixture of logistic distributions applied. The x-vector extracted from the trained speaker embedding neural networks is added to Tacotron2 as an input to express the desired speaker's tone.

Out-Of-Domain Detection Using Hierarchical Dirichlet Process

  • Jeong, Young-Seob
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.1
    • /
    • pp.17-24
    • /
    • 2018
  • With improvement of speech recognition and natural language processing, dialog systems are recently adapted to various service domains. It became possible to get desirable services by conversation through the dialog system, but it is still necessary to improve separate modules, such as domain detection, intention detection, named entity recognition, and out-of-domain detection, in order to achieve stable service offer. When it misclassifies an in-domain sentence of conversation as out-of-domain, it will result in poor customer satisfaction and finally lost business. As there have been relatively small number of studies related to the out-of-domain detection, in this paper, we introduce a new method using a hierarchical Dirichlet process and demonstrate the effectiveness of it by experimental results on Korean dataset.

Self-Adaptation Algorithm Based on Maximum A Posteriori Eigenvoice for Korean Connected Digit Recognition (한국어 연결 숫자음 인식을 일한 최대 사후 Eigenvoice에 근거한 자기적응 기법)

  • Kim Dong Kook;Jeon Hyung Bae
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.8
    • /
    • pp.590-596
    • /
    • 2004
  • This paper Presents a new self-adaptation algorithm based on maximum a posteriori (MAP) eigenvoice for Korean connected digit recognition. The proposed MAP eigenvoice is developed by introducing a probability density model for the eigenvoice coefficients. The Proposed approach provides a unified framework that incorporates the Prior model into the conventional eigenvoice estimation. In self-adaptation system we use only one adaptation utterance that will be recognized, we use MAP eigenvoice that is most robust adaptation. In series of self-adaptation experiments on the Korean connected digit recognition task. we demonstrate that the performance of the proposed approach is better than that of the conventional eigenvoice algorithm for a small amount of adaptation data.