• Title/Summary/Keyword: Text-independent

Search Result 237, Processing Time 0.026 seconds

Modified GMM Training for Inexact Observation and Its Application to Speaker Identification

  • Kim, Jin-Young;Min, So-Hee;Na, Seung-You;Choi, Hong-Sub;Choi, Seung-Ho
    • Speech Sciences
    • /
    • v.14 no.1
    • /
    • pp.163-174
    • /
    • 2007
  • All observation has uncertainty due to noise or channel characteristics. This uncertainty should be counted in the modeling of observation. In this paper we propose a modified optimization object function of a GMM training considering inexact observation. The object function is modified by introducing the concept of observation confidence as a weighting factor of probabilities. The optimization of the proposed criterion is solved using a common EM algorithm. To verify the proposed method we apply it to the speaker recognition domain. The experimental results of text-independent speaker identification with VidTimit DB show that the error rate is reduced from 14.8% to 11.7% by the modified GMM training.

  • PDF

Performance Improvement of Robust Speaker Verification According to Various Standard Deviations of a Reference Distribution in Histogram Transformation (히스토그램 변환에서 기준분포의 표준편차 변경에 따른 강인한 화자인증 성능 개선)

  • Kwon, Chul-Hong
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.127-134
    • /
    • 2010
  • Additive noise and channel mismatch strongly degrade the performance of speaker verification systems, as they distort the features of speech. In this paper a histogram transformation technique is presented to improve the robustness of text-independent speaker verification systems. The technique transforms the features extracted from speech such that their histogram is conformed to a reference distribution. The effect of different standard deviations for the reference distribution is investigated. Experimental results indicate that, in channel mismatched environments, the proposed technique offers significant improvements over existing techniques. We also verify performance improvement of the proposed method using statistics.

  • PDF

Robust Speaker Identification Using Linear Transformation Optimized for Diagonal Covariance GMM (대각공분산 GMM에 최적인 선형변환을 이용한 강인한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI
    • /
    • no.65
    • /
    • pp.67-80
    • /
    • 2008
  • We have been building a text-independent speaker recognition system that is robust to unknown channel and noise environments. In this paper, we propose a linear transformation to obtain robust features. The transformation is optimized to maximize the distances between the Gaussian mixtures. We use rotation of the axes, to cope with the problem of scaling the transformation matrix. The proposed transformation is similar to PCA or LDA, but can achieve better result in some special cases where PCA and LDA can not work properly. We use YOHO database to evaluate the proposed method and compare the result with PCA and LDA. The results show that the proposed method outperforms all the baseline, PCA and LDA.

  • PDF

Improvements on Phrase Breaks Prediction Using CRF (Conditional Random Fields) (CRF를 이용한 운율경계추성 성능개선)

  • Kim Seung-Won;Lee Geun-Bae;Kim Byeong-Chang
    • MALSORI
    • /
    • no.57
    • /
    • pp.139-152
    • /
    • 2006
  • In this paper, we present a phrase break prediction method using CRF(Conditional Random Fields), which has good performance at classification problems. The phrase break prediction problem was mapped into a classification problem in our research. We trained the CRF using the various linguistic features which was extracted from POS(Part Of Speech) tag, lexicon, length of word, and location of word in the sentences. Combined linguistic features were used in the experiments, and we could collect some linguistic features which generate good performance in the phrase break prediction. From the results of experiments, we can see that the proposed method shows improved performance on previous methods. Additionally, because the linguistic features are independent of each other in our research, the proposed method has higher flexibility than other methods.

  • PDF

The Scope and Relationship of Project Management and Systems Engineering Management (프로젝트 관리(PM)와 시스템엔지니어링 관리(SEM)의 범위 및 관계)

  • Lee, Joongyoon
    • Journal of the Korean Society of Systems Engineering
    • /
    • v.10 no.2
    • /
    • pp.1-13
    • /
    • 2014
  • Put Abstract text here. Recently, the Korea industry need systems engineering technology more than yesterday. The systems engineering management function is needed to use the systems engineering engine process effectively. The systems engineering management function is overlapped a lot with project management function and has vague interface. The project management and systems engineering are two independent disciplines. So there are few references which address the scope and relationships clearly between systems engineering management and project management. This paper suggest the scope and relationships between systems engineering management and project management and revise the PMP and SEMP structure of Blanchard and Fabrycky. Followings are the key concept of this paper presenting. The project management(PM) composed of technical management and non-technical management. The technical management scope is same with systems engineering management. The non-technical management scope is composed of parts of enterprise support processes.

An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain

  • Park, Hyeoun-Ae
    • Journal of Korean Academy of Nursing
    • /
    • v.43 no.2
    • /
    • pp.154-164
    • /
    • 2013
  • Purpose: The purpose of this article is twofold: 1) introducing logistic regression (LR), a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, and 2) examining use and reporting of LR in the nursing literature. Methods: Text books on LR and research articles employing LR as main statistical analysis were reviewed. Twenty-three articles published between 2010 and 2011 in the Journal of Korean Academy of Nursing were analyzed for proper use and reporting of LR models. Results: Logistic regression from basic concepts such as odds, odds ratio, logit transformation and logistic curve, assumption, fitting, reporting and interpreting to cautions were presented. Substantial shortcomings were found in both use of LR and reporting of results. For many studies, sample size was not sufficiently large to call into question the accuracy of the regression model. Additionally, only one study reported validation analysis. Conclusion: Nursing researchers need to pay greater attention to guidelines concerning the use and reporting of LR models.

Parameters Comparison in the speaker Identification under the Noisy Environments (화자식별을 위한 파라미터의 잡음환경에서의 성능비교)

  • Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.7 no.3
    • /
    • pp.185-195
    • /
    • 2000
  • This paper seeks to compare the feature parameters used in speaker identification systems under noisy environments. The feature parameters compared are LP cepstrum (LPCC), Cepstral mean subtraction(CMS), Pole-filtered CMS(PFCMS), Adaptive component weighted cepstrum(ACW) and Postfilter cepstrum(PF). The GMM-based text independent speaker identification system is designed for this target. Some series of experiments show that the LPCC parameter is adequate for modelling the speaker in the matched environments between train and test stages. But in the mismatched training and testing conditions, modified parameters are preferable the LPCC. Especially CMS and PFCMS parameters are more effective for the microphone mismatching conditions while the ACW and PF parameters are good for more noisy mismatches.

  • PDF

Computerized Sound Dictionary of Korean and English

  • Kim, Jong-Mi
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.33-52
    • /
    • 2001
  • A bilingual sound dictionary in Korean and English has been created for a broad range of sound reference to cross-linguistic, dialectal, native language (L1)-transferred biological and allophonic variations. The paper demonstrates that the pronunciation dictionary of the lexicon is inadequate for sound reference due to the preponderance of unmarked sounds. The audio registry consists of the three-way comparison of 1) English speech from native English speakers, 2) Korean speech from Korean speakers, and 3) English speech from Korean speakers. Several sub-dictionaries have been created as the foundation research for independent development. They are 1) a pronunciation dictionary of the Korean lexicon in a keyboard-compatible phonetic transcription, 2) a sound dictionary of L1-interfered language, and 3) an audible dictionary of Korean sounds. The dictionary was designed to facilitate the exchange of the speech signal and its corresponding text data on various media particularly on CD-ROM. The methodology and findings of the construction are discussed.

  • PDF

Text-independent Speaker Identification Using Soft Bag-of-Words Feature Representation

  • Jiang, Shuangshuang;Frigui, Hichem;Calhoun, Aaron W.
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.4
    • /
    • pp.240-248
    • /
    • 2014
  • We present a robust speaker identification algorithm that uses novel features based on soft bag-of-word representation and a simple Naive Bayes classifier. The bag-of-words (BoW) based histogram feature descriptor is typically constructed by summarizing and identifying representative prototypes from low-level spectral features extracted from training data. In this paper, we define a generalization of the standard BoW. In particular, we define three types of BoW that are based on crisp voting, fuzzy memberships, and possibilistic memberships. We analyze our mapping with three common classifiers: Naive Bayes classifier (NB); K-nearest neighbor classifier (KNN); and support vector machines (SVM). The proposed algorithms are evaluated using large datasets that simulate medical crises. We show that the proposed soft bag-of-words feature representation approach achieves a significant improvement when compared to the state-of-art methods.

Performance Enhancement of Speaker Identification System Based on GMM Using the Modified EM Algorithm (수정된 EM알고리즘을 이용한 GMM 화자식별 시스템의 성능향상)

  • Kim, Seong-Jong;Chung, Ik-Joo
    • Speech Sciences
    • /
    • v.12 no.4
    • /
    • pp.31-42
    • /
    • 2005
  • Recently, Gaussian Mixture Model (GMM), a special form of CHMM, has been applied to speaker identification and it has proved that performance of GMM is better than CHMM. Therefore, in this paper the speaker models based on GMM and a new GMM using the modified EM algorithm are introduced and evaluated for text-independent speaker identification. Various experiments were performed to evaluate identification performance of two algorithms. As a result of the experiments, the GMM speaker model attained 94.6% identification accuracy using 40 seconds of training data and 32 mixtures and 97.8% accuracy using 80 seconds of training data and 64 mixtures. On the other hand, the new GMM speaker model achieved 95.0% identification accuracy using 40 seconds of training data and 32 mixtures and 98.2% accuracy using 80 seconds of training data and 64 mixtures. It shows that the new GMM speaker identification performance is better than the GMM speaker identification performance.

  • PDF