• Title/Summary/Keyword: gaussian mixture model

Search Result 417

HMM-based missing feature reconstruction for robust speech recognition in additive noise environments (가산잡음환경에서 강인음성인식을 위한 은닉 마르코프 모델 기반 손실 특징 복원)

  • Cho, Ji-Won;Park, Hyung-Min
    • Phonetics and Speech Sciences
    • /
    • v.6 no.4
    • /
    • pp.127-132
    • /
    • 2014
  • This paper describes a robust speech recognition technique that reconstructs spectral components mismatched with the training environment. The cluster-based reconstruction method can compensate for unreliable components using reliable components in the same spectral vector by assuming an independent, identically distributed Gaussian-mixture process of training spectral vectors. In contrast, the presented method exploits the temporal dependency of speech to reconstruct the components by introducing a hidden-Markov-model prior, which incorporates internal state transitions plausible for an observed spectral vector sequence. The experimental results indicate that the described method provides temporally consistent reconstruction and further improves average recognition performance compared with the conventional method.

GMM-Based Maghreb Dialect Identification System

  • Nour-Eddine, Lachachi;Abdelkader, Adla
    • Journal of Information Processing Systems
    • /
    • v.11 no.1
    • /
    • pp.22-38
    • /
    • 2015
  • While Modern Standard Arabic is the formal spoken and written language of the Arab world, dialects are the major communication mode for everyday life. Therefore, identifying a speaker's dialect is critical in the Arabic-speaking world for speech processing tasks, such as automatic speech recognition or identification. In this paper, we examine two approaches that reduce the Universal Background Model (UBM) in an automatic dialect identification system across the following five Arabic Maghreb dialects: Moroccan, Tunisian, and three dialects of the western (Oranian), central (Algiersian), and eastern (Constantinian) regions of Algeria. We applied our approaches to a Maghreb dialect detection domain that contains a collection of 10-second utterances, and we compared the precision obtained on the dialect samples by a baseline GMM-UBM system with that of our own improved GMM-UBM system, which uses a Reduced UBM algorithm. Our experiments show that our approaches significantly improve identification performance over purely acoustic features, with an identification rate of 80.49%.

Advance Neuro-Fuzzy Modeling Using a New Clustering Algorithm (새로운 클러스터링 알고리듬을 적용한 향상된 뉴로-퍼지 모델링)

  • 김승석;김성수;유정웅
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.7
    • /
    • pp.536-543
    • /
    • 2004
  • In this paper, we propose a new method of modeling a neuro-fuzzy system using a hybrid clustering algorithm. The initial parameters and the number of clusters of the proposed system are chosen optimally and simultaneously with respect to the regression process, which is a unique characteristic of the proposed system. The proposed algorithm improves the overall performance of the neuro-fuzzy system by adaptively choosing a proper number of clusters according to the characteristics of the given data. Clustering is performed by deciding on the number of classes, which yields the convergence property of the system. In experiments, the superiority of the proposed neuro-fuzzy system is demonstrated, especially in parameter optimization and learning speed.

Korean Speech Segmentation and Recognition by Frame Classification via GMM (GMM을 이용한 프레임 단위 분류에 의한 우리말 음성의 분할과 인식)

  • 권호민;한학용;고시영;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2003.06a
    • /
    • pp.18-21
    • /
    • 2003
  • In general, dividing continuous speech into short intervals of uniform phoneme quality has been considered a difficult problem. In this paper, we use a Gaussian mixture model (GMM), a probability density model, to divide speech into phonemes: initial, medial, and final sounds. Based on these segments, we perform continuous speech recognition. The decision boundary between phonemes is determined by an algorithm that selects the most frequent class within a short interval. Recognition is performed with a continuous hidden Markov model (CHMM), and we compare the result with recognition based on phonemes segmented by visual inspection. The experimental results confirm that the presented method is relatively superior for automatic segmentation of Korean speech.
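
The boundary rule in this abstract (most frequent class within a short interval) can be illustrated with a minimal sketch. It assumes per-frame labels already produced by some frame classifier; the labels `I`, `M`, `F` (initial, medial, final), the window length of 5 frames, and the function names are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import Counter

def smooth_labels(frame_labels, window=5):
    """Replace each frame label by the most frequent label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(frame_labels)):
        lo = max(0, i - half)
        hi = min(len(frame_labels), i + half + 1)
        # most_common(1) returns [(label, count)] for the majority label
        smoothed.append(Counter(frame_labels[lo:hi]).most_common(1)[0][0])
    return smoothed

def boundaries(labels):
    """Frame indices where the label changes, i.e. candidate segment boundaries."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

# Hypothetical noisy frame decisions for one syllable.
frames = list("IIIMIIMMMMMMMFFFMFFF")
smoothed = smooth_labels(frames)
print("".join(smoothed), boundaries(smoothed))
```

On the raw labels, isolated misclassified frames would each create spurious boundaries; after the majority vote only the two transitions between the three sounds remain.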


Combination of Classifiers Decisions for Multilingual Speaker Identification

  • Nagaraja, B.G.;Jayanna, H.S.
    • Journal of Information Processing Systems
    • /
    • v.13 no.4
    • /
    • pp.928-940
    • /
    • 2017
  • State-of-the-art speaker recognition systems may work better for the English language. However, if the same system is used to recognize speakers of other languages, it may perform poorly. In this work, the decisions of a Gaussian mixture model-universal background model (GMM-UBM) and a learning vector quantization (LVQ) classifier are combined to improve the recognition performance of a multilingual speaker identification system. The classifiers differ in their modeling techniques: the former is based on a probabilistic approach and the latter on the fine-tuning of neurons. Since the approaches are different, each modeling technique identifies a different set of speakers for the same database. Therefore, the decisions of the classifiers may be combined to improve performance. In this study, multitaper mel-frequency cepstral coefficients (MFCCs) are used as the features, and monolingual and cross-lingual speaker identification studies are conducted using NIST-2003 and our own database. The experimental results show that the combined system improves performance by nearly 10% compared with the individual classifiers.

Speaker Identification in Small Training Data Environment using MLLR Adaptation Method (MLLR 화자적응 기법을 이용한 적은 학습자료 환경의 화자식별)

  • Kim, Se-hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.159-162
    • /
    • 2005
  • Speaker identification is the process of automatically identifying who is speaking on the basis of information obtained from speech waves. In the training phase, each speaker model is trained using that speaker's speech data. GMMs (Gaussian mixture models), which have been successfully applied to speaker modeling in text-independent speaker identification, are not efficient when training data are insufficient. This paper proposes a speaker modeling method based on MLLR (maximum likelihood linear regression), which is used for speaker adaptation in speech recognition. We build an SD-like model using MLLR adaptation instead of a speaker-dependent (SD) model. The proposed system outperforms GMMs in a small-training-data environment.


Real-time plasma condition estimate model based on Optical Emission Spectroscopy (OES) data for semiconductor processing (반도체공정을 위한 OES 데이터 기반 실시간 플라즈마 상태예측 모형)

  • Hee Jin Jung;Jin Seung Ryu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.341-344
    • /
    • 2023
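  • In dry semiconductor processes, keeping the low-temperature plasma in a constant state is very important for improving process efficiency. However, because the low-temperature plasma reactor must be kept under vacuum, predicting the state of the plasma is very difficult. In this study, we developed a model that predicts the plasma state using data collected from an OES sensor. We generated 15 different plasmas in a plasma reactor using nitrogen gas, collected OES data, and developed a Gaussian mixture model (GMM) that can classify the states of the 15 plasmas. The spectral intensities measured at a total of 7,296 wavelengths were reduced to two principal components through principal component analysis (PCA), and the GMM was built on these components. The accuracy of the model was about 81.72%, indicating excellent interpretive power for the plasma OES data.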

IR Image Segmentation using GrabCut (GrabCut을 이용한 IR 영상 분할)

  • Lee, Hee-Yul;Lee, Eun-Young;Gu, Eun-Hye;Choi, Il;Choi, Byung-Jae;Ryu, Gang-Soo;Park, Kil-Houm
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.21 no.2
    • /
    • pp.260-267
    • /
    • 2011
  • This paper proposes a method for segmenting objects from the background in infrared (IR) images based on the GrabCut algorithm. The GrabCut algorithm needs a window encompassing the known object of interest, and this window is normally specified by the user. However, to apply the algorithm to object recognition problems in image sequences, the location of the window should be determined automatically. For this, we adopt Otsu's algorithm to coarsely segment the unknown objects of interest in an image. After applying Otsu's algorithm, the window is located automatically by blob analysis. The GrabCut algorithm needs the probability distributions of both the candidate object region and the background region closely surrounding the object in order to estimate the Gaussian mixture models (GMMs) of the object and the background. The probability distribution of the background is computed from the background window, which has the same number of pixels as the candidate object region. Experiments on various IR images show that the proposed method is suitable for segmenting objects of interest in IR image sequences. To evaluate the performance of the proposed segmentation method, we compare it with other segmentation methods.
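
The automatic window placement described in this abstract can be sketched roughly as follows, assuming 8-bit intensities and a single bright object. `otsu_threshold` implements the standard between-class-variance criterion; `bounding_window` is a hypothetical stand-in for the paper's blob analysis that simply boxes all above-threshold pixels, which is a simplification. The GrabCut step itself is not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance (class 0: values <= t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # number of pixels in class 0
        sum0 += t * hist[t]      # intensity sum of class 0
        w1 = total - w0          # number of pixels in class 1
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance; the constant factor is irrelevant.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

def bounding_window(image, threshold):
    """Smallest (top, left, bottom, right) box enclosing all pixels above the threshold."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v > threshold]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

# Toy 8x8 "IR image": dark background with one warm 3x2 object.
image = [[20] * 8 for _ in range(8)]
for r in range(2, 5):
    for c in range(3, 5):
        image[r][c] = 200

threshold = otsu_threshold([v for row in image for v in row])
window = bounding_window(image, threshold)
print(threshold, window)
```

A real pipeline would label connected components and pick one blob per object before handing each window to GrabCut; the single box here only works because the toy image contains one object.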

Segmentation of Color Image using the Deterministic Annealing EM Algorithm (결정적 어닐링 EM 알고리즘을 이용한 칼라 영상의 분할)

  • Cho, Wan-Hyun;Park, Jong-Hyun;Park, Soon-Young
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.324-333
    • /
    • 2001
  • In this paper, we present a novel color image segmentation algorithm based on a Gaussian mixture model (GMM). We introduce a deterministic annealing expectation-maximization (DAEM) algorithm, developed using the principle of maximum entropy, to overcome the local maxima problem associated with the standard EM algorithm. In our approach, the GMM is used to represent multi-colored objects statistically, and its parameters are estimated by the DAEM algorithm. We also develop a method for automatically determining the number of components in the Gaussian mixture model. The image is segmented according to the maximum posterior probability, which is calculated using the GMM. The experimental results show that the proposed DAEM estimates the parameters more accurately than the standard EM and that the method for determining the number of mixture components is very efficient. When tested on two natural images, the proposed algorithm performs much better than the traditional algorithm in segmenting the image fields.
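
The abstract does not give the DAEM update equations. As a hedged illustration, the sketch below uses the commonly cited form of deterministic annealing EM, in which each weighted component density is raised to a power `beta` (an inverse temperature) before normalization, and `beta` is raised gradually toward 1. The one-dimensional setting, parameter values, and function names are assumptions made for this example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def tempered_responsibilities(x, weights, means, variances, beta):
    """E-step posteriors with each weighted density raised to the power beta.

    beta = 1 gives the standard EM E-step; small beta flattens the posteriors.
    """
    terms = [(w * gaussian_pdf(x, m, v)) ** beta
             for w, m, v in zip(weights, means, variances)]
    total = sum(terms)
    return [t / total for t in terms]

# Hypothetical two-component mixture and one observation.
weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
soft = tempered_responsibilities(1.0, weights, means, variances, beta=0.1)
hard = tempered_responsibilities(1.0, weights, means, variances, beta=1.0)
print(soft, hard)
```

At low `beta` the observation is shared almost evenly between components, which smooths the objective early in the schedule; at `beta = 1` the assignment is the ordinary posterior. The M-step and the annealing schedule are omitted here.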


Infrared Image Segmentation by Extracting and Merging Region of Interest (관심영역 추출과 통합에 의한 적외선 영상 분할)

  • Yeom, Seokwon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.26 no.6
    • /
    • pp.493-497
    • /
    • 2016
  • Infrared (IR) imaging can detect targets that are not visible at night, and it has therefore been widely used in security and defense systems. However, the quality of IR images is often degraded by low resolution and noise corruption. This paper addresses target segmentation in IR images. Multiple regions of interest (ROIs) are extracted by multi-level segmentation, and targets are segmented from the individual ROIs. Each level of the multi-level segmentation is composed of a k-means clustering algorithm, an expectation-maximization (EM) algorithm, and a decision process. The k-means clustering algorithm initializes the parameters of the Gaussian mixture model (GMM), and the EM algorithm iteratively estimates those parameters. Each pixel is assigned to one of the clusters during the decision process. This paper proposes the selection and merging of the extracted ROIs. ROIs are selectively merged in order to include overlapping ROI windows. In the experiments, the proposed method is tested on an IR image capturing two pedestrians at night. The comparison with conventional methods shows that the proposed method outperforms the others.
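
One level of the pipeline in this abstract (k-means initialization, EM refinement, per-pixel decision) can be sketched on scalar intensities as below. This is an illustrative simplification, not the paper's implementation: only the means are taken from k-means (weights start equal and variances start at the global variance), the data are synthetic, and all function names are hypothetical.

```python
import math
import random

def gauss(v, m, s):
    """Density of a 1-D Gaussian with mean m and variance s."""
    return math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)

def kmeans_1d(values, k, iters=20):
    """Plain k-means on scalars; centers start at evenly spaced order statistics."""
    ordered = sorted(values)
    centers = [ordered[int((j + 0.5) * len(ordered) / k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster received no points.
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers

def em_gmm_1d(values, init_means, iters=30):
    """EM for a 1-D Gaussian mixture, started from the given means."""
    k, n = len(init_means), len(values)
    means = list(init_means)
    weights = [1.0 / k] * k
    overall = sum(values) / n
    variances = [sum((v - overall) ** 2 for v in values) / n + 1e-6] * k
    for _ in range(iters):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            dens = [w * gauss(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = sum(r[j] * (v - means[j]) ** 2
                               for r, v in zip(resp, values)) / nj + 1e-6
    return weights, means, variances

def decide(values, weights, means, variances):
    """Decision step: index of the most probable component for each value."""
    return [max(range(len(means)),
                key=lambda j: weights[j] * gauss(v, means[j], variances[j]))
            for v in values]

# Synthetic intensities: 200 cool background pixels and 30 warm target pixels.
random.seed(0)
pixels = ([random.gauss(50, 5) for _ in range(200)]
          + [random.gauss(200, 10) for _ in range(30)])
weights, means, variances = em_gmm_1d(pixels, kmeans_1d(pixels, 2))
labels = decide(pixels, weights, means, variances)
print(sorted(means), labels.count(labels[-1]))
```

Starting EM from the k-means centers places the components near the two intensity modes, so the refinement mostly adjusts weights and variances; the final arg-max corresponds to the decision process that assigns each pixel to a cluster.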