• Title/Abstract/Keywords: gaussian mixture model


Implementation of the Voice Conversion in the Text-to-speech System

  • 황철규;김형순
    • The Acoustical Society of Korea: Conference Proceedings
    • /
    • Proceedings of the Acoustical Society of Korea Conference 1999, Vol. 18, No. 2
    • /
    • pp.33-36
    • /
    • 1999
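  • To overcome the monotony of conventional text-to-speech (TTS) synthesis, which produces speech only in the voice of a predetermined speaker, this paper implements a voice conversion function that can express the timbre of an arbitrary speaker. The implemented method models the speaker's acoustic space with a Gaussian Mixture Model (GMM), enabling voice conversion based on a continuous probability distribution. Using the joint density function of the feature vectors of the source and target speakers, a conversion function is derived that minimizes the squared error between the target speaker's acoustic-space feature vectors and the converted vectors, and the spectral envelope is converted by vector mapping with this function. For prosody conversion, the speech signal is modeled with a sinusoidal model, and the analyzed prosodic information (pitch and duration) is converted by taking the average values into account. For performance evaluation, a VQ mapping method was also implemented, and the two methods were compared using their normalized cepstral distances. For synthesis, an ABS-OLA-based sinusoidal modeling scheme was adopted, which produced natural-sounding synthesized speech.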

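The conversion function described in this abstract can be illustrated with a minimal sketch. It assumes scalar features and a joint-density GMM whose parameters are already trained; in that case the minimum squared-error conversion is the conditional mean E[y | x], a posterior-weighted sum of per-component linear regressions. The function names, dictionary keys, and parameter values below are hypothetical and are not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convert(x, components):
    """Conditional mean E[y | x] under a joint-density GMM over (x, y).

    Each component is a dict with hypothetical keys:
    weight, mu_x, mu_y, var_x (source variance), cov_yx (cross-covariance).
    """
    # Posterior probability of each component given the source feature x.
    likes = [c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likes)
    posts = [like / total for like in likes]
    # Posterior-weighted sum of per-component linear regressions.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
        for p, c in zip(posts, components)
    )

# Illustrative parameters only (not from the paper).
components = [
    {"weight": 0.5, "mu_x": 0.0, "mu_y": 1.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 4.0, "mu_y": 6.0, "var_x": 1.0, "cov_yx": 0.8},
]
converted = convert(0.0, components)
```

With a single component the mapping reduces to ordinary linear regression from source to target; additional components make it piecewise-smooth, which is the practical difference from a hard VQ codebook mapping.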

The Comparison of Features for Speech/Music Discrimination

  • 이경록;서봉수;김진영
    • The Acoustical Society of Korea: Conference Proceedings
    • /
    • Proceedings of the Acoustical Society of Korea Summer Conference 2000, Vol. 19, No. 1
    • /
    • pp.157-160
    • /
    • 2000
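  • This paper reports speech/music discrimination experiments, a preprocessing step for audio indexing, which is the part of multimedia indexing that extracts desired information from multimedia content. In audio indexing, the speech/music discriminator separates the information-bearing speech portions from the original audio signal. The experiments use the mel cepstrum, normalized log energy, and zero-crossings, which are widely used in speech/music discrimination, as feature parameters [1, 2, 3]. The feature space is modeled with a GMM (Gaussian Mixture Model), and audio signals are classified under a three-class scheme (speech, music, speech+music) and a two-class scheme (speech, music). In both the three-class and the two-class setting, the mel cepstrum gave the best results.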

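GMM-based classification of this kind is commonly done by scoring a segment under one GMM per class and choosing the class with the highest total log-likelihood. The sketch below assumes a single scalar feature per frame and pre-trained models; the class models, feature values, and function names are hypothetical and only show the decision rule, not the paper's actual feature set or parameters.

```python
import math

def log_gmm_density(x, gmm):
    """Log-density of scalar x under a 1-D GMM given as (weight, mean, var) triples."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var
        for w, mean, var in gmm
    ]
    # log-sum-exp keeps the computation numerically stable.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, class_gmms):
    """Return the class whose GMM gives the highest total log-likelihood."""
    scores = {
        label: sum(log_gmm_density(x, gmm) for x in frames)
        for label, gmm in class_gmms.items()
    }
    return max(scores, key=scores.get)

# Hypothetical models over one scalar feature (for example, a zero-crossing rate).
class_gmms = {
    "speech": [(0.6, 0.10, 0.01), (0.4, 0.30, 0.02)],
    "music": [(0.5, 0.60, 0.02), (0.5, 0.80, 0.01)],
}
label = classify([0.12, 0.25, 0.09], class_gmms)
```

A three-class setup (speech, music, speech+music) would only add a third entry to `class_gmms`; the decision rule stays the same.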

Statistical Extraction of Speech Features Using Independent Component Analysis and Its Application to Speaker Identification

  • Jang, Gil-Jin;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 21, No. 4E
    • /
    • pp.156-163
    • /
    • 2002
  • We apply independent component analysis (ICA) to extract an optimal basis for the problem of finding efficient features for representing the speech signals of a given speaker. The speech segments are assumed to be generated by a linear combination of the basis functions; thus the distribution of a speaker's speech segments is modeled by adapting the basis functions so that each source component is statistically independent. The learned basis functions are oriented and localized in both space and frequency, bearing a resemblance to Gabor wavelets. These features are speaker-dependent characteristics, and to assess their efficiency we performed speaker identification experiments and compared our results with the conventional Fourier basis. Our results show that the proposed method is more efficient than the conventional Fourier-based features in that it obtains a higher speaker identification rate.

People Detection Algorithm in the Beach

  • 최유정;김윤
    • Journal of Korea Multimedia Society
    • /
    • Vol. 21, No. 5
    • /
    • pp.558-570
    • /
    • 2018
  • Object detection is a critical function for any system that uses computer vision and is widely used in fields such as video surveillance and self-driving cars. However, conventional methods cannot detect objects clearly on a beach because the background changes dynamically. In this paper, we propose a new technique to detect humans correctly in dynamic videos such as shore scenes. A new background modeling method that combines a spatial GMM (Gaussian Mixture Model) and a temporal GMM is proposed to produce a more accurate background image. The proposed method also improves the accuracy of people detection by using an SVM (Support Vector Machine) to classify people among the detected objects and a KCF (Kernelized Correlation Filter) tracker to track people continuously in complicated environments. The experimental results show that our method works well for detecting and tracking objects in videos containing dynamic factors and situations.

Livestock Theft Detection System Using Skeleton Feature and Color Similarity

  • 김준형;주영훈
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • Vol. 67, No. 4
    • /
    • pp.586-594
    • /
    • 2018
  • In this paper, we propose a livestock theft detection system based on moving object classification and tracking. First, we extract moving objects using a GMM (Gaussian Mixture Model) and an RGB background modeling method. Second, the system applies morphological operations to remove shadows and noise and recognizes moving objects through labeling. Third, the recognized moving objects are classified as human or livestock using skeletal features and a color similarity judgment. Fourth, for the classified moving objects, CAMShift (Continuously Adaptive Mean Shift) and a Kalman filter are used for tracking and overlap judgment, and the risk is assessed to generate a notification. Finally, several experiments demonstrate the feasibility and applicability of the proposed method.

A Cluster Modeling Using New Convergence Properties

  • 김승석;백찬수;김성수;유정웅
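    • The Korean Institute of Electrical Engineers: Conference Proceedings
    • /
    • Proceedings of the Korean Institute of Electrical Engineers Conference 2004, Information and Control Section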
    • /
    • pp.382-384
    • /
    • 2004
  • In this paper, we propose a clustering algorithm that uses new convergence properties. To detect and optimize clusters, we use a similarity measure based on cumulative probability and infer its parameters by MLE. An advantage of using the cumulative probability is that parameter inference becomes robust to noise and irrelevant data. We also adopt a similarity threshold so that the number of clusters converges, which enables fast convergence and removes other influences on the learning algorithm. In simulation, we show the effectiveness of our algorithm for convergence and cluster optimization on a given data set.


A Phonetic Study of 'Sasang Constitution'

  • 문승재;탁지현;황혜정
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology)
    • /
    • Vol. 55
    • /
    • pp.1-14
    • /
    • 2005
  • Sasang Constitution, one branch of oriental medicine, claims that people can be classified into four different 'constitutions': Taeyang, Taeum, Soyang, and Soeum. This study investigates whether the constitutions can be classified accurately from people's voices alone by analyzing data from 46 different voices whose constitutions had already been determined. Seven source-related parameters and four filter-related parameters were phonetically analyzed, and a GMM (Gaussian mixture model) was applied to the data. Both the phonetic analyses and the GMM showed that all the parameters except one failed to distinguish the constitutions successfully. Even the single exception, B2 (the bandwidth of the second formant), did not provide sufficient grounds to serve as the source of distinction. This result suggests one of two conclusions: either the Sasang Constitutions cannot be substantiated with reliable accuracy from the phonetic characteristics of people's voices, or other parameters that have not been conventionally proposed still need to be found.


Performance Improvement of Speech Recognition based on Stereo Data with Dimensionally Weighted Bias Compensation

  • 김종현;송화전;김형순
    • The Acoustical Society of Korea: Conference Proceedings
    • /
    • Proceedings of the Acoustical Society of Korea Autumn Conference 2004, Vol. 23, No. 2
    • /
    • pp.139-142
    • /
    • 2004
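  • Environmental mismatch between training and recognition, caused by ambient noise and channel characteristics, sharply degrades speech recognition performance. Various preprocessing methods have been proposed to overcome this mismatch, and recently the SPLICE method, which derives compensation vectors using stereo data and a Gaussian Mixture Model (GMM) of noisy speech, has shown good performance. However, the estimated compensation vectors, which compensate the feature vector dimension by dimension, tend to be underestimated, and the degree of underestimation was observed to differ from dimension to dimension. Building on the SPLICE method, this paper examines the relationship between the estimated and the actual compensation vectors and proposes a dimensionally weighted compensation method that applies a different weight to each dimension. On the Aurora2 clean-condition task, the proposed method achieved a relative recognition improvement of $68\%$ over the baseline.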

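A rough sketch of SPLICE-style compensation with a per-dimension weight follows. In SPLICE the clean feature is estimated as the noisy feature plus a posterior-weighted sum of per-component bias vectors learned from stereo data. How exactly the paper applies its dimension weights is not stated in the abstract, so the simple per-dimension scaling below, like all names and values, is an assumption made for illustration.

```python
import math

def log_diag_gaussian(y, mean, var):
    """Log-density of vector y under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (yi - m) ** 2 / v
        for yi, m, v in zip(y, mean, var)
    )

def compensate(y, gmm, biases, dim_weights):
    """SPLICE-style estimate of the clean feature from the noisy feature y.

    gmm: list of (weight, mean, var) for the noisy-speech GMM.
    biases: one bias vector per mixture component.
    dim_weights: hypothetical per-dimension scale factors (assumed form).
    """
    # Component posteriors p(k | y), computed in the log domain for stability.
    logs = [math.log(w) + log_diag_gaussian(y, m, v) for w, m, v in gmm]
    top = max(logs)
    unnorm = [math.exp(value - top) for value in logs]
    total = sum(unnorm)
    posts = [u / total for u in unnorm]
    # Posterior-weighted bias, scaled dimension by dimension.
    return [
        yd + wd * sum(p * r[d] for p, r in zip(posts, biases))
        for d, (yd, wd) in enumerate(zip(y, dim_weights))
    ]

# Illustrative values only.
gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])]
biases = [[1.0, -1.0], [2.0, 0.5]]
clean_estimate = compensate([0.0, 0.0], gmm, biases, [1.5, 1.2])
```

Setting every entry of `dim_weights` to 1.0 recovers plain SPLICE; weights above 1.0 counteract an underestimated bias in that dimension.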

Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training

  • 김태진;최재길;권철홍
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology)
    • /
    • No. 57
    • /
    • pp.165-174
    • /
    • 2006
  • In this paper we use a training algorithm, MCE (Minimum Classification Error), to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small-set speaker identification task show that the discriminative training method using MCE can reduce identification errors by up to 54% over a baseline system trained using Bayesian adaptation to derive GMM (Gaussian Mixture Model) speaker models from a UBM (Universal Background Model).

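The baseline mentioned in this abstract, Bayesian (MAP) adaptation of a UBM into a speaker model, can be sketched for the common means-only case. The version below assumes a 1-D UBM and a relevance factor of 16; the function names and data are hypothetical, and the MCE training proposed in the paper is not shown.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def map_adapt_means(data, ubm, relevance=16.0):
    """Means-only MAP adaptation of a 1-D UBM to enrollment data.

    ubm: list of (weight, mean, var) triples.
    relevance: assumed relevance factor controlling trust in the UBM prior.
    """
    counts = [0.0] * len(ubm)   # soft counts per component
    sums = [0.0] * len(ubm)     # posterior-weighted sums of the data
    for x in data:
        likes = [w * gaussian_pdf(x, m, v) for w, m, v in ubm]
        total = sum(likes)
        for i, like in enumerate(likes):
            post = like / total
            counts[i] += post
            sums[i] += post * x
    adapted = []
    for i, (_, mean, _) in enumerate(ubm):
        # alpha -> 1 with plenty of data, alpha -> 0 with none.
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = sums[i] / counts[i] if counts[i] > 0 else mean
        adapted.append(alpha * data_mean + (1.0 - alpha) * mean)
    return adapted

# Illustrative UBM and enrollment data.
ubm = [(0.5, -1.0, 1.0), (0.5, 3.0, 1.0)]
speaker_means = map_adapt_means([2.5, 2.8, 3.4, 3.1], ubm)
```

Components that see little enrollment data stay close to the UBM means, which is why this style of adaptation behaves well with limited speech per speaker.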

Speaker Identification Using Greedy Kernel PCA

  • 김민석;양일호;유하진
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology)
    • /
    • No. 66
    • /
    • pp.105-116
    • /
    • 2008
  • In this research, we propose a speaker identification system using a kernel method, which is expected to model the non-linearity of speech features well. We have been using principal component analysis (PCA) successfully and have extended it to kernel PCA, which is used for many pattern recognition tasks such as face recognition. However, we cannot use kernel PCA for speaker identification directly because the storage required for the kernel matrix grows quadratically, and the computational cost grows linearly (computing the eigenvectors of an $l{\times}l$ matrix) with the number of training vectors $l$. Therefore, we use greedy kernel PCA, which can approximate kernel PCA with a small representation error. In the experiments, we compare the accuracy of greedy kernel PCA with baseline Gaussian mixture models using MFCCs and PCA. As the results with limited enrollment data show, greedy kernel PCA outperforms the conventional methods.
