• Title/Summary/Keyword: 화자확인 시스템 (speaker verification system)


Implementation of Speech Enhancement System using Matched Filter Array (Matched filter Array를 이용한 음질 향상 시스템 구현)

  • 오승수;김기만
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 1999.11a
    • /
    • pp.173-176
    • /
    • 1999
  • Recently, speaker localization techniques have been attracting attention in teleconference systems, where the speaker's location is estimated with a microphone array and a camera is automatically directed toward it. This paper describes how speech quality can be enhanced through the microphone array and how the computational load can be decreased by using an IIR filter as the inverse filter, and confirms a hardware implementation using a DSP processor.
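The abstract gives no implementation details, but the matched-filter-array idea can be illustrated as below: each microphone signal is filtered with the time-reversed (matched) channel response and the outputs are summed. The function name, and the assumption that per-microphone impulse responses are available and all channels have equal length, are mine, not the paper's.

```python
import numpy as np

def matched_filter_array(mic_signals, impulse_responses):
    """Sum the microphone channels after matched filtering, i.e. convolving
    each channel with its time-reversed impulse response (assumed known).
    All channels are assumed to have the same signal and response lengths."""
    enhanced = None
    for x, h in zip(mic_signals, impulse_responses):
        y = np.convolve(x, h[::-1], mode="full")  # matched filter per channel
        enhanced = y if enhanced is None else enhanced + y
    return enhanced / len(mic_signals)
```

The paper's use of an IIR inverse filter to cut the computational load is not reproduced in this sketch.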


Automatic User-identification verification system using speech signatures based on multi-user processing technology for secured electronic commerce systems (다중사용자 처리기술을 이용한 전자상거래용 화자확인 사용자 인증 시스템)

  • Jeong, Seok-Yeong;Yu, Wan-Sun;Kang, Sun-Mee
    • Annual Conference of KIPS
    • /
    • 2000.04a
    • /
    • pp.497-501
    • /
    • 2000
  • As the electronic commerce market has grown, Internet shopping mall operators have been working to build stronger security. In particular, with recent demonstrations of biometric recognition, there are active efforts to apply such recognition services to electronic commerce. However, because e-commerce services must handle multiple users in real time, embedding a computationally heavy recognition engine in the e-commerce server is a considerable burden. As a solution, this paper proposes configuring a speaker verification user authentication system, based on the customer's voice, as a separate multi-user processing system, and presents an implementation case. By managing heavy services such as the recognition engine separately, we show a useful solution for services that require many concurrent users. The service has been developed to work in conjunction with the e-commerce solution of (주)아이커머스 코리아, an Internet shopping mall software developer.
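As a rough sketch of the architectural point, keeping the heavy verification engine out of the commerce server and running it as a separate multi-user processing service, here is a minimal worker-pool example. All names, the placeholder scoring, and the threshold are illustrative; the paper's actual engine and its integration with the shopping mall solution are not shown.

```python
import numpy as np
from multiprocessing import Pool

DECISION_THRESHOLD = 0.5  # illustrative value, not from the paper

def score_against_enrolled_model(user_id, voice_sample):
    # Placeholder: a real engine would load the enrolled model for user_id
    # and compute a likelihood or distance for the submitted sample.
    return float(np.mean(voice_sample))

def verify_speaker(request):
    user_id, voice_sample = request
    score = score_against_enrolled_model(user_id, voice_sample)
    return user_id, score > DECISION_THRESHOLD

if __name__ == "__main__":
    # Toy requests standing in for jobs queued by the commerce front end.
    requests = [(f"user{i}", np.random.rand(16000)) for i in range(8)]
    with Pool(processes=4) as pool:
        # Verification runs in separate worker processes, not the web server.
        print(pool.map(verify_speaker, requests))
```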


Target Speaker Speech Restoration via Spectral bases Learning (주파수 특성 기저벡터 학습을 통한 특정화자 음성 복원)

  • Park, Sun-Ho;Yoo, Ji-Ho;Choi, Seung-Jin
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.3
    • /
    • pp.179-186
    • /
    • 2009
  • This paper proposes a target speech extraction method that restores the speech signal of a target speaker from a noisy convolutive mixture of speech and an interference source. We assume that the target speaker is known and that his/her utterances are available at training time. Incorporating this additional information extracted from the training utterances into the separation, we combine convolutive blind source separation (CBSS) with a non-negative decomposition technique, e.g., a probabilistic latent variable model. The non-negative decomposition is used to learn a set of bases from the spectrogram of the training utterances, where the bases represent the spectral information corresponding to the target speaker. Based on the learned spectral bases, our method provides two post-processing steps for CBSS. The channel selection step finds the desirable output channel of CBSS, i.e., the one dominated by the target speech. The reconstruction step recovers the original spectrogram of the target speech from the selected output channel so that the remaining interference source and background noise are suppressed. Experimental results show that our method substantially improves the separation results of CBSS and, as a result, successfully recovers the target speech.
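A minimal sketch of the bases-learning and channel-selection steps, substituting scikit-learn's NMF for the paper's probabilistic latent variable model; the reconstruction step and CBSS itself are omitted, and the function names and the error-based selection rule are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

def learn_spectral_bases(train_spectrogram, n_bases=32):
    """Learn nonnegative spectral bases (columns) from the target speaker's
    magnitude spectrogram of shape (freq_bins, frames)."""
    nmf = NMF(n_components=n_bases, init="nndsvda", max_iter=400)
    nmf.fit_transform(train_spectrogram.T)   # rows = frames, cols = freq bins
    return nmf.components_.T                 # (freq_bins, n_bases)

def select_target_channel(channel_spectrograms, bases):
    """Pick the CBSS output channel best explained by the learned bases,
    i.e. with the smallest reconstruction error under nonnegative activations."""
    errors = []
    for S in channel_spectrograms:           # each S: (freq_bins, frames)
        W = np.maximum(np.linalg.lstsq(bases, S, rcond=None)[0], 0.0)
        errors.append(np.linalg.norm(S - bases @ W))
    return int(np.argmin(errors))
```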

A Speaker Pruning Method for Reducing Calculation Costs of Speaker Identification System (화자식별 시스템의 계산량 감소를 위한 화자 프루닝 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.6
    • /
    • pp.457-462
    • /
    • 2003
  • In this paper, we propose a speaker pruning method for real-time processing and improved performance of a speaker identification system based on GMMs (Gaussian Mixture Models). In conventional speaker identification methods, such as ML (Maximum Likelihood), WMR (Weighting Model Rank), and MWMR (Modified WMR), frame likelihoods are calculated over all frames of the input speech against all speaker models, and the speaker with the largest accumulated likelihood is selected. In these methods, however, the calculation cost and processing time grow as the numbers of input frames and speakers increase. To solve this problem, the proposed method selects only the speaker models with higher likelihoods using only a part of the input frames, and the identified speaker is then decided by evaluating only the selected models. The method can also be applied to improve identification performance even when the number of speakers changes. In several experiments, the proposed method showed a 65% reduction in calculation cost and a 2% increase in identification rate over conventional methods. These results mean that the proposed method can be applied effectively for real-time processing and for performance improvement in speaker identification.
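A sketch of the pruning idea under the assumption that each speaker model is a fitted scikit-learn GaussianMixture; the numbers of pruning frames and surviving models are illustrative parameters, not the paper's settings.

```python
def identify_with_pruning(frames, speaker_gmms, pruning_frames=50, keep_top=5):
    """frames: (num_frames, dim) features; speaker_gmms: dict of fitted
    GaussianMixture models keyed by speaker ID."""
    prefix, rest = frames[:pruning_frames], frames[pruning_frames:]
    # Score ALL models, but only on the short prefix of frames.
    prefix_scores = {spk: gmm.score_samples(prefix).sum()
                     for spk, gmm in speaker_gmms.items()}
    # Keep only the best-scoring candidate models (pruning step).
    survivors = sorted(prefix_scores, key=prefix_scores.get, reverse=True)[:keep_top]
    # Finish scoring the remaining frames on the reduced model set only.
    totals = {}
    for spk in survivors:
        rest_score = speaker_gmms[spk].score_samples(rest).sum() if len(rest) else 0.0
        totals[spk] = prefix_scores[spk] + rest_score
    return max(totals, key=totals.get)
```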

Face Detection based Real-time Eye Gaze Correction Method Using a Depth Camera (거리 카메라를 이용한 얼굴 검출 기반 실시간 시선 보정 방법)

  • Jo, Hoon;Ra, Moon-Soo;Kim, Whoi-Yul;Kim, Deuk-Hwa
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2012.11a
    • /
    • pp.151-154
    • /
    • 2012
  • This paper proposes an eye gaze correction system between speakers that can increase the sense of realism in video communication. The proposed method obtains the speaker's face region from an image captured with a Kinect depth camera, transforms the region so that the speaker appears to gaze at the camera, and then composites it with the original image. Because the face region obtained from the Kinect depth camera contains various kinds of noise, the noise is removed with a median filter and morphological operations. To generate an image in which the speaker gazes at the camera regardless of the speaker's position, the gaze correction angle and rotation axis are obtained using the depth information provided by the Kinect. Since the gaze-corrected face region includes areas that do not exist in the original image, each pixel of the original image is organized into a triangular mesh and the corresponding areas are interpolated to produce the final gaze-corrected image. The proposed method selects and transforms only the eyes and the surrounding face region, which are essential for generating an eye contact image, so it introduces little distortion and can run in real time. In addition, using the distance between the camera and the speaker, it can generate a gaze-corrected image adaptive to the speaker's position. Experiments confirmed real-time operation: on a PC with an Intel i5 CPU, about 35 corrected frames per second were produced for 320×240 images.
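A small sketch of the two steps that are easiest to isolate, assuming OpenCV: cleaning the depth noise in the detected face region with a median filter plus morphological operations, and a simple geometric estimate of the gaze-correction angle from the camera offset and the speaker distance. The mesh-based warping and compositing are not reproduced, and kernel sizes are illustrative.

```python
import cv2
import numpy as np

def clean_face_depth(face_depth_8bit):
    """Suppress depth noise in the face region; input is assumed to be the
    face-region depth map already scaled to 8 bits."""
    d = cv2.medianBlur(face_depth_8bit, 5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    d = cv2.morphologyEx(d, cv2.MORPH_OPEN, kernel)   # remove small speckles
    d = cv2.morphologyEx(d, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return d

def gaze_correction_angle(camera_offset_cm, speaker_distance_cm):
    """Rough geometric estimate of the angle needed to redirect the gaze
    toward the camera, given the camera-to-display offset and the speaker
    distance reported by the depth sensor."""
    return np.degrees(np.arctan2(camera_offset_cm, speaker_distance_cm))
```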


On Codebook Design to Improve Speaker Adaptation (음성 인식 시스템의 화자 적응 성능 향상을 위한 코드북 설계)

  • Yang, Tae-Young;Shin, Won-Ho;Kim, Weon-Goo;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2
    • /
    • pp.5-11
    • /
    • 1996
  • The purpose of this paper is to propose a method for improving the performance of a semi-continuous hidden Markov model (SCHMM) speaker adaptation system that uses a Bayesian parameter reestimation approach. The performance of Bayesian speaker adaptation can degrade when the features of a new speaker differ severely from those of the reference codebook: excessive codewords of the reference codebook remain after the adaptation process, causing confusion during recognition. To solve this problem, the proposed method uses formant information extracted from the cepstral coefficients of the reference codebook and the adaptation data. The reference codebook is adapted to represent the formant distribution of the new speaker and is then used as the initial codebook for Bayesian speaker adaptation. The proposed method provides accurate correspondence between the reference codebook and the adaptation data, and it was observed that the excessive codewords were no longer selected during recognition. Experimental results showed that the proposed method improved recognition performance.
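The formant-based codebook adaptation itself is specific to the paper, but the Bayesian (MAP-style) reestimation of codeword means that it initializes can be sketched generically as below; the interpolation weight tau and the nearest-codeword assignment are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def map_adapt_codebook(codebook, adapt_frames, tau=10.0):
    """codebook: (K, dim) float codeword means; adapt_frames: (N, dim).
    Each frame is assigned to its nearest codeword, and each codeword mean
    is interpolated between its prior value and the sample mean of its frames."""
    dists = np.linalg.norm(adapt_frames[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    adapted = codebook.copy()
    for k in range(len(codebook)):
        frames_k = adapt_frames[assignments == k]
        n = len(frames_k)
        if n > 0:
            adapted[k] = (tau * codebook[k] + frames_k.sum(axis=0)) / (tau + n)
    return adapted
```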


The bootstrap VQ model for automatic speaker recognition system (VQ 방식의 화자인식 시스템 성능 향상을 위한 부쓰트랩 방식 적용)

  • Kyung YounJeong;Lee Jin-Ick;Lee Hwang-Soo
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.39-42
    • /
    • 2000
  • A bootstrap and aggregating (bagging) vector quantization (VQ) classifier is proposed for speaker recognition. This method obtains multiple training data sets by resampling the original training data set, and then integrates the corresponding multiple classifiers into a single classifier. Experiments on a closed-set, text-independent speaker identification task are carried out using the TIMIT database. The proposed bagging VQ classifier shows considerably improved performance over the conventional VQ classifier.
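A minimal bagging-VQ sketch using scikit-learn's KMeans as the codebook trainer: bootstrap-resample the training frames, train one codebook per replicate, and aggregate by averaging the VQ distortion. Codebook sizes and the aggregation rule are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_bagged_codebooks(train_frames, n_codebooks=5, codebook_size=64, seed=0):
    """Resample the (num_frames, dim) training frames with replacement and
    train one codebook per bootstrap replicate."""
    rng = np.random.default_rng(seed)
    codebooks = []
    for _ in range(n_codebooks):
        idx = rng.integers(0, len(train_frames), size=len(train_frames))
        km = KMeans(n_clusters=codebook_size, n_init=3).fit(train_frames[idx])
        codebooks.append(km.cluster_centers_)
    return codebooks

def vq_distortion(frames, codebook):
    # Average distance from each frame to its nearest codeword.
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

def bagged_score(frames, speaker_codebooks):
    # Aggregate the bootstrap replicates by averaging their distortions.
    return np.mean([vq_distortion(frames, cb) for cb in speaker_codebooks])
```

Closed-set identification would then pick the speaker whose bagged distortion over the test frames is smallest.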


Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination (Glottal flow 신호에서의 향상된 특징추출 및 다중 특징파라미터 결합을 통한 화자인식 성능 향상)

  • Kang, Jihoon;Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.12
    • /
    • pp.2792-2799
    • /
    • 2015
  • In this paper, we utilize source mel-frequency cepstral coefficients (SMFCCs), skewness, and kurtosis extracted from glottal flow signals to improve speaker recognition performance. Because the high-band magnitude response of glottal flow signals is somewhat flat, the SMFCCs are extracted from the response below a predefined cutoff frequency. The extracted SMFCC, skewness, and kurtosis are concatenated with conventional feature parameters, and dimensionality reduction by principal component analysis (PCA) and linear discriminant analysis (LDA) then follows, so that performance can be compared with conventional systems under equivalent conditions. The proposed recognition system outperformed the conventional system in large-scale speaker recognition experiments. In particular, the performance improvement was more noticeable for small Gaussian mixtures.
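Assuming the glottal flow frames have already been estimated (e.g., by inverse filtering, which is not shown), the shape features and the combination-plus-PCA step could look like the sketch below; LDA would additionally need speaker labels, and all names and dimensions are illustrative rather than the paper's configuration.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.decomposition import PCA

def glottal_shape_features(glottal_frames):
    """Per-frame skewness and kurtosis of the glottal flow frames
    (rows are frames), giving two extra feature dimensions."""
    return np.column_stack([skew(glottal_frames, axis=1),
                            kurtosis(glottal_frames, axis=1)])

def combine_and_reduce(mfcc, glottal_frames, n_components=12):
    """Concatenate conventional features (e.g., MFCCs, one row per frame)
    with the glottal-shape features, then reduce dimensionality with PCA."""
    combined = np.hstack([mfcc, glottal_shape_features(glottal_frames)])
    return PCA(n_components=n_components).fit_transform(combined)
```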

A Study on FCM Algorithm for the Performance Improvement of Speaker Adaptation System (화자적응 시스템의 성능향상을 위한 FCM 알고리즘에 대한 연구)

  • Bhang Ki-Duck;Jun Sun-Do;Kang Chul-Ho
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.32-35
    • /
    • 1999
  • Among the parameters of the conventional semi-continuous HMM, the mean vectors and covariance matrices are trained using Maximum Likelihood Estimation. In this paper, the Fuzzy c-means (FCM) algorithm is used for the mean vectors, and a new function obtained by applying and modifying the FCM mean vectors is used for the covariance matrices; both are applied to speaker adaptation. This way of estimating the mean vectors and covariance matrices provides the ability to adapt to a new speaker. Computer simulations on Korean isolated words using the proposed method confirmed adaptation to new speakers.
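A minimal fuzzy c-means sketch for the mean-vector estimation mentioned above; the paper's modified function for the covariance matrices and the SCHMM integration are not reproduced, and the fuzzifier m and iteration count are illustrative.

```python
import numpy as np

def fcm_means(X, n_clusters=8, m=2.0, n_iter=50, seed=0):
    """Alternate fuzzy membership updates and weighted mean updates over the
    (num_frames, dim) data X; returns the fuzzy cluster mean vectors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        u = dist ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)              # fuzzy memberships
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]   # weighted mean update
    return centers
```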


Design and Implementation of Speaker Verification System Using Voice (음성을 이용한 화자 검증기 설계 및 구현)

  • 지진구;윤성일
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.3
    • /
    • pp.91-98
    • /
    • 2000
  • In this paper we design and implement a speaker verification system for verifying personal identity using voice. Filter bank magnitudes were used as feature parameters, and a codebook was built using the LBG algorithm; the codebook converts feature parameters into a code sequence. The difference between the reference pattern and the input pattern is measured using DTW (Dynamic Time Warping). The similarity measured by DTW and a threshold value derived from the deviation are used to discriminate impostors from the client speaker.
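A plain DTW sketch for the verification step: accumulate the frame-wise distance between the reference and input feature sequences and accept the claimed identity when the cost stays under a threshold. The filter-bank feature extraction and LBG codebook are omitted, and the names are illustrative.

```python
import numpy as np

def dtw_distance(ref, inp):
    """Dynamic time warping between a reference and an input feature
    sequence (rows are frames); returns the accumulated alignment cost."""
    n, m = len(ref), len(inp)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - inp[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def verify(ref, inp, threshold):
    # Accept the claimed identity when the DTW distance stays below the threshold.
    return dtw_distance(ref, inp) <= threshold
```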
