• Title/Abstract/Keywords: 김화자

184 results (processing time: 0.02 s)

Speaker Identification Using Score-based Confidence in Noisy Environments

  • 민소희;송민규;나승유;최승호;김진영
    • 음성과학 / Vol. 14, No. 4 / pp. 145-156 / 2007
  • The performance of speaker identification degrades severely in noisy environments. Recently, a probability weighting method based on observation membership was proposed to overcome the noise problem [1]. In [1], the observation confidence was calculated from the SNR with a sigmoid function. However, estimating the SNR requires additional computation, and the estimated SNR is corrupted in dynamic noisy environments. In this paper, we propose methods for estimating the observation confidence based on score-based reliabilities (SBRs) derived from entropy and dispersion measures. In general, SBRs are obtained from the speaker models' probabilities. The proposed methods are evaluated on the ETRI speaker recognition DB, and their performance is compared with that of the methods in [1][8]. The experimental results show that the proposed methods can be applied successfully when the SNR is not available.
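
As a rough illustration of an entropy-based score reliability, the sketch below converts per-frame speaker-model log-likelihoods into posteriors and maps their normalized entropy to a confidence between 0 and 1. The function names, the uniform-prior assumption, and the `1 - H / log(N)` mapping are illustrative assumptions, not the formulation used in the paper.

```python
import math

def posteriors(log_likelihoods):
    """Softmax over per-speaker log-likelihoods (uniform speaker prior assumed)."""
    m = max(log_likelihoods)
    exps = [math.exp(x - m) for x in log_likelihoods]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

def entropy_confidence(log_likelihoods):
    """Hypothetical confidence: 1 - H / log(N).

    Close to 1 when one speaker model dominates the frame,
    close to 0 when all speaker models score the frame equally.
    """
    n = len(log_likelihoods)
    if n < 2:
        return 1.0
    p = posteriors(log_likelihoods)
    h = -sum(q * math.log(q) for q in p if q > 0.0)
    return 1.0 - h / math.log(n)
```

A frame whose scores are flat across speakers would then receive a low weight when frame scores are accumulated, which is the general idea behind weighting observations by confidence.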


Fast Speaker Adaptation in Noisy Environment using Environment Clustering

  • 김영국;송화전;김형순
    • 대한음성학회:학술대회논문집 / 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집 / pp. 33-36 / 2007
  • In this paper, we investigate a fast speaker adaptation method based on eigenvoices in several noisy environments. To overcome its weakness against noise, we propose a noisy environment clustering method that divides the noisy adaptation utterances into groups with similar environments by vector-quantization-based clustering, using the cepstral mean as the feature vector. Each utterance group is then used for adaptation to build an environment-dependent model. In our experiments, we obtained a 19-37% relative improvement in error rate compared with the simultaneous speaker adaptation and environmental compensation method.
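
The grouping step described above can be sketched as a plain k-means (VQ codebook) clustering over per-utterance cepstral means. This is a minimal sketch under stated assumptions: the function names, the deterministic initialization from the first `k` utterances, and the fixed iteration count are all hypothetical choices, not details taken from the paper.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Cluster points into k groups; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumed deterministic init
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids
```

Each utterance is represented by `mean_vector(frames)` over its cepstral frames, and utterances sharing a label would form one environment group for adaptation.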


A Study on Digit Modeling for Korean Connected Digit Recognition

  • 김기성
    • 한국음향학회:학술대회논문집 / 한국음향학회 1998년도 제15회 음성통신 및 신호처리 워크샵(KSCSP 98 15권1호) / pp. 293-297 / 1998
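  • This paper covers the development of a connected digit recognition system for the telephone network, in which various digit modeling methods were implemented and compared. For word models, a context-independent whole-word model was implemented; for sub-word models, a triphone model and a modified triphone model, which merges unreleased consonants into the vowel, were implemented. Tree-based clustering was then applied to the sub-word models and to a context-dependent whole-word model. Speaker-independent connected digit recognition experiments with continuous HMMs on these digit models showed that the context-dependent word model outperformed the context-independent word model, while the triphone and modified triphone models performed similarly. In particular, the context-dependent word model with tree-based clustering performed best, with a word recognition rate of 99.8% and a digit string recognition rate of 99.1% on four-digit strings.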


A study on the speaker adaptation in CDHMM using variable number of mixtures in each state

  • 김광태;서정일;홍재근
    • 전자공학회논문지S / Vol. 35S, No. 3 / pp. 166-175 / 1998
  • When a speaker-adapted model is built using MAPE (maximum a posteriori estimation), the adapted model has one mixture in each state, because multiple a priori distributions cannot be estimated from a speaker-independent model in each state. A model with one mixture per state is not well adapted to a specific speaker, because a single mixture can hardly represent the variety of the speaker's speech information. In this paper, we propose a method that uses several mixtures in each state to better represent this variety. However, because speaker-specific training data is not sufficient, this method cannot be used in every state. We therefore make the number of mixtures in each state variable, in proportion to the number of frames and to the determinant of the variance matrix in the state. With the proposed method, we achieved a lower error rate than methods using one mixture in each state.
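
One way to picture the allocation rule is the sketch below, which scores each state by the product of its frame count and the determinant of its variance matrix, then scales the scores to a mixture count between 1 and a cap. The diagonal-covariance assumption, the product score, the cap `max_mixtures`, and the rounding are all hypothetical choices made for illustration; the abstract does not give the exact rule.

```python
import math

def mixtures_per_state(frame_counts, diag_variances, max_mixtures=4):
    """Assign a mixture count to each HMM state.

    frame_counts:   number of adaptation frames aligned to each state
    diag_variances: per-state list of diagonal variances, so the
                    determinant is simply their product (assumption)
    """
    scores = [n * math.prod(v) for n, v in zip(frame_counts, diag_variances)]
    top = max(scores)
    if top <= 0:
        return [1] * len(scores)  # no usable statistics: fall back to one mixture
    # Scale relative to the best-supported state; every state keeps at least one.
    return [max(1, round(max_mixtures * s / top)) for s in scores]
```

States with few frames or a tight distribution stay at a single mixture, which matches the stated concern that speaker-specific data is too sparse to support several mixtures everywhere.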


A Study on the Relation among English Speech Rate, Pitch and Stress by Korean Speakers

  • 김지은
    • 말소리와 음성과학 / Vol. 6, No. 3 / pp. 101-108 / 2014
  • This study investigates the relation among pitch range differences, speech rate, and the realization of stress. To identify the realization of stress, vowel formants and durational differences between stressed and unstressed vowels were measured. The Korean learners were asked to read a textbook passage of nine sentences. The major results indicate that: (1) the Korean speakers' pitch range is less than 50% of that of the native speakers; (2) there is a significant negative relation between high-low pitch range and speech rate; (3) the vowel qualities and durations of the stressed and unstressed vowels are related to the speech rate but not to the high-low pitch range.

Speech Enhancement Using LLA Microphones Based on Complementary Beamforming

  • 장병욱;권홍석;김시호;배건성
    • 한국음향학회:학술대회논문집 / 한국음향학회 2001년도 추계학술발표대회 논문집 제20권 2호 / pp. 113-116 / 2001
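  • This study proposes a speech enhancement system using a logarithmic microphone array based on complementary beamforming. In meeting rooms or offices where many people gather, speech noise, that is, the speech of other speakers, can have a greater impact than white noise. We therefore used a logarithmic microphone array to improve low-frequency performance over conventional beamforming techniques. Simulation results show that the system performs well against speech noise with strong low-frequency components, without noticeable degradation against white Gaussian noise.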


Korean plain plosive produced by Chinese female speakers: Sentence vs. Paragraph

  • 강반;김지은;이충우
    • 말소리와 음성과학 / Vol. 7, No. 2 / pp. 111-117 / 2015
  • The purpose of this study is to investigate whether Chinese learners of Korean produce Korean plain plosives differently in a reading passage and in isolated sentences. There are several studies on Korean plosives produced by Chinese speakers, but studies comparing production in a reading passage with production in isolated sentences are rare. For this purpose, the VOT values of Korean plain plosives produced by ten Chinese speakers were measured using Speech Analyzer. The results show no significant difference between plain plosive production in a reading passage and in isolated sentences. In further studies, pitch should be measured together with VOT.

A study on application of the statistic model about an utterance of the speaker

  • 김대식;배명진;윤재강
    • 대한전기학회:학술대회논문집 / 대한전기학회 1988년도 전기.전자공학 학술대회 논문집 / pp. 25-28 / 1988
  • Speech, which plays an important mediating role in human conversation, is sound that expresses a person's emotions and thoughts, so a speaker can be verified and identified from the individual properties of the voice. This study obtains the distribution of pitch by visually counting the number of samples in each pitch period of the speaker's waveform. We propose an algorithm that judges the speaker's emotional state, personality, regional group, age, sex, etc., according to the degree of deviation.


A Study on Korean isolated word recognition using LPC cepstrum and clustering

  • 김진영;성굉모
    • 한국음향학회지 / Vol. 6, No. 4 / pp. 44-54 / 1987
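  • This paper examines the problems of the LP model in speaker-independent isolated word recognition and, as a remedy, the use of a lifter in the cepstrum domain. It also discusses clustering methods for obtaining the reference pattern of each word to be recognized. As clustering methods, the UWA method and the KMA method, a modification of the K-iteration method, are presented and compared. In recognition experiments, the highest recognition rate of 95% was obtained with the sinusoidal lifter and KMA clustering.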
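
For readers unfamiliar with liftering, the sketch below applies a commonly used raised-sine lifter, w(n) = 1 + (L/2) sin(πn/L), to a cepstral vector. This particular window is an assumption chosen for illustration; the abstract does not state the exact form of the sinusoidal lifter used in the paper, and the function name is hypothetical.

```python
import math

def sinusoidal_lifter(cepstrum):
    """Weight cepstral coefficients c[1..L] with a raised-sine window.

    cepstrum: list of coefficients starting at index 1 (c[0] excluded).
    The window de-emphasizes the lowest and highest quefrencies
    relative to the middle ones.
    """
    L = len(cepstrum)
    return [(1.0 + (L / 2.0) * math.sin(math.pi * n / L)) * c
            for n, c in enumerate(cepstrum, start=1)]
```

The liftered vectors would then replace the raw LPC cepstra when distances to the reference patterns are computed.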


A Speaker Change Detection Experiment that Uses a Statistical Method

  • 이경록;김진영
    • 음성과학 / Vol. 8, No. 4 / pp. 59-72 / 2001
  • In this paper, we experimented with speaker change detection using a statistical method for an NOD (News On Demand) service. In news data, a change of speaker signals a change in content, so detecting speaker changes makes it possible to identify the content of each segment of the speech. Speaker change detection acts as a preprocessor that divides input speech by speaker, an important preprocessing step for speaker tracking. Among metric-based methods, we detected speaker changes using GLR (generalized likelihood ratio) distance-based segmentation and BIC (Bayesian information criterion) based segmentation. In the experiment, the speech was first divided into speaker units with the GLR distance-based method, and the speaker change points were then verified with BIC-based segmentation. In the experimental results, the FAR (False Alarm Rate) was 63.29 in the high-noise environment and 54.28 in the low-noise environment at an MDR (Missed Detection Rate) of around 15%.
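
The BIC verification step can be illustrated with the standard ΔBIC test, shown here for a one-dimensional feature sequence to keep the sketch dependency-free. Each segment is modeled as a single Gaussian; a positive ΔBIC favors two separate models, that is, a speaker change at the candidate point. The scalar-feature simplification, the function names, the variance floor, and the penalty weight `lam` are assumptions for illustration, not the paper's configuration.

```python
import math

def variance(xs):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    m = sum(xs) / len(xs)
    return max(sum((x - m) ** 2 for x in xs) / len(xs), 1e-12)

def delta_bic(xs, split, lam=1.0):
    """Delta-BIC for a candidate change point at index `split`.

    Compares one Gaussian over the whole window against two Gaussians,
    one on each side of the split. Positive values suggest a change.
    """
    left, right = xs[:split], xs[split:]
    n, n1, n2 = len(xs), len(left), len(right)
    gain = 0.5 * (n * math.log(variance(xs))
                  - n1 * math.log(variance(left))
                  - n2 * math.log(variance(right)))
    # For dimension d the extra model has d + d(d+1)/2 parameters; d = 1 gives 2.
    penalty = lam * 0.5 * 2 * math.log(n)
    return gain - penalty
```

In a two-pass scheme like the one described, candidate points proposed by a distance measure such as GLR would be kept only where `delta_bic` is positive.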
