Search | Korea Science

Histogram Enhancement for Robust Speaker Verification (강인한 화자 확인을 위한 히스토그램 개선 기법)

Choi, Jae-Kil;Kwon, Chul-Hong
- MALSORI, no.63, pp.153-170, 2007
It is well known that the accuracy of speaker verification systems deteriorates drastically when there is an acoustic mismatch between the speech obtained during training and testing. This paper applies a histogram enhancement technique to MFCCs in order to improve the robustness of a speaker verification system. The technique transforms the features extracted from the speech within an utterance so that their statistics conform to a reference distribution. The reference distributions proposed in this paper are the uniform distribution and the beta distribution. The transformation modifies the contrast of the MFCC histogram so that the performance of a speaker verification system improves both when training and testing are clean and when training is clean and testing is noisy.
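The idea of conforming an utterance's features to a uniform reference distribution can be illustrated with a rank-based mapping. This is a minimal sketch, not the paper's exact transform: the function name `equalize_to_uniform`, the `(rank - 0.5) / n` convention, and the sample values are assumptions made for illustration. A beta reference would additionally need the inverse CDF of the beta distribution, which is not shown here.

```python
def equalize_to_uniform(values):
    """Map one feature dimension of an utterance onto a uniform [0, 1] reference.

    Each value is replaced by its empirical CDF position (rank - 0.5) / n,
    so the transformed values are spread evenly over the unit interval.
    This is an illustrative convention, not the transform from the paper.
    """
    n = len(values)
    # Indices of the values in ascending order; ties keep their input order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        out[idx] = (rank - 0.5) / n
    return out


# Hypothetical values of one cepstral coefficient over five frames.
c1 = [2.0, -1.0, 0.5, 3.5, 0.0]
print(equalize_to_uniform(c1))
```

Because only the ordering of the values matters, any monotonic distortion of the feature (for example a gain change) leaves the equalized output unchanged, which is the usual motivation for this family of normalizations.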

Text-dependent Speaker Recognition System Using DTW & VQ (VQ와 DTW를 이용한 문장 의존형 화자인식 시스템)

Jung JongSoon;Oh SeYoung;Bae MyungJin
- Proceedings of the Acoustical Society of Korea Conference, autumn, pp.97-103, 2001
Speaker recognition based on the DTW algorithm suffers from degraded performance as a speaker's voice varies over time, and many algorithms have been proposed to address this problem. This paper proposes a new method that builds a reference pattern tolerant of intra-speaker variation through reference pattern normalization. To avoid degrading the performance of the speaker recognition system, the modified reference pattern is used to recognize the system user. The methods used in this paper are VQ and DTW. In simulation, we obtained a recognition accuracy of $97.5\%$.
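For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming recurrence. It only illustrates the alignment step; the paper's reference pattern normalization and VQ stage are not reproduced. The function name `dtw_distance` and the use of scalar frames with an absolute-difference local cost are assumptions (real systems compare feature vectors per frame).

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Uses the standard recurrence with steps (i-1, j), (i, j-1), (i-1, j-1)
    and an absolute-difference local cost (an illustrative choice).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A time-stretched copy of the same contour aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The zero cost for the stretched sequence shows why DTW tolerates changes in speaking rate; it does not by itself handle the longer-term voice variation that the abstract targets.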

Rapid Speaker Adaptation Based on MAPLR with Adaptive Hybrid Priors Estimated from Reference Speakers (참조화자로부터 추정된 적응적 혼성 사전분포를 이용한 MAPLR 고속 화자적응)

Song, Young-Rok;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea, v.30 no.6, pp.315-323, 2011
This paper proposes two methods of estimating the prior distribution to improve the performance of rapid speaker adaptation based on maximum a posteriori linear regression (MAPLR). In general, the prior distribution of the transformation matrix used in MAPLR adaptation is estimated from all of the training speakers who are employed to construct the speaker-independent model, and it is applied identically to all new speakers. In this paper, we propose a method in which the prior distribution is estimated from a group of reference speakers, selected using adaptation data, so that the acoustic characteristics of the selected reference speakers are similar to those of the new speaker. Additionally, in MAPLR adaptation with a block-diagonal transformation matrix, we propose a method in which the mean matrix and the covariance matrix of the prior distribution are each estimated from a separate group of transformation matrices obtained from the same training speakers. To evaluate the performance of the proposed methods, we examine word accuracy according to the number of adaptation words in an isolated word recognition task. Experimental results show that, for very limited adaptation data, a statistically significant performance improvement is obtained in comparison with conventional MAPLR adaptation.
https://doi.org/10.7776/ASK.2011.30.6.315

Definite Descriptions and Attitude Reports in Situation Semantics

Cho, Young-Soon
- Language and Information, v.3 no.1, pp.83-95, 1999
In this paper I will seek to show that situation theoretic analysis of the attitudes can finely describe references of definite descriptions in attitude reports: co-reference, mis-reference, and speaker's reference. Situation theoretic concepts of a proposition and a resource situation provide excellent means to account for these references: Proposition, which is the combination of a type and an assignment, can combine linguistic and non-linguistic information; Resource situation, sometimes realized as speaker's wrong knowledge situation about an individual, can serve to explain idiosyncratic aspects of attitude reports.

The Proposal of the Fuzzed Lyapunov Dimension at Speech Signal (음성에 대한 퍼지-리아프노프 차원의 제안)

In, Joon-Hawn;Yoo, Byong-Wook;Ryu, Seok-Han;Jung, Myong-Jin;Kim, Chang-Seok
- Journal of the Korean Institute of Telematics and Electronics T, v.36T no.4, pp.30-37, 1999
This study proposes the Fuzzy Lyapunov dimension, which evaluates the quantitative variation of an attractor. In this paper, speaker recognition is evaluated using the Fuzzy Lyapunov dimension. The proposed dimension was shown to discriminate well between standard reference pattern attractors, and, with respect to the test pattern attractor, it was verified to be a speaker recognition parameter that absorbs pattern variation. To evaluate the Fuzzy Lyapunov dimension as a speaker recognition parameter, recognition errors were estimated from the discrimination error for each speaker and standard reference pattern, and the validity of the parameter was examined experimentally. In the speaker recognition experiment, a recognition rate of 97.0% was obtained, confirming that the Fuzzy Lyapunov dimension is suitable as a speaker recognition parameter.

A Study on the Creation Rule of Reference Templates to Recognize Speech for Speaker-independent (불특정 화자의 음성 인식을 위한 표준음 설정 방법에 관한 연구)

김계국;안태옥;이순협;이종악
- Journal of the Korean Institute of Telematics and Electronics, v.25 no.7, pp.715-722, 1988
For speaker-independent speech recognition, it is very important to create reference templates that absorb as much of each speaker's vocal tract variation as possible. We use a clustering technique for this purpose, and the rule for creating the reference templates that serve as cluster centers is the key issue. In this paper, we created reference templates using the existing minimax technique and the MMS technique proposed in this study. We also created up to three reference templates and compared the recognition results. With three reference templates, the recognition rate is 91.6% for minimax and 95.8% for MMS.

A Proposition of the Fuzzy Correlation Dimension for Speaker Recognition (화자인식을 위한 퍼지상관차원 제안)

Yoo, Byong-Wook;Kim, Chang-Seok;Park, Hyun-Sook
- Journal of the Korean Institute of Telematics and Electronics S, v.36S no.1, pp.115-122, 1999
In this paper, we confirmed that a speech signal is a chaotic signal and analyzed its chaos dimension in order to use it as a speaker recognition parameter. To improve speaker identification and pattern recognition, we proposed the fuzzy correlation dimension by constructing a strange attractor that captures an individual's vocal tract characteristics well and applying a fuzzy membership function to the correlation dimension. By estimating the correlation of the points making up an attractor, limited according to the space dimension value, the fuzzy correlation dimension absorbs the variation between the reference pattern attractor and the test pattern attractor. We investigated the validity of the fuzzy correlation dimension as a speaker recognition parameter by estimating the distance according to the average discrimination error for each speaker and reference pattern.

Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
- Phonetics and Speech Sciences, v.4 no.3, pp.95-101, 2012
In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is an automatic speaker recognition system based on a GMM (Gaussian mixture model) and UBM (universal background model), developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics. It also lets users manage reference models with utterances from various environments to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.
https://doi.org/10.13064/KSSS.2012.4.3.095
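The EER reported in the abstract above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores. The function name `equal_error_rate`, the "accept when score >= threshold" convention, and the example scores are assumptions for illustration; they are not taken from SPO Verifier.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when score >= threshold. Returns the mean of the
    false acceptance rate (FAR) and false rejection rate (FRR) at the
    threshold where the two rates are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # targets rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical verification scores (higher means "more likely the same speaker").
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))
```

With few trials the FAR and FRR curves rarely cross exactly, which is why the sketch averages the two rates at the closest point; evaluation toolkits typically interpolate instead.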

Histogram Equalization Using Background Speakers' Utterances for Speaker Identification (화자 식별에서의 배경화자데이터를 이용한 히스토그램 등화 기법)

Kim, Myung-Jae;Yang, Il-Ho;So, Byung-Min;Kim, Min-Seok;Yu, Ha-Jin
- Phonetics and Speech Sciences, v.4 no.2, pp.79-86, 2012
In this paper, we propose a novel approach to improve histogram equalization for speaker identification. Our method collects all speech features of UBM training data to make a reference distribution. The ranks of the feature vectors are calculated in the sorted list of the collection of the UBM training data and the test data. We use the ranks to perform order-based histogram equalization. The proposed method improves the accuracy of the speaker recognition system with short utterances. We use four kinds of speech databases to evaluate the proposed speaker recognition system and compare the system with cepstral mean normalization (CMN), mean and variance normalization (MVN), and histogram equalization (HEQ). Our system reduced the relative error rate by 33.3% from the baseline system.
https://doi.org/10.13064/KSSS.2012.4.2.079
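The rank computation described in the abstract above (ranks of the test features within a sorted pool of background and test data) can be sketched as follows for a single feature dimension. This is a hedged illustration rather than the authors' implementation: the function name `order_based_heq`, the `(rank - 0.5) / n` convention, and in particular the standard normal target distribution are assumptions, since the abstract does not state the target shape.

```python
from bisect import bisect_right
from statistics import NormalDist


def order_based_heq(test_values, background_values):
    """Equalize one feature dimension using ranks in a pooled sorted list.

    The pool holds the background values plus the test values. Each test
    value's rank in the pool gives an empirical CDF position, which is mapped
    through the inverse CDF of a standard normal (an assumed target).
    """
    pool = sorted(list(background_values) + list(test_values))
    n = len(pool)
    target = NormalDist()  # assumed reference shape, not taken from the paper
    out = []
    for v in test_values:
        rank = bisect_right(pool, v)  # number of pooled values <= v, at least 1
        out.append(target.inv_cdf((rank - 0.5) / n))
    return out


# Hypothetical background values and a short test utterance of three frames.
background = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(order_based_heq([-3.0, 0.5, 3.0], background))
```

Pooling background data means a short utterance is ranked against many more samples than it contains, which is one plausible reading of why the approach helps with short utterances.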

I-vector similarity based speech segmentation for interested speaker to speaker diarization system (화자 구분 시스템의 관심 화자 추출을 위한 i-vector 유사도 기반의 음성 분할 기법)

Bae, Ara;Yoon, Ki-mu;Jung, Jaehee;Chung, Bokyung;Kim, Wooil
- The Journal of the Acoustical Society of Korea, v.39 no.5, pp.461-467, 2020
In noisy and multi-speaker environments, the performance of speech recognition is unavoidably lower than in a clean environment. To improve speech recognition, in this paper, the signal of the speaker of interest is extracted from mixed speech signals containing multiple speakers. The VoiceFilter model is used to effectively separate overlapped speech signals. In this work, clustering by Probabilistic Linear Discriminant Analysis (PLDA) similarity score is employed to detect the speech of the speaker of interest, which is then used as the reference speaker for VoiceFilter-based separation. By utilizing the speaker feature extracted from the speech detected by the proposed clustering method, this paper proposes a speaker diarization system that uses only the mixed speech, without an explicit reference speaker signal. We use a phone dataset consisting of two speakers to evaluate the performance of the speaker diarization system. The Source to Distortion Ratio (SDR) of the operator (Rx) speech and the customer (Tx) speech is 5.22 dB and -5.22 dB, respectively, before separation, and 11.26 dB and 8.53 dB, respectively, after separation with the proposed system.
https://doi.org/10.7776/ASK.2020.39.5.461
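To make the SDR figures in the abstract above concrete, here is a minimal sketch using the simple energy-ratio definition, 10·log10(‖s‖² / ‖s − ŝ‖²). The function name `sdr_db` and the toy signals are assumptions; the paper may use a more elaborate definition (evaluation toolkits such as BSS Eval decompose the error further), so this is only an approximation of the metric.

```python
import math


def sdr_db(reference, estimate):
    """Source-to-distortion ratio in dB for two equal-length signals.

    Uses the simple definition 10*log10(||s||^2 / ||s - s_hat||^2).
    Raises ZeroDivisionError if the estimate equals the reference exactly.
    """
    signal = sum(s * s for s in reference)
    distortion = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / distortion)


# Hypothetical two-speaker mixture: the unprocessed mixture is the "estimate".
speaker_a = [1.0, 2.0]
speaker_b = [0.5, -0.5]
mixture = [a + b for a, b in zip(speaker_a, speaker_b)]
print(sdr_db(speaker_a, mixture), sdr_db(speaker_b, mixture))
```

Under this simple definition, scoring the unprocessed two-speaker mixture against each speaker yields SDRs that are negatives of each other, which is consistent with the 5.22 dB and -5.22 dB reported before separation.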

Search Result 87, Processing Time 0.031 seconds
