Search | Korea Science

Proposal of speaker change detection system considering speaker overlap (화자 겹침을 고려한 화자 전환 검출 시스템 제안)

Park, Jisu; Yun, Young-Sun; Cha, Shin; Park, Jeon Gue
- The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.466-472 / 2021
Speaker change detection (SCD) refers to finding the moment when the main speaker changes from one person to the next in a spoken conversation. SCD is difficult because of overlapping speakers, inaccurate labeling, and data imbalance. To address these problems, utterances from the TIMIT corpus, which is widely used in speech recognition, were concatenated artificially to obtain a sufficient amount of training data, and speaker change detection was performed after identifying overlapping speakers. In this paper, we propose a speaker change detection system that takes speaker overlap into account. We evaluated and verified its performance using various approaches. As a result, a detection system similar to the x-vector structure was proposed to remove the speaker overlap regions, while the Bi-LSTM method was selected to model speaker changes. The experimental results show relative performance improvements of 4.6 % and 13.8 %, respectively, compared to the baseline system. Additionally, we conclude from the experimental results that a robust speaker change detection system can be built through follow-up studies that take text and speaker information into account.
https://doi.org/10.7776/ASK.2021.40.5.466

A Blind Segmentation Algorithm for Speaker Verification System (화자확인 시스템을 위한 분절 알고리즘)

김지운; 김유진; 민홍기; 정재호
- The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.45-50 / 2000
This paper proposes a delta energy method based on Parameter Filtering (PF), a speech segmentation algorithm for text-dependent speaker verification systems over telephone lines. Our parametric filter bank adopts a variable bandwidth along with a fixed center frequency. Compared with other methods, the proposed method turns out to be very robust to channel noise and background noise. Using this method, we segment an utterance into consecutive subword units and build a model from each subword unit. In terms of EER, the speaker verification system based on the whole-word model achieves 6.1%, whereas the system based on the subword model achieves 4.0%, an improvement of about 2 percentage points.
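The equal error rate (EER) quoted above is the operating point where the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from two lists of verification scores; the function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by scanning every observed score as a threshold.

    genuine:  scores of true-speaker trials (higher means more likely accepted)
    impostor: scores of impostor trials
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: a genuine trial scoring below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: an impostor trial scoring at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


# Toy scores (made up for illustration only).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With only a handful of trials the estimate is coarse; real evaluations use many more scores and often interpolate between thresholds.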

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

신옥근
- The Journal of the Acoustical Society of Korea / v.22 no.4 / pp.299-307 / 2003
There have been many studies on speaker normalization, which aims to minimize the effect of the speaker's vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple vector-quantizer-based linear warping method for speaker normalization, motivated by the observation that a vector quantizer can be used successfully for speaker verification. For this purpose, we first generate an optimal codebook that serves as the basis of the speaker normalization, and then extract the warping factor of the unknown speaker by comparing the feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel-scale filter bank used in MFCC calculation. To test the performance of the proposed method, a series of recognition experiments was conducted on a discrete HMM with thirteen monosyllabic Korean number utterances. The results showed that the word error rate can be reduced by about 29%, and that the proposed warping factor extraction method is useful because of its simplicity compared to other line-search warping methods.
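The final step described above, linearly warping a Mel-scale filter bank by a speaker-specific factor, can be illustrated with a short sketch. It assumes the common 2595·log10(1 + f/700) Mel formula and a simple multiplicative warp of the filter center frequencies; the function names, the parameter `alpha`, and the clipping at the upper band edge are illustrative assumptions. The abstract does not specify the paper's exact warping or how the factor is derived from the codebook.

```python
import math


def hz_to_mel(f_hz):
    # Widely used Mel-scale approximation (an assumption, not from the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warped_center_frequencies(num_filters, f_low, f_high, alpha):
    """Center frequencies (Hz) equally spaced on the Mel scale,
    then scaled linearly by the warping factor alpha and clipped to f_high."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    centers = [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_filters)]
    return [min(alpha * c, f_high) for c in centers]


# Unwarped filter bank versus one warped by a hypothetical factor of 1.1.
base = warped_center_frequencies(10, 0.0, 4000.0, alpha=1.0)
warped = warped_center_frequencies(10, 0.0, 4000.0, alpha=1.1)
```

A factor above 1 shifts every filter toward higher frequencies and a factor below 1 shifts them lower, which is how a single scalar can compensate for differences in vocal tract length.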

Application Example of Forensic Speaker Analysis Method for Voice-phishing Speech Files (보이스피싱 음성 파일에 대한 법과학적 화자 분석 방법의 적용 사례)

박남인; 이중; 전옥엽; 김태훈
- Journal of Digital Forensics / v.13 no.1 / pp.35-44 / 2019
Voice phishing induces victims to send money using only voice calls and illegally obtained personal information. The damage caused by voice phishing continues to increase every year, and it has become a social problem. Recently, the Financial Supervisory Service (FSS) of the Republic of Korea has been collecting recordings of voice-phishing scammers from victims. In this paper, we describe an effective forensic speaker analysis method for detecting voices from the same person among large-scale speech files stored in a database (DB), and apply this method to the voice-phishing speech files collected from victims. First, an i-vector is extracted from each speech file in the DB; then a cosine similarity matrix for all speech files is generated from the cosine distances among the extracted i-vectors. In other words, speaker analysis is performed by grouping sets of candidates with high mutual similarity among the i-vectors of all speech files in the DB. Measuring the EER (Equal Error Rate) on 6,724 speech files from 82 speakers confirmed that the i-vector-based method achieves a better EER than the GMM-based method. Finally, comparing the 2,327 voice-phishing speech files collected by the FSS showed that some speech files with similar voice features were grouped together.
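The similarity-matrix step can be sketched as follows. The example assumes the i-vectors have already been extracted and are available as plain lists of floats; the three toy vectors, the function names, and the grouping threshold of 0.9 are illustrative assumptions, since the abstract does not state how candidates are grouped.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarity between all vectors (symmetric, unit diagonal)."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def candidate_groups(matrix, threshold):
    """For each file index, list the other files whose similarity reaches the threshold."""
    return {i: [j for j, s in enumerate(row) if j != i and s >= threshold]
            for i, row in enumerate(matrix)}


# Toy 3-dimensional stand-ins for i-vectors (real i-vectors have hundreds of dimensions).
ivectors = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.25], [-0.1, 1.0, 0.0]]
sim = similarity_matrix(ivectors)
groups = candidate_groups(sim, threshold=0.9)
```

Here files 0 and 1 land in each other's candidate lists while file 2 stays on its own, which mirrors the idea of grouping recordings that likely share a speaker.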

A Method on the Improvement of Speaker Enrolling Speed for a Multilayer Perceptron Based Speaker Verification System through Reducing Learning Data (다층신경망 기반 화자증명 시스템에서 학습 데이터 감축을 통한 화자등록속도 향상방법)

이백영; 황병원; 이태승
- The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.585-591 / 2002
While the multilayer perceptron (MLP) provides several advantages over existing pattern recognition methods, it requires a relatively long time to train. This prolongs speaker enrollment in a speaker verification system that uses the MLP as a classifier. This paper proposes a method that shortens the enrollment time by adopting the cohort speakers method used in existing parametric systems and reducing the number of background speakers required to train the MLP, and confirms the effect of the method through an experiment that applies it to a continuant and MLP-based speaker verification system.

Speaker Verification for Spoken Digit Sequence by Probabilistic Neural Network (확률신경망에 의한 숫자음성열로부터의 화자확인)

Um, Ig-Tae; Kang, Kwon-Il; Kim, Moon-Hyn
- Annual Conference on Human and Language Technology / 1999.10e / pp.178-183 / 1999
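Speaker verification basically makes one of two decisions, accept or reject, for each input utterance based on a single threshold. In this paper, when a four-digit password is entered by voice, the local decision for each spoken digit is classified into one of three outcomes (accept, reject, or decision deferred) using two thresholds, and a decision rule for the password as a whole is proposed. The confidence measure for the speaker that the local decisions require was obtained through a probabilistic neural network. In experiments conducted with five speakers, the conventional single-threshold method showed an error of 5.3%, while the method proposed in this paper showed an error of 2.1%.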
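The two-threshold local decision can be sketched as below. The per-digit confidence values are assumed to come from some classifier (the paper uses a probabilistic neural network, which is not modeled here). The abstract does not state the rule for the whole password, so `password_decision` and its `max_deferred` parameter are purely hypothetical placeholders.

```python
ACCEPT, REJECT, DEFER = "accept", "reject", "defer"


def local_decision(confidence, low, high):
    """Three-way decision for one spoken digit using two thresholds."""
    if confidence >= high:
        return ACCEPT
    if confidence < low:
        return REJECT
    return DEFER  # Confidence falls between the two thresholds.


def password_decision(confidences, low, high, max_deferred=1):
    """Hypothetical global rule (not from the paper): reject if any digit is
    rejected, otherwise accept when at most max_deferred digits were deferred."""
    decisions = [local_decision(c, low, high) for c in confidences]
    if REJECT in decisions:
        return REJECT
    if decisions.count(DEFER) <= max_deferred:
        return ACCEPT
    return REJECT


# Four-digit password with made-up confidence values.
result = password_decision([0.9, 0.8, 0.95, 0.6], low=0.4, high=0.7)
```

The deferred outcome lets borderline digits be settled by evidence from the other digits instead of forcing an early accept or reject.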

A Study on the Mixed Model Approach and Symbol Probability Weighting Function for Maximization of Inter-Speaker Variation (화자간 변별력 최대화를 위한 혼합 모델 방식과 심볼 확률 가중함수에 관한 연구)

Chin Se-Hoon; Kang Chul-Ho
- The Journal of the Acoustical Society of Korea / v.24 no.7 / pp.410-415 / 2005
Recently, most speaker verification systems have been based on the pattern recognition approach, and the performance of the pattern classifier depends on how well it classifies the various feature parameters of speakers. To classify feature parameters efficiently and effectively, it is important to enlarge the variation between speakers and to measure distances between feature parameters effectively. Therefore, this paper proposes a positively mixed model scheme that enlarges inter-speaker variation by searching the individual model and the world model at the same time. During the decision procedure, inter-speaker variation is maximized by using the proposed mixed model scheme. We also use a symbol probability weighting function to reduce vector quantization errors by measuring the symbol probability derived from the distance ratio between the world codebook and the individual codebook. In our experiment using this method, we halved the Detection Cost Function (DCF) of the system, from 2.37% to 1.16%.

A Study on Noise-Robust Speaker Recognition Methods Based on Ensemble of Decision Scores (앙상블 기법을 이용한 잡음 환경에서의 화자인식 방법에 관한 연구)

Yang, Joon-Young; Chang, Joon-Hyuk
- Annual Conference of KIPS / 2018.05a / pp.457-459 / 2018
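Speaker recognition is a technology that determines whether the speakers of two given utterances match, and thereby identifies the speaker of an arbitrary input utterance from a list of enrolled speakers. However, when background noise or reverberation is present, the speech signal is distorted and speaker recognition performance can degrade, so a separate speech preprocessing algorithm may be used alongside it. In this paper, as a method for performing speaker recognition on speech signals collected through multiple microphones in an environment with background noise, we propose a speaker-match score ensemble technique that uses the parametric multi-channel Wiener filter (PMWF). The combination coefficient used for score fusion is determined from the signal-to-noise ratio of the input signal, and the score obtained after noise removal with the Wiener filter is combined by weighting with the score obtained after noise removal with the minimum variance distortionless response (MVDR) beamformer. Measuring the equal error rate in this way confirmed better performance than when each preprocessing algorithm was used independently to compute the score.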
https://doi.org/10.3745/PKIPS.y2018m05a.457
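The SNR-dependent score fusion can be pictured with a small sketch. The abstract says only that the combination coefficient is set from the input SNR. The linear mapping from SNR to weight, the 0 dB and 20 dB end points, and the choice of which score receives the larger weight at high SNR are therefore all assumptions made for illustration.

```python
def snr_to_weight(snr_db, snr_low=0.0, snr_high=20.0):
    """Map an SNR in dB to a weight in [0, 1] by clipped linear interpolation.
    The end points are assumed values, not taken from the paper."""
    w = (snr_db - snr_low) / (snr_high - snr_low)
    return max(0.0, min(1.0, w))


def fuse_scores(score_wiener, score_mvdr, snr_db):
    """Weighted combination of two speaker-match scores.
    Assumption: the Wiener-filter score is trusted more as SNR increases."""
    w = snr_to_weight(snr_db)
    return w * score_wiener + (1.0 - w) * score_mvdr


# Made-up scores from the two preprocessing branches at 10 dB SNR.
fused = fuse_scores(2.0, 4.0, snr_db=10.0)
```

At the midpoint of the assumed SNR range both branches contribute equally, and outside the range the fusion falls back to a single branch.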

Performance Improvement in GMM-based Text-Independent Speaker Verification System (GMM 기반의 문맥독립 화자 검증 시스템의 성능 향상)

Hahm Seong-Jun; Shen Guang-Hu; Kim Min-Jung; Kim Joo-Gon; Jung Ho-Youl; Chung Hyun-Yeol
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.131-134 / 2004
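In this paper, we implemented a text-independent speaker verification system using a Gaussian mixture model (GMM) and then carried out speaker verification experiments using a normalization method based on the arctan function. As feature parameters, we used cepstral coefficients obtained by linear prediction together with their regression coefficients, and applied cepstral mean normalization (CMN) to account for variation in the speaker's utterances. In the training stage, which builds the speaker models, we used GMMs because they represent the acoustic characteristics of a speaker's utterances well. In the verification stage, we computed the similarity using maximum likelihood (ML) and verified the claim by comparing the score, normalized either by the conventional method or by the arctan-based method, against a predetermined threshold. The speaker verification experiments confirmed that the method with the added arctan function always yielded a better EER than the conventional method.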
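One way to picture arctan-based score normalization is to pass a log-likelihood ratio through the arctan function so that the score is bounded before thresholding. The abstract does not give the paper's formula, so the sketch below is an assumed form: the use of a background-model log-likelihood as the conventional normalization, the function names, and the example values are all hypothetical.

```python
import math


def arctan_normalized_score(loglik_claimed, loglik_background):
    """Assumed form: log-likelihood ratio squashed by arctan into (-pi/2, pi/2)."""
    llr = loglik_claimed - loglik_background
    return math.atan(llr)


def verify(loglik_claimed, loglik_background, threshold):
    """Accept the identity claim when the normalized score reaches the threshold."""
    return arctan_normalized_score(loglik_claimed, loglik_background) >= threshold


# Made-up average log-likelihoods from a claimed-speaker GMM and a background model.
accepted = verify(-38.0, -40.0, threshold=0.5)
```

Because arctan is monotonic, it keeps the ordering of raw scores while compressing extreme values, which limits the influence of outlier trials on the threshold.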

Speaker Adaptation Using Linear Transformation Network in Speech Recognition (선형 변환망을 이용한 화자적응 음성인식)

이기희
- Journal of the Korea Society of Computer and Information / v.5 no.2 / pp.90-97 / 2000
This paper describes a speaker-adaptive speech recognition system that reliably recognizes the speech of new speakers. In the proposed method, the speech spectrum of a new speaker is adapted to the reference speech spectrum by using the parameters of a 1st linear transformation network placed in front of the phoneme classification neural network. The recognition system is based on a semicontinuous HMM (hidden Markov model) that uses a multilayer perceptron as a fuzzy vector quantizer. Isolated word recognition experiments are performed to measure the recognition rate of the system. With speaker adaptation, the recognition rate shows a significant improvement over the unadapted recognition system.
