• Title/Summary/Keyword: Speaker

Search Results: 1,678 (processing time: 0.064 seconds)

Noise Robust Speaker Verification Using Subband-Based Reliable Feature Selection (신뢰성 높은 서브밴드 특징벡터 선택을 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • MALSORI
    • /
    • no.63
    • /
    • pp.125-137
    • /
    • 2007
  • Recently, many techniques have been proposed to improve the noise robustness of speaker verification. In this paper, we consider the feature recombination technique in a multi-band approach. In conventional feature recombination for speaker verification, the whole set of feature components is used to compute the likelihoods of the speaker models or the universal background model. This computation is not effective from the viewpoint of the multi-band approach. To deal with this ineffectiveness, we introduce a subband likelihood computation and propose a modified feature recombination using subband likelihoods. In the decision step of a speaker verification system in noisy environments, a few very low likelihood scores from a speaker model or the universal background model can cause the system to make a wrong decision. To overcome this problem, a reliable feature selection method is proposed: the low likelihood scores of unreliable features are substituted by likelihood scores from an adaptive noise model. This adaptive noise model is estimated by maximum a posteriori adaptation using noise features obtained directly from the noisy test speech. The proposed method using subband-based reliable feature selection obtains better performance than the conventional feature recombination system, with an error reduction rate of more than 31% compared with the feature recombination-based speaker verification system.

  • PDF
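The subband likelihood computation and reliable-feature substitution described in the abstract can be sketched as follows. The single-component diagonal-Gaussian models, the two-subband split, and the max-based substitution rule are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: per-subband GMM log-likelihoods with unreliable subband scores
# replaced by an adaptive noise model's scores (simplified selection rule).
import math

def diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal Gaussian on a feature sub-vector."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def subband_loglik(frame, gmm, band):
    """Log-likelihood of one subband (a slice of the feature vector)
    under a GMM given as a list of (weight, mean, var) triples."""
    lo, hi = band
    sub = frame[lo:hi]
    return math.log(sum(w * math.exp(diag_gauss_logpdf(sub, m[lo:hi], v[lo:hi]))
                        for w, m, v in gmm))

def reliable_score(frame, speaker_gmm, noise_gmm, bands):
    """Any subband whose speaker-model likelihood falls below the adaptive
    noise model's is treated as unreliable and its score is substituted."""
    total = 0.0
    for band in bands:
        s = subband_loglik(frame, speaker_gmm, band)
        n = subband_loglik(frame, noise_gmm, band)
        total += max(s, n)   # substitute unreliable subband scores
    return total

# Toy 4-dim frame split into two subbands; the second band is "noisy".
frame = [0.1, 0.2, 3.0, -2.5]
spk   = [(1.0, [0.0, 0.0, 0.0, 0.0], [1.0] * 4)]
noise = [(1.0, [0.0, 0.0, 2.5, -2.0], [1.0] * 4)]
score = reliable_score(frame, spk, noise, [(0, 2), (2, 4)])
```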

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems
    • /
    • v.14 no.4
    • /
    • pp.807-823
    • /
    • 2018
  • Speaker verification performance depends on each speaker's utterances, from which the important information must be captured. Under the constraint of limited data, speaker verification has become a challenging task: the training and testing data amount to only a few seconds. The feature vectors extracted by single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers, which leads to poor speaker modeling during training and may not support good decisions during testing. This problem can be addressed by increasing the number of feature vectors extracted from training and testing data of the same duration. For this, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under the limited data condition. These techniques extract relatively more feature vectors during training and testing, and so yield improved modeling and testing for limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features, and a Gaussian mixture model (GMM) and GMM-universal background model (GMM-UBM) for speaker modeling, on the NIST-2003 database. The experimental results indicate that MFS, MFR, and MFSR analysis perform markedly better than SFSR analysis, and that LPCC-based MFSR analysis performs best among the analysis and feature extraction techniques compared.
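The MFSR idea above can be sketched as repeated framing of the same short utterance with several window lengths and shifts, pooling the resulting frames. The specific sizes and shifts below are illustrative, not the paper's values.

```python
# Sketch: multiple frame size and rate (MFSR) framing pools analysis frames
# from several (size, shift) settings, yielding more feature vectors from
# the same limited-duration utterance than a single setting (SFSR).

def frame_indices(n_samples, size, shift):
    """Start indices of analysis frames for one (size, shift) setting."""
    return list(range(0, n_samples - size + 1, shift))

def mfsr_frames(n_samples, settings):
    """Pool (start, size) frame descriptors across several settings."""
    frames = []
    for size, shift in settings:
        frames += [(start, size) for start in frame_indices(n_samples, size, shift)]
    return frames

n = 16000                                     # 1 s of speech at 16 kHz
sfsr = mfsr_frames(n, [(400, 160)])           # single frame size/rate
mfsr = mfsr_frames(n, [(400, 160),            # baseline setting
                       (640, 160),            # larger frame size (MFS)
                       (400, 80)])            # higher frame rate (MFR)
```

Each pooled frame would then be passed to MFCC or LPCC extraction; pooling alone already multiplies the number of feature vectors available for GMM training.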

A Speaker Pruning Method for Reducing Calculation Costs of Speaker Identification System (화자식별 시스템의 계산량 감소를 위한 화자 프루닝 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.6
    • /
    • pp.457-462
    • /
    • 2003
  • In this paper, we propose a speaker pruning method for real-time processing and improved performance of a speaker identification system based on GMMs (Gaussian mixture models). In conventional speaker identification methods, such as ML (maximum likelihood), WMR (weighting model rank), and MWMR (modified WMR), frame likelihoods are calculated using all frames of the input speech and all of the speaker models, and the speaker with the largest accumulated likelihood is then selected. In these methods, however, the calculation cost and processing time grow as the number of input frames and speakers increases. To solve this problem, the proposed method selects only the subset of speaker models with higher likelihoods using only a portion of the input frames, and the identified speaker is decided by evaluating the selected speaker models. The method can improve identification performance even when the number of speakers changes. In several experiments, the proposed method showed a 65% reduction in calculation cost and a 2% increase in identification rate over conventional methods. These results mean that the proposed method can be applied effectively for real-time processing and for improved speaker identification performance.
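The two-stage pruning described above can be sketched as follows: score every model on a few initial frames, prune the low scorers, and finish accumulation only for the survivors. The scoring function, pruning fraction, and number of pruning frames are illustrative assumptions.

```python
# Sketch: speaker pruning to reduce identification cost. Stage 1 scores all
# models on a prefix of the frames; stage 2 completes accumulation only for
# the surviving (top-scoring) models.

def accumulate(frames, model, score_fn):
    return sum(score_fn(f, model) for f in frames)

def pruned_identify(frames, models, score_fn, n_prune_frames=10, keep=0.3):
    # Stage 1: partial accumulation over the first n_prune_frames.
    head = frames[:n_prune_frames]
    partial = {s: accumulate(head, m, score_fn) for s, m in models.items()}
    n_keep = max(1, int(len(models) * keep))
    survivors = sorted(partial, key=partial.get, reverse=True)[:n_keep]
    # Stage 2: finish accumulation only for the surviving speakers.
    tail = frames[n_prune_frames:]
    final = {s: partial[s] + accumulate(tail, models[s], score_fn)
             for s in survivors}
    return max(final, key=final.get)

# Toy example: each "model" is a target value; a frame scores by closeness.
score = lambda f, m: -abs(f - m)
frames = [2.0] * 50
models = {"spk%d" % i: float(i) for i in range(10)}
winner = pruned_identify(frames, models, score)
```

With a 30% keep rate, stage 2 evaluates 3 models instead of 10, which is where the reported calculation-cost saving would come from.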

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences
    • /
    • v.16 no.1
    • /
    • pp.67-76
    • /
    • 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely utilized in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that our proposed model improves over the baseline models in terms of both naturalness and speaker similarity.

Statistical Extraction of Speech Features Using Independent Component Analysis and Its Application to Speaker Identification

  • Jang, Gil-Jin;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4E
    • /
    • pp.156-163
    • /
    • 2002
  • We apply independent component analysis (ICA) to extract an optimal basis for the problem of finding efficient features for representing the speech signals of a given speaker. The speech segments are assumed to be generated by a linear combination of the basis functions; thus the distribution of a speaker's speech segments is modeled by adapting the basis functions so that each source component is statistically independent. The learned basis functions are oriented and localized in both space and frequency, bearing a resemblance to Gabor wavelets. These features are speaker-dependent characteristics, and to assess their efficiency we performed speaker identification experiments and compared our results with the conventional Fourier basis. Our results show that the proposed method is more efficient than conventional Fourier-based features in that it obtains a higher speaker identification rate.

Voice Coil Mass for Low-Frequency Output (저음부의 출력을 위한 voice coil의 질량)

  • 이계호
    • 전기의세계
    • /
    • v.14 no.4
    • /
    • pp.22-27
    • /
    • 1965
  • The history of speaker manufacturing in Korea is short, and it has not moved beyond self-sufficiency. Weston speakers and the Goldstar (금성사) speakers made in-house for the company's own use have not yet reached the stage of true mass production, and they also lag behind in performance. Components such as magnets and coil adhesives are, it seems, mostly imported directly by the sales companies. Of course, since this depends on the development of an integrated industry, piecemeal progress is hard to expect; but given that domestic work often ends at the speaker assembly stage, the situation is lamentable. Speaker manufacture is, admittedly, an extremely delicate matter. Efficiency is naturally proportional to $B^{2}$, but there is a trade-off: to raise efficiency by increasing the air-load resistance, the cone diameter must grow, which inevitably increases the mass of the vibrating system; and to reduce the voice-coil resistance, the coil wire must become thicker and therefore shorter, which also enlarges the air gap and reduces the flux density B. The cabinet, baffle, cone, corrugation, and speaker system must therefore be designed as rationally as possible. By reporting these experimental results, the author hopes to provide interest and motivation for speaker research. The impedance was measured as the number of voice-coil turns was varied, and the relationships among the number of turns, sound-pressure level, and frequency were examined, with the aim of contributing to the design of medium-size speakers through knowledge of their frequency characteristics.

  • PDF

Confidence Measure of Forensic Speaker Identification System According to Pitch Variances (과학수사용 화자 식별 시스템의 피치 차이에 따른 신뢰성 척도)

  • Kim, Min-Seok;Kim, Kyung-Wha;Yang, IL-Ho;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.135-139
    • /
    • 2010
  • Forensic speaker identification needs high accuracy and reliability, but the current level of speaker identification technology does not meet this demand. The confidence evaluation of results is therefore one of the key issues in forensic speaker identification. In this paper, we propose a new confidence measure for a forensic speaker identification system, based on the pitch differences between the registered utterances of the identified speaker and the test utterance. In the experiments, we evaluate this confidence measure on speaker identification tasks in various environments. The results show that the proposed measure is a good indicator of whether an identification result is reliable.

  • PDF
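The pitch-difference confidence measure described above can be sketched as comparing the mean F0 of the identified speaker's enrolled utterances with that of the test utterance. The use of mean F0 over voiced frames, the linear mapping, and the 30 Hz tolerance are illustrative assumptions, not the paper's definition.

```python
# Sketch: a confidence measure from the pitch difference between enrolled
# and test utterances; a large difference flags the result as unreliable.

def mean_pitch(f0_track):
    """Average of voiced F0 values (zeros mark unvoiced frames, ignored)."""
    voiced = [f for f in f0_track if f > 0]
    return sum(voiced) / len(voiced) if voiced else 0.0

def confidence(enrolled_f0, test_f0, max_diff_hz=30.0):
    """Map the enrolled/test mean-pitch difference to a [0, 1] confidence."""
    diff = abs(mean_pitch(enrolled_f0) - mean_pitch(test_f0))
    return max(0.0, 1.0 - diff / max_diff_hz)

enrolled = [120.0, 125.0, 0.0, 118.0]    # F0 track with one unvoiced frame
test_ok  = [122.0, 119.0, 124.0]         # pitch close to enrollment
test_bad = [210.0, 205.0, 0.0, 215.0]    # pitch far from enrollment
```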

A Study on Text Choice for Web-Based Speaker Verification System (웹 기반의 화자확인시스템을 위한 문장선정에 관한 연구)

  • 안기모;이재희;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.6
    • /
    • pp.34-40
    • /
    • 2000
  • In a text-dependent speaker verification system, the choice of the text the speaker utters is a very important factor for performance improvement. In this paper, we propose building a consonant-mixture text using a classification of Korean phonetic values. When applied to a web-based speaker verification system, it can cope with abrupt changes in the speaker's voice information and achieves optimal speaker verification performance.

  • PDF

Large Scale Voice Dialling using Speaker Adaptation (화자 적응을 이용한 대용량 음성 다이얼링)

  • Kim, Weon-Goo
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.16 no.4
    • /
    • pp.335-338
    • /
    • 2010
  • A new method that improves the performance of a large-scale voice dialling system using speaker adaptation is presented. Since an SI (speaker-independent) speech recognition system with phoneme HMMs uses only the phoneme string of the input sentence, the storage space can be reduced greatly. However, the performance of such a system is worse than that of a speaker-dependent system due to the mismatch between the input utterance and the SI models. A new method that iteratively estimates the phonetic string and the adaptation vectors is presented to reduce the mismatch between the training utterances and the set of SI models using speaker adaptation techniques; for the speaker adaptation, stochastic matching methods are used to estimate the adaptation vectors. Experiments performed over actual telephone lines show that the proposed method performs better than the conventional method with the SI phonetic recognizer.
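The iterative decode/adapt loop described above can be sketched as alternating between decoding against bias-compensated SI models and re-estimating a global bias vector as the average mismatch. Real systems operate on HMM state sequences; this toy version uses one Gaussian mean per "phone" and scalar features, all illustrative assumptions.

```python
# Sketch: iterative estimation of the phonetic string and a bias-style
# adaptation vector, in the spirit of stochastic matching.

def decode(obs, means, bias):
    """Nearest-mean decoding after compensating observations by the bias."""
    return [min(means, key=lambda p: abs((o - bias) - means[p])) for o in obs]

def estimate_bias(obs, means, labels):
    """Bias = mean residual between observations and decoded model means."""
    return sum(o - means[l] for o, l in zip(obs, labels)) / len(obs)

def adapt(obs, means, n_iter=5):
    bias = 0.0
    for _ in range(n_iter):
        labels = decode(obs, means, bias)      # step (a): decode
        bias = estimate_bias(obs, means, labels)  # step (b): re-estimate
    return labels, bias

means = {"a": 0.0, "i": 2.0, "u": 4.0}   # SI "phone" model means
obs = [0.5, 2.5, 4.5, 0.5, 2.5]          # channel adds a +0.5 bias
labels, bias = adapt(obs, means)
```

After convergence the estimated bias matches the simulated channel offset, and the decoded string is the one the unbiased observations would produce.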

Rapid Speaker Adaptation for Continuous Speech Recognition Using Merging Eigenvoices (Eigenvoice 병합을 이용한 연속 음성 인식 시스템의 고속 화자 적응)

  • Choi, Dong-Jin;Oh, Yung-Hwan
    • MALSORI
    • /
    • no.53
    • /
    • pp.143-156
    • /
    • 2005
  • Speaker adaptation in an eigenvoice space is a popular method for rapid speaker adaptation. To improve its performance, the number of speaker-dependent models should be increased and the eigenvoices re-estimated. However, principal component analysis takes much time to find eigenvoices, especially in a continuous speech recognition system. This paper describes a method that reduces the computation time by estimating eigenvoices only for the supplementary speaker-dependent models and merging them with the eigenvoices already in use. Experimental results show that the computation time is reduced by 73.7% while the performance remains almost the same, when the number of supplementary speaker-dependent models equals the number of models already used.

  • PDF
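For context, adaptation in an eigenvoice space constrains a new speaker's model (supervector) to the mean plus a weighted sum of eigenvoices. The sketch below shows only this basic adaptation step; the eigenvoices are assumed orthonormal so the weights reduce to dot-product projections, and the paper's actual contribution, merging newly estimated eigenvoices with existing ones, is not shown.

```python
# Sketch: eigenvoice adaptation. A speaker supervector is approximated as
# mean + sum_k w_k * e_k; with orthonormal eigenvoices the weights are
# projections of the centered observation onto each eigenvoice.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adapt(observed, mean, eigenvoices):
    """Return the eigenvoice weights and the adapted supervector."""
    centered = [o - m for o, m in zip(observed, mean)]
    weights = [dot(centered, e) for e in eigenvoices]   # projection step
    adapted = list(mean)
    for w, e in zip(weights, eigenvoices):
        for i, ei in enumerate(e):
            adapted[i] += w * ei
    return weights, adapted

mean = [0.0, 0.0, 0.0]
evs  = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # two orthonormal eigenvoices
weights, adapted = adapt([2.0, -1.0, 0.5], mean, evs)
```

The adapted supervector lies in the span of the eigenvoices, so the third component (outside the span) is not reproduced; this is exactly why adding and merging supplementary eigenvoices, as the paper proposes, can help.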