• Title/Summary/Keyword: Background Speaker


A Method on the Improvement of Speaker Enrolling Speed for a Multilayer Perceptron Based Speaker Verification System through Reducing Learning Data (다층신경망 기반 화자증명 시스템에서 학습 데이터 감축을 통한 화자등록속도 향상방법)

  • 이백영;황병원;이태승
    • The Journal of the Acoustical Society of Korea / v.21 no.6 / pp.585-591 / 2002
  • While the multilayer perceptron (MLP) offers several advantages over existing pattern recognition methods, it requires a relatively long time to train. This prolongs speaker enrollment in a speaker verification system that uses an MLP as its classifier. This paper proposes a method that shortens enrollment time by adopting the cohort speakers method used in existing parametric systems and thereby reducing the number of background speakers required to train the MLP. The effect of the method is confirmed by an experiment that applies it to a continuant- and MLP-based speaker verification system.

Semantic and pragmatic aspects of the delimiter to (한정사 '도'의 의미-화용론)

  • Kim, Yong-Beom
    • Language and Information / v.3 no.2 / pp.85-96 / 1999
  • This paper deals with questions involving the polysemous meanings of the Korean delimiter to, which include the existence of a sister item, polar values, emphasis, reciprocality, and concession, among others. It argues that the basic meaning of to is the implication of a sister proposition and that the other meanings can be pragmatically derived from this basic meaning. The pragmatic notion of emphasis is defined formally, and it is shown that the various meanings of to can be accounted for by investigating how the speaker exploits the background knowledge that the speaker and the listener share in a speech context. Depending on which type of context the speaker makes use of, the polysemous meanings are analyzed as involving either a simple implicature or a series of implicatures, i.e., scalar implicatures, so the various meanings of the delimiter are attributable to the speaker's different attunement to different kinds of context.

A study on User Experience of Artificial Intelligence speaker (인공지능 스피커(AI speaker) 사례 분석을 통한 고찰)

  • Jo, Gyu-Eun;Kim, Seung-In
    • Journal of the Korea Convergence Society / v.9 no.8 / pp.127-133 / 2018
  • The purpose of this study is to analyze the technology trends of artificial intelligence speakers (AI speakers) and to suggest a direction for domestic AI speakers through case studies. As a research method, the technical background was studied through the literature, and then cases of AI speakers were investigated. The results show attempts to extend AI speakers to a visual interface; one such attempt is the AI speaker with a built-in screen. AI speakers should be a platform through which humans and computers interact, not just a convenience device. We hope the implications presented in this study can serve as a reference for predicting the direction of service development for domestic AI speakers.

A study on speech disentanglement framework based on adversarial learning for speaker recognition (화자 인식을 위한 적대학습 기반 음성 분리 프레임워크에 대한 연구)

  • Kwon, Yoohwan;Chung, Soo-Whan;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.447-453 / 2020
  • In this paper, we propose a system that extracts effective speaker representations from a speech signal using deep learning. Because a speech signal contains identity-unrelated information such as text content, emotion, and background noise, we train the system so that the extracted features represent only speaker-related information and not speaker-unrelated information. Specifically, we propose an auto-encoder-based disentanglement method that outputs both speaker-related and speaker-unrelated embeddings using effective loss functions. To further improve reconstruction performance in the decoding process, we also introduce a discriminator of the kind commonly used in Generative Adversarial Network (GAN) structures. Since improving the decoding capability helps preserve speaker information and supports disentanglement, it improves speaker verification performance. Experimental results demonstrate the effectiveness of the proposed method through an improved Equal Error Rate (EER) on the VoxCeleb1 benchmark dataset.
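
The Equal Error Rate reported above is the operating point at which the false-acceptance and false-rejection rates coincide. The following is a minimal sketch of how it can be estimated from trial scores, not the authors' evaluation code; the function name `equal_error_rate` and the convention that higher scores mean "same speaker" are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    Assumes higher scores indicate a more likely target (genuine) trial.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer, best_thr = np.inf, 1.0, thresholds[0]
    for thr in thresholds:
        far = np.mean(impostor >= thr)  # impostor trials wrongly accepted
        frr = np.mean(genuine < thr)    # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_thr = gap, (far + frr) / 2.0, thr
    return eer, best_thr
```

With a finite set of trials the two error rates rarely match exactly, so the sketch reports their average at the threshold where they are closest.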

Fast Speaker Identification Using a Universal Background Model Clustering Method (Universal Background Model 클러스터링 방법을 이용한 고속 화자식별)

  • Park, Jumin;Suh, Youngjoo;Kim, Hoirin
    • The Journal of the Acoustical Society of Korea / v.33 no.3 / pp.216-224 / 2014
  • In this paper, we propose a new method to drastically reduce computational complexity in Gaussian Mixture Model (GMM)-based Speaker Identification (SI). Generally, GMM-based SI systems have very high computational complexity, proportional to the length of the test utterance, the number of enrolled speakers, and the GMM size. This makes SI systems difficult to use in real applications despite their broad applicability. Thus, the trade-off between computational complexity and identification accuracy is a primary issue for practical applications. To reduce computational complexity sharply with little loss of accuracy, we introduce a method based on the Universal Background Model (UBM) clustering approach and show that it can be used successfully in real-time applications. In experiments with the proposed algorithm, we obtained a speed-up factor of 6 with a negligible loss of accuracy.

Scream Sound Detection Based on Universal Background Model Under Various Sound Environments (다양한 소리 환경에서 UBM 기반의 비명 소리 검출)

  • Chung, Yong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.12 no.3 / pp.485-492 / 2017
  • The GMM has been one of the most popular methods for scream sound detection. In the conventional GMM approach, the whole training data is divided into scream and non-scream sounds, and a GMM is trained for each class. Motivated by the idea that scream sound detection is very similar to speaker recognition, this study proposes using the UBM, which has been used quite successfully in speaker recognition, for scream sound detection. The experimental results show that the UBM performs better than the traditional GMM.
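
In UBM-based speaker recognition, a test segment is commonly scored by the log-likelihood ratio between a target model and the UBM. The abstract does not state its decision rule, so the sketch below only illustrates that common scoring scheme under assumed diagonal-covariance GMMs; the function names `gmm_avg_loglik` and `llr_score` and the `(weights, means, variances)` tuple layout are hypothetical.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    x = np.asarray(frames, dtype=float)[:, None, :]        # (T, 1, D)
    mu = np.asarray(means, dtype=float)[None, :, :]        # (1, M, D)
    var = np.asarray(variances, dtype=float)[None, :, :]   # (1, M, D)
    # log N(x | mu, diag(var)) for every frame/component pair -> (T, M)
    log_gauss = -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var).sum(axis=2)
    log_weighted = np.log(np.asarray(weights, dtype=float))[None, :] + log_gauss
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())

def llr_score(frames, target, ubm):
    """Log-likelihood ratio of a target GMM against the UBM (assumed scoring rule)."""
    return gmm_avg_loglik(frames, *target) - gmm_avg_loglik(frames, *ubm)
```

A segment would be accepted as the target class when `llr_score` exceeds a threshold tuned on held-out data.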

Noise Robust Speaker Verification Using Sub-Band Weighting (서브밴드 가중치를 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.279-284 / 2009
  • Speaker verification determines whether a claimed speaker is accepted based on the score of the test utterance. In recent years, methods based on Gaussian mixture models and the universal background model have been the dominant approaches to text-independent speaker verification. Speaker verification systems based on these methods perform very well under laboratory conditions. In real situations, however, their performance degrades dramatically. To overcome this degradation, the feature recombination method was proposed, but it has the drawback that the whole set of sub-band feature vectors is used to compute the likelihood scores. To deal with this drawback, our previous research proposed a modified feature recombination method that can use each sub-band likelihood score independently. In this paper, we propose a sub-band weighting method based on the sub-band signal-to-noise ratio, combined with the previously proposed modified feature recombination. The proposed method reduces errors by 28% compared with the conventional feature recombination method.
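
The idea of weighting independent sub-band likelihood scores by sub-band SNR can be sketched as follows. The abstract does not give the weighting function, so the mapping here (clip each SNR at a floor, then normalize the weights to sum to 1) is purely an assumption for illustration, as are the names `subband_weights`, `weighted_score`, and `floor_db`.

```python
import numpy as np

def subband_weights(snr_db, floor_db=0.0):
    """Hypothetical SNR-to-weight mapping: clip at a floor, then normalize."""
    snr = np.maximum(np.asarray(snr_db, dtype=float) - floor_db, 0.0)
    if snr.sum() == 0.0:
        # No band is above the floor: fall back to equal weights.
        return np.full(snr.shape, 1.0 / snr.size)
    return snr / snr.sum()

def weighted_score(subband_logliks, snr_db, floor_db=0.0):
    """Combine per-sub-band log-likelihood scores with SNR-derived weights."""
    w = subband_weights(snr_db, floor_db)
    return float(np.dot(w, np.asarray(subband_logliks, dtype=float)))
```

Under this assumed mapping, sub-bands at or below the SNR floor contribute nothing, so a few very noisy bands cannot dominate the utterance score.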

Noise-Robust Speaker Recognition Using Subband Likelihoods and Reliable-Feature Selection

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • ETRI Journal / v.30 no.1 / pp.89-100 / 2008
  • We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination that uses subband likelihoods and a subband reliable-feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable-feature selection adjusts the likelihood scores of an unreliable feature by comparing them with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features obtained directly from the noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of the TIMIT database. The proposed subband feature recombination with subband reliable-feature selection achieves better performance than the conventional feature recombination system with reliable-feature selection.

A Research on the Vibration Characteristics of Vehicle due to Speaker Sound at Low Frequency (저주파 스피커 출력음 대비 차량 진동 특성 연구)

  • Kim, Ki-Chang;Kim, Chan-Mook
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.05a / pp.909-917 / 2007
  • Recently, the IQS evaluation index for perceived quality has been gaining importance in the automobile industry. To reduce rattle noise caused by speaker sound at low frequencies, an advanced investigation of the package tray panel and the door module panel is required. This paper optimizes the design parameters of the package tray panel based on the theoretical background of robust design, and suggests design guidelines for resonance avoidance and for reducing vibrational sensitivity, considering the excitation frequency of the woofer speaker. In addition, it suggests design guidelines for the door module panel through a sensitivity analysis under speaker excitation. Finally, a design factor analysis of the quality deviation of the mother car makes it possible to guarantee stable vehicle vibration characteristics in the early stage of vehicle development. These improvements can shorten the time needed to develop better vehicles.

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

  • Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.7 no.1 / pp.3-10 / 2015
  • In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to a speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the CTM-based spectro-temporal information outperforms the conventional feature set including the delta and delta-delta features.
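
The role of the warping factor can be illustrated with the standard first-order all-pass (bilinear) frequency warping. The abstract does not give the formula or the factor values it uses, so the sketch below assumes the textbook mapping; the function name `bilinear_warp` and the example value `alpha = 0.4` are illustrative only.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Warp normalized angular frequency omega (radians, 0..pi).

    Uses the first-order all-pass mapping
        omega_w = omega + 2 * arctan(alpha * sin(omega) / (1 - alpha * cos(omega))),
    which requires |alpha| < 1. alpha = 0 leaves the scale linear; a positive
    alpha stretches the low-frequency region, similar in spirit to the mel scale.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

# Example: warp a uniform grid; filterbank centers could then be placed
# uniformly on the warped axis (an assumed use, not taken from the paper).
grid = np.linspace(0.0, np.pi, 9)
warped = bilinear_warp(grid, alpha=0.4)
```

Because the mapping keeps 0 and pi fixed and is monotonic for |alpha| < 1, changing alpha redistributes resolution across the band without changing its extent.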