• Title/Summary/Keyword: generic speaker models

Search Result 2, Processing Time 0.016 seconds

Sequential Speaker Classification Using Quantized Generic Speaker Models (양자화 된 범용 화자모델을 이용한 연속적 화자분류)

  • Kwon, Soon-Il
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.44 no.1
    • /
    • pp.26-32
    • /
    • 2007
  • In sequential speaker classification, the lack of prior information about the speakers poses a challenge for model initialization. To address the challenge, a predetermined generic model set, called Sample Speaker Models, was previously proposed. This approach can be useful for accurate speaker modeling without requiring initial speaker data. However, an optimal method for sampling the models from a generic model pool is still required. To solve this problem, the Speaker Quantization method, motivated by vector quantization, is proposed. Experimental results showed that the new approach outperformed the random sampling approach with 25% relative improvement in error rate on switchboard telephone conversations.

Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal
    • /
    • v.33 no.5
    • /
    • pp.741-751
    • /
    • 2011
  • This paper focuses on online speaker tracking for telephone conversations and broadcast news. Since the online applicability imposes some limitations on the tracking strategy, such as data insufficiency, a reliable approach should be applied to compensate for this shortage. In this framework, a set of reference speaker models are used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in eigenvoice decomposition space are proposed in this paper. We believe that the eigenvoice adaptation techniques would help to embed the speaker space in the models and hence enrich the generality of the selected speaker models. Also, an index structure of the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on 2002 Rich Transcription Broadcast News and Conversational Telephone Speech corpus as well as a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and synthetic dataset are 8.77%, 9.36%, and 12.4%, respectively. Using the index tree structure approach, the run time of the proposed framework is improved by 22%.