• Title/Summary/Keyword: speaker variability


F-ratio of Speaker Variability in Emotional Speech

  • Yi, So-Pae
    • Speech Sciences / v.15 no.1 / pp.63-72 / 2008
  • Various acoustic features were extracted and analyzed to estimate the inter- and intra-speaker variability of emotional speech. Tokens of the vowel /a/ from sentences spoken with different modes of emotion (sadness, neutral, happiness, fear and anger) were analyzed. All of the acoustic features (fundamental frequency, spectral slope, HNR, H1-A1 and formant frequency) indicated a greater contribution to inter- than intra-speaker variability across all emotions. Each acoustic feature of the speech signal showed a different degree of contribution to speaker discrimination in different emotional modes. Sadness and neutral speech showed greater speaker discrimination than the other emotional modes (happiness, fear, anger in descending order of F-ratio). In other words, speaker specificity was better represented in sadness and neutral speech than in happiness, fear and anger with any of the acoustic features.

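As a rough illustration of the F-ratio mentioned in the abstract above, the sketch below uses the common definition from speaker recognition: the variance of the per-speaker means (inter-speaker) divided by the mean of the per-speaker variances (intra-speaker). The abstract does not state the exact formula used in the paper, so this definition, the function name `f_ratio`, and the f0 values are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_speaker):
    """Return inter-speaker variance divided by mean intra-speaker variance.

    samples_by_speaker maps a speaker label to a list of measurements of
    one acoustic feature (for example f0 in Hz) for that speaker.
    """
    groups = list(samples_by_speaker.values())
    # Inter-speaker term: variance of the per-speaker means.
    speaker_means = [mean(values) for values in groups]
    between = pvariance(speaker_means)
    # Intra-speaker term: average of the per-speaker variances.
    within = mean([pvariance(values) for values in groups])
    if within == 0:
        raise ValueError("intra-speaker variance is zero; F-ratio undefined")
    return between / within


# Hypothetical f0 measurements (Hz) for two speakers, not data from the paper.
example_f0 = {
    "speaker_a": [100.0, 102.0, 98.0],
    "speaker_b": [200.0, 205.0, 195.0],
}
ratio = f_ratio(example_f0)
print(f"F-ratio: {ratio:.1f}")
```

A larger value means the feature separates speakers more strongly relative to how much each speaker varies, which is the sense in which the abstract ranks the emotional modes.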

Inter-speaker and intra-speaker variability on sound change in contemporary Korean

  • Kim, Mi-Ryoung
    • Phonetics and Speech Sciences / v.9 no.3 / pp.25-32 / 2017
  • Besides their effect on the f0 contour of the following vowel, Korean stops are undergoing a sound change in which a partial or complete consonantal merger on voice onset time (VOT) is taking place between aspirated and lax stops. Many previous studies on sound change have mainly focused on group-normative effects, that is, effects that are representative of the population as a whole. Few systematic quantitative studies of change in adult individuals have been carried out. The current study examines whether the sound change holds for individual speakers. It focuses on inter-speaker and intra-speaker variability in sound change in contemporary Korean. Speech data were collected from thirteen Seoul Korean speakers studying abroad in America. In order to minimize the possible effects of speech production, socio-phonetic factors such as age, gender, dialect, speech rate, and L2 exposure period were controlled when recruiting participants. The results showed that, for nine out of thirteen speakers, the consonantal merger is taking place between the aspirated and lax stops in terms of VOT. There were also intra-speaker variations in the merger in three respects: First, is the consonantal (VOT) merger between the two stops in progress or not? Second, are VOTs for aspirated stops getting shorter or not (i.e., the aspirated-shortening process)? Third, are VOTs for lax stops getting longer or not (i.e., the lax-lengthening process)? The remarkable inter-speaker and intra-speaker variability in the results indicates a synchronous speech sound change of the stop system in contemporary Korean. Some speakers are early adopters or active propagators of sound change whereas others are not. Further study is necessary to see whether the inter-speaker differences exceed intra-speaker differences in sound change.

Analysis of the Voice Quality in Emotional Speech Using Acoustical Parameters (음향 파라미터에 의한 정서적 음성의 음질 분석)

  • Jo, Cheol-Woo;Li, Tao
    • MALSORI / v.55 / pp.119-130 / 2005
  • The aim of this paper is to investigate some acoustical characteristics of voice quality features from an emotional speech database. Six different parameters are measured and compared for 6 different emotions (normal, happiness, sadness, fear, anger, boredom) and 6 different speakers. Inter-speaker variability and intra-speaker variability are measured. Some intra-speaker consistency in the parameter change across the emotions is observed, but inter-speaker consistency is not observed.


A Study on Adaptive Model Updating and a Priori Threshold Decision for Speaker Verification System (화자 확인 시스템을 위한 적응적 모델 갱신과 사전 문턱치 결정에 관한 연구)

  • 진세훈;이재희;강철호
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.20-26 / 2000
  • In a speaker verification system, HMM (hidden Markov model) parameter updating from a small amount of data and a priori threshold decision are crucial factors in dealing with long-term variability in people's voices. In this paper we present a speaker model updating technique that can adapt to session-to-session intra-speaker variability, together with an a priori threshold determination technique. The proposed technique decreases the verification error rates caused by session-to-session intra-speaker variability by adapting the speaker model parameters to new speech data through Baum-Welch re-estimation. In this study, the proposed a priori threshold is determined by a hybrid score measurement that combines the world model based technique and the cohort model based technique. The results show that the proposed technique leads to better performance, and that the difference in performance between the a posteriori threshold decision based approach and the proposed a priori threshold decision based approach is small.


A Robust Method for Speech Replay Attack Detection

  • Lin, Lang;Wang, Rangding;Yan, Diqun;Dong, Li
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.168-182 / 2020
  • Spoofing attacks, especially replay attacks, pose great security challenges to automatic speaker verification (ASV) systems. Current work on replay attack detection has primarily focused on either developing new features or improving classifier performance, ignoring the effects of feature variability, e.g., channel variability. In this paper, we first establish a mathematical model for replay speech and introduce a method for eliminating the negative interference of the channel. Then a novel feature is proposed to detect replay attacks. To further boost the detection performance, four post-processing methods using normalization techniques are investigated. We evaluate our proposed method on the ASVspoof 2017 dataset. The experimental results show that our approach outperforms the competing methods in terms of detection accuracy. More interestingly, we find that the proposed normalization strategy can also improve the performance of existing algorithms.

An EMG Study of the Feature 'Tensity'

  • Kim, Dae-Won
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.5 no.1 / pp.22-28 / 1994
  • Previous studies reveal that in English there is no EMG evidence for the tense-lax feature distinction. The technique of electromyography (EMG) was used to see if the existing claim holds true, particularly in unstressed syllables. It was found that in unstressed syllables, the peak EMG amplitude from the orbicularis oris superior muscle was significantly greater in /p/ than in /b/, while in stressed syllables this difference was negligible. It was hypothesized that in stressed syllables, /p/ and /b/ may be differentiated by the EMG activity of a muscle other than the orbicularis oris superior muscle, e.g. the respiratory muscles relating to 'aspiration' or the depressor anguli oris muscle. In Korean, there were clear labial gestures for the tense-lax feature distinction. The phoneme-sensitive manifestation of stress and some possible reasons for the inter-speaker variability in the data and the variability within a given speaker were discussed.


Changes in Features of Korean Vowels with Age and Sex of Speakers and Their Recognition (한국어 단모음의 성별, 연령별 특징변화 및 인식)

  • 이용주;김경태;차균현
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.12 / pp.1503-1512 / 1988
  • As a basic analysis for addressing within- and cross-speaker variability in phoneme-based speech recognition, changes in the pitch and formant frequencies of 8 Korean vowels with the age and sex of the speaker have been investigated by analyzing a large number of samples. The conclusions obtained are as follows: 1) Changes in pitch frequency with the age and sex of the speaker are hard to distinguish for children, and the difference before and after the voice change is approximately 0.2 oct. for females and 0.9 oct. for males. 2) While most of the vowel formants change considerably with the age of the speaker, the change becomes smaller as the speaker gets older. 3) While there is an indirect correlation between pitch and formants with change in age, it is hard to see a direct correlation. 4) When the recognition experiment using pitch and formants covers various speakers of each age and sex, pitch also works as an efficient recognition parameter.


An Improvement of Korean Speech Recognition Using a Compensation of the Speaking Rate by the Ratio of a Vowel length (모음길이 비율에 따른 발화속도 보상을 이용한 한국어 음성인식 성능향상)

  • 박준배;김태준;최성용;이정현
    • Proceedings of the IEEK Conference / 2003.11b / pp.195-198 / 2003
  • The accuracy of an automatic speech recognition system depends on the presence of background noise and on speaker variability such as sex, intonation, and speaking rate. In particular, inter-speaker and intra-speaker variation in speaking rate is a serious cause of misrecognition. In this paper, we propose a method of compensating for the speaking rate using the ratio of each vowel's length in a phrase. First, the number of feature vectors in a phrase is estimated from the speaking rate information. Second, the estimated number of feature vectors is assigned to each syllable of the phrase according to the ratio of its vowel length. Finally, feature vector extraction is carried out using the number assigned to each syllable in the phrase. As a result, the accuracy of automatic speech recognition was improved using the proposed speaking rate compensation method.

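The second step described in the abstract above, dividing an estimated number of feature vectors among syllables in proportion to vowel length, might be sketched as follows. The abstract gives no implementation details, so the function name `allocate_frames`, the largest-remainder rounding rule, and the sample vowel lengths are assumptions for illustration only, not the authors' method.

```python
def allocate_frames(total_frames, vowel_lengths):
    """Split total_frames among syllables in proportion to vowel length.

    vowel_lengths holds one vowel duration per syllable (any consistent
    unit). Rounding uses the largest-remainder rule so that the counts
    always sum to total_frames; this rule is an assumption of the sketch.
    """
    total_length = sum(vowel_lengths)
    if total_length <= 0:
        raise ValueError("vowel lengths must sum to a positive value")
    # Ideal (fractional) share of frames for each syllable.
    exact = [total_frames * length / total_length for length in vowel_lengths]
    counts = [int(share) for share in exact]
    # Hand out the frames lost to truncation, largest fraction first.
    leftover = total_frames - sum(counts)
    by_fraction = sorted(
        range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical phrase: 30 feature vectors, vowel lengths of 50, 100 and 150 ms.
print(allocate_frames(30, [50, 100, 150]))
```

Syllables with longer vowels receive more feature vectors, which matches the proportional assignment the abstract describes; how the total is estimated from the speaking rate is left out because the abstract does not specify it.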

Phonological Status of Korean /w/: Based on the Perception Test

  • Kang, Hyun-Sook
    • Phonetics and Speech Sciences / v.4 no.3 / pp.13-23 / 2012
  • The sound /w/ has traditionally been regarded as an independent segment in Korean regardless of the phonological contexts in which it occurs. There have been, however, some questions regarding whether it is an independent phoneme in the /CwV/ context (cf. Kang 2006). The present pilot study examined how Korean /w/ is realized in the $/S^*wV/$ context by performing some perception tests. Our assumption was that if Korean /w/ is a part of the preceding complex consonant like $/C^w/$, it should be more or less uniformly articulated and perceived as such. If /w/ is an independent segment, it will be realized with speaker variability. Experiments I and II examined the rates of identification as "labialized" for the spliced original stimuli $/S^*-V/$ and $/S^{w*}-^wV/$, and the cross-spliced stimuli $/S^{w*}-V/$ and $/S^*-^wV/$. The results showed that the round qualities of /w/ are perceived at significantly different temporal points, with speaker and context variability. We therefore conclude that /w/ in the $/S^*wV/$ context is an independent segment, not a part of the preceding segment. A full-scale production test should be performed in the future to verify the conclusion suggested in this paper.

Speaker verification with ECAPA-TDNN trained on new dataset combined with Voxceleb and Korean (Voxceleb과 한국어를 결합한 새로운 데이터셋으로 학습된 ECAPA-TDNN을 활용한 화자 검증)

  • Keumjae Yoon;Soyoung Park
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.209-224 / 2024
  • Speaker verification is becoming popular as a method of non-face-to-face identity authentication. It involves determining whether two voice recordings belong to the same speaker. In cases where a criminal's voice remains at the crime scene, it is vital to establish a speaker verification system that can accurately compare the two pieces of voice evidence. To achieve this, a new speaker verification system was built in this study using a deep learning model for the Korean language. Because voice data are high-dimensional and highly variable, for example due to background noise, deep learning-based methods were needed for speaker matching. To construct the matching algorithm, the ECAPA-TDNN model, known as the most famous deep learning system for speaker verification, was selected. Voxceleb is a large voice dataset collected from people of various nationalities that does not include Korean. To study the form of dataset appropriate for learning the Korean language, experiments were carried out to find out how Korean voice data affect the matching performance. The results showed that, compared with models trained only on Voxceleb, models trained on datasets combining Voxceleb and Korean datasets to maximize language and speaker diversity performed better on all test sets.