• Title/Summary/Keyword: speaker overlapping

Search Result 7

Proposal of speaker change detection system considering speaker overlap (화자 겹침을 고려한 화자 전환 검출 시스템 제안)

  • Park, Jisu;Yun, Young-Sun;Cha, Shin;Park, Jeon Gue
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.466-472 / 2021
  • Speaker Change Detection (SCD) refers to finding the moment when the main speaker changes from one person to the next in a spoken conversation. In speaker change detection, difficulties arise from overlapping speakers, inaccurate labeling, and data imbalance. To address these problems, utterances from the TIMIT corpus, which is widely used in speech recognition, were concatenated artificially to obtain a sufficient amount of training data, and speaker changes were detected after identifying overlapping speakers. In this paper, we propose a speaker change detection system that considers speaker overlap. We evaluated and verified the performance using various approaches. As a result, a detection system similar to the X-Vector structure was proposed to remove the speaker overlap regions, while the Bi-LSTM method was selected to model the speaker change system. The experimental results show relative performance improvements of 4.6 % and 13.8 %, respectively, compared to the baseline system. Additionally, we determined that a robust speaker change detection system can be built by conducting related studies based on the experimental results, taking text and speaker information into consideration.
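
The abstract mentions concatenating utterances artificially to build training data for speaker change detection. The following is only a minimal sketch of that general idea, not the authors' procedure: the function name `concatenate_with_change_points` and the list-of-samples representation are assumptions made for illustration, and the sketch ignores overlap mixing entirely.

```python
from typing import List, Sequence, Tuple


def concatenate_with_change_points(
    utterances: Sequence[Tuple[str, Sequence[float]]],
) -> Tuple[List[float], List[int]]:
    """Join (speaker_id, samples) pairs into one signal.

    Returns the concatenated samples and the sample indices at which
    the speaker differs from the previous utterance's speaker.
    This is a hypothetical helper, not taken from the paper.
    """
    signal: List[float] = []
    change_points: List[int] = []
    previous_speaker = None
    for speaker, samples in utterances:
        # A change point is recorded only when the speaker identity changes.
        if previous_speaker is not None and speaker != previous_speaker:
            change_points.append(len(signal))
        signal.extend(samples)
        previous_speaker = speaker
    return signal, change_points
```

The returned indices can serve as reference labels when training or scoring a change detector on the synthetic conversation.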

Fast Sequential Probability Ratio Test Method to Obtain Consistent Results in Speaker Verification (화자확인에서 일정한 결과를 얻기 위한 빠른 순시 확률비 테스트 방법)

  • Kim, Eun-Young;Seo, Chang-Woo;Jeon, Sung-Chae
    • Phonetics and Speech Sciences / v.2 no.2 / pp.63-68 / 2010
  • A new version of the sequential probability ratio test (SPRT), which has been investigated for utterance-length control, is proposed to obtain uniform response results in speaker verification (SV). Although SPRTs can give fast responses in SV tests, performance may differ depending on the composition of consonants and vowels in the sentences used. In this paper, we propose a fast sequential probability ratio test (FSPRT) method that shows consistent performance regardless of the composition of the spoken sentences used for SV. When generating frames, the FSPRT first runs the SV test using only frames generated without any overlap; if the results do not satisfy the discrimination criteria, it then sequentially uses overlapping frames. Proceeding in this way, the test is not affected by the composition of the sentences used for SV, so both fast responses and consistent performance can be obtained. Experimental results show that the FSPRT has better performance than the SPRT method while requiring less complexity with equal error rates (EER).
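
For readers unfamiliar with the baseline, the classic Wald SPRT that the abstract builds on can be sketched as follows. This is not the proposed FSPRT: the per-frame log-likelihood ratios are assumed to be supplied by some speaker model, and the function name `sprt` and the default error rates are illustrative assumptions.

```python
import math
from typing import Iterable, Tuple


def sprt(
    frame_llrs: Iterable[float], alpha: float = 0.01, beta: float = 0.01
) -> Tuple[str, int]:
    """Classic Wald SPRT over per-frame log-likelihood ratios.

    alpha: target false-acceptance rate, beta: target false-rejection rate.
    Returns the decision and the number of frames consumed.
    """
    upper = math.log((1 - beta) / alpha)  # accept the claimed speaker at or above this
    lower = math.log(beta / (1 - alpha))  # reject the claimed speaker at or below this
    total = 0.0
    used = 0
    for llr in frame_llrs:
        total += llr
        used += 1
        if total >= upper:
            return "accept", used
        if total <= lower:
            return "reject", used
    # The frames ran out before either threshold was crossed.
    return "undecided", used
```

The test stops as soon as the accumulated evidence crosses a threshold, which is why response time depends on how informative the early frames are.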


Framework Switching of Speaker Overlap Detection System (화자 겹침 검출 시스템의 프레임워크 전환 연구)

  • Kim, Hoinam;Park, Jisu;Cha, Shin;Son, Kyung A;Yun, Young-Sun;Park, Jeon Gue
    • Journal of Software Assessment and Valuation / v.17 no.1 / pp.101-113 / 2021
  • In this paper, we introduce a speaker overlap detection system and examine the process of converting the existing system to a specific artificial intelligence framework. Speaker overlap occurs when two or more speakers speak at the same time during a conversation. It can degrade performance in speech recognition and speaker recognition, and much research is being conducted on it because handling overlap can prevent this degradation. Recently, as applications of artificial intelligence increase, there is a growing demand for switching between artificial intelligence frameworks. However, performance degradation is often observed after switching because of the unique characteristics of each framework, which makes switching difficult. This paper explains the process of converting a speaker overlap detection system based on the Keras framework to a PyTorch-based system and discusses the components to be considered. After the switch, the PyTorch-based system showed better performance than the existing Keras-based speaker overlap detection system, so the work is valuable as a fundamental study on systematic framework conversion.
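
The definition of speaker overlap given in the abstract (two or more speakers talking at the same time) can be illustrated with a small sketch that derives reference overlap regions from time-stamped speaker segments. It is unrelated to the Keras or PyTorch models in the paper; the `(speaker, start, end)` segment format and the function name `overlap_regions` are assumptions.

```python
from typing import List, Sequence, Tuple


def overlap_regions(
    segments: Sequence[Tuple[str, float, float]],
) -> List[Tuple[float, float]]:
    """Return the time intervals in which at least two segments are active.

    Assumes each speaker's own segments do not overlap one another.
    """
    events = []
    for _speaker, start, end in segments:
        events.append((start, 1))   # a segment becomes active
        events.append((end, -1))    # a segment ends
    # At equal times, ends (-1) sort before starts (+1), so segments that
    # merely touch are not reported as overlapping.
    events.sort()

    regions: List[Tuple[float, float]] = []
    active = 0
    overlap_start = None
    for time, delta in events:
        active += delta
        if active >= 2 and overlap_start is None:
            overlap_start = time
        elif active < 2 and overlap_start is not None:
            regions.append((overlap_start, time))
            overlap_start = None
    return regions
```

Intervals produced this way could be used as ground-truth labels when evaluating an overlap detector.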

A Phonetic Investigation of Korean Monophthongs in the Early Twentieth Century (20세기 초 한국어 단모음의 음향음성학적 연구)

  • Han, Jeong-Im;Kim, Joo-Yeon
    • Phonetics and Speech Sciences / v.6 no.1 / pp.31-38 / 2014
  • The current study presents an instrumental phonetic analysis of Korean monophthongs in early twentieth-century Seoul Korean, based on audio recordings of the elementary school textbooks Botonghakgyo Joseoneodokbon (Korean Reading Textbook for Elementary School). The data examined in this study were a list of Korean monosyllables (Banjeol) and a short passage, recorded by one 41-year-old male speaker in 1935, as well as a short passage recorded by one 11-year-old male speaker in 1935. The Korean monophthongs were examined through acoustic analysis of the vowel formants (F1, F2) and compared to those recorded by 18 male speakers of Seoul Korean in 2013. The results show that in 1935, 1) /e/ and /ɛ/ were clearly separated in the vowel space; 2) /o/ and /u/ were also clearly separated without any overlapping values; 3) some tokens of /y/ and /ø/ were produced as monophthongs, not as diphthongs. Based on these results, we can observe historical changes in the Korean vowels over 80-90 years: 1) /e/ and /ɛ/ have merged; and 2) /o/ has been raised and now overlaps with /u/.

The Correlation of VOT and f0 In the Perception of Korean Obstruents (한국어 장애음 지각에서의 VOT와 F0의 상관 관계)

  • Kim Midam
    • Proceedings of the KSPS conference / 2003.10a / pp.163-167 / 2003
  • The present thesis examines the correlation of VOT and F0 in the three-way distinction of Korean obstruents through production and perception tests. In the production test, one female native speaker of Korean with a Seoul dialect (the author) recorded 15 repetitions of a monosyllabic word list including /ka, kha, k*a, pa, pha, p*a, ta, tha, t*a, ca, cha, c*a/ in random order, and the VOT and the F0 of the following vowels were measured. The result was significant for the three-way distinction, with a strong correlation between VOT and F0, and no overlap among the domains was observed in the VOT-F0 plot. For the perception test, I manipulated the data recorded in the production test, raising or lowering their F0 values. In all, 14 subjects (seven males and seven females) participated in the identification test. The results were as follows: the fortis stimuli were not influenced by F0 changes, and the VOT and F0 values at the lenis-aspirated boundary were negatively correlated. From these results I concluded the following: 1) VOT and F0 can distinguish the three domains of Korean obstruents without overlapping; 2) fortis perception does not need F0 as an acoustic cue; and 3) in the distinction between lenis and aspirated obstruents, VOT and F0 are in a phonetic trading relation[2].


A Study on the Speech Recognition for DDD Area - Name Using Vector Quantization with Time Information (시간 정보와 VQ를 이용한 DDD 지역명 인식에 관한 연구)

  • LEE S. K.;LEE K. S.;ANN T. O.;CHO H. J.;BYON Y. C.;KIM S. H.
    • The Journal of the Acoustical Society of Korea / v.8 no.5 / pp.102-112 / 1989
  • In this paper, we study speaker-independent isolated word recognition of DDD area names using vector quantization, choosing a total of 146 DDD area names as the recognition vocabulary for application to a dialing system. We built the codebook from 12th-order LPC cepstrum coefficients, used the minsum and minimax methods to find the centroids, and applied a 3 splitting rule to codebook generation. Codebooks were generated for a single section and for multiple sections with time information, and an overlapped-section codebook was also used. The experimental results showed that the minsum method was better than the minimax method, and the system achieved an accuracy of about 90 percent in the speaker-independent case.
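
The abstract does not define its minsum and minimax centroid rules. One common reading, assumed here, is that the centroid is chosen among the cluster members: minsum picks the member with the smallest total distance to the others, and minimax picks the member with the smallest worst-case distance. The sketch below illustrates that reading only; the function names and the Euclidean distance are assumptions, not details from the paper.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Vector = Tuple[float, ...]


def pick_centroid(
    cluster: Sequence[Vector],
    reduce_fn: Callable[[Iterable[float]], float],
) -> Vector:
    """Choose the cluster member whose reduced distance to all members is smallest."""
    return min(
        cluster,
        key=lambda candidate: reduce_fn(math.dist(candidate, x) for x in cluster),
    )


def minsum_centroid(cluster: Sequence[Vector]) -> Vector:
    # Smallest sum of distances to every member of the cluster.
    return pick_centroid(cluster, sum)


def minimax_centroid(cluster: Sequence[Vector]) -> Vector:
    # Smallest maximum distance to any member of the cluster.
    return pick_centroid(cluster, max)
```

Under this reading the two rules differ mainly in how they react to outliers: minimax is pulled toward an outlying vector, while minsum stays near the bulk of the cluster.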


Korean Word Recognition using the Transition Matrix of VQ-Code and DHMM (VQ코드의 천이 행렬과 이산 HMM을 이용한 한국어 단어인식)

  • Chung, Kwang-Woo;Hong, Kwang-Seok;Park, Byung-Chul
    • The Journal of the Acoustical Society of Korea / v.13 no.4 / pp.40-49 / 1994
  • In this paper, we propose methods for improving the performance of a word recognition system. The key strategy of the first method is to apply inertia to the feature vector sequences of the speech signal to stabilize the transitions between VQ codes. The second method generates new observation probabilities by using the transition matrix of VQ codes as weights on the observation probability of the output symbol, so as to take into account the time relation between neighboring frames in the DHMM. By applying inertia to the feature vector sequences, we can reduce the overlap of the probability distributions of the response paths for each word and stabilize state transitions in the HMM. By using the transition matrix of VQ codes as weights in the conventional DHMM, we can divide the probability distribution of feature vectors more finely and restrict the feature distribution to a suitable region, so that the performance of the recognition system improves. To evaluate the performance of the proposed methods, we carried out experiments on 50 DDD area names. As a result, the proposed methods improved the recognition rate by $4.2\%$ in the speaker-dependent test and $12.45\%$ in the speaker-independent test, compared with the conventional DHMM.
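
A transition matrix of VQ codes, as mentioned in the abstract, can be estimated by counting how often one code follows another in training sequences. The sketch below shows only that counting step under the assumption that codes are integers in `range(num_codes)`; how the paper applies the matrix as weights on DHMM observation probabilities is not specified in the abstract and is not reproduced here.

```python
from typing import List, Sequence


def transition_matrix(
    sequences: Sequence[Sequence[int]], num_codes: int
) -> List[List[float]]:
    """Estimate P(next code | current code) from VQ code sequences.

    Rows for codes that never appear as a predecessor are left as all zeros.
    """
    counts = [[0] * num_codes for _ in range(num_codes)]
    for seq in sequences:
        # Count each adjacent (previous, current) pair of codes.
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1

    matrix: List[List[float]] = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Each non-empty row sums to one, so an entry can be read as the relative frequency with which one code is followed by another in neighboring frames.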
