• Title/Summary/Keyword: Speech Processing


A Closed-Form Solution of Linear Spectral Transformation for Robust Speech Recognition

  • Kim, Dong-Hyun;Yook, Dong-Suk
    • ETRI Journal / v.31 no.4 / pp.454-456 / 2009
  • The maximum likelihood linear spectral transformation (ML-LST) using a numerical iteration method has been previously proposed for robust speech recognition. The numerical iteration method is not appropriate for real-time applications due to its computational complexity. In order to reduce the computational cost, the objective function of the ML-LST is approximated and a closed-form solution is proposed in this paper. It is shown experimentally that the proposed closed-form solution for the ML-LST can provide rapid speaker and environment adaptation for robust speech recognition.

Phonological Process and Word Recognition in Continuous Speech: Evidence from Coda-neutralization (음운 현상과 연속 발화에서의 단어 인지 - 종성중화 작용을 중심으로)

  • Kim, Sun-Mi;Nam, Ki-Chun
    • Phonetics and Speech Sciences / v.2 no.2 / pp.17-25 / 2010
  • This study explores whether Korean listeners exploit their native coda-neutralization process when recognizing words in continuous Korean speech. According to the phonological rules of Korean, the coda-neutralization process must precede the liaison process whenever liaison occurs between words. As a result, liaison consonants are coda-neutralized ones such as /b/, /d/, or /g/, rather than non-neutralized ones such as /p/, /t/, /k/, /ʧ/, /ʤ/, or /s/. Consequently, if Korean listeners use their native coda-neutralization rules when processing speech input, word recognition will be hampered when non-neutralized consonants precede vowel-initial targets. Word-spotting and word-monitoring tasks were conducted in Experiments 1 and 2, respectively. In both experiments, listeners recognized vowel-initial target words faster and more accurately when they were preceded by coda-neutralized consonants than when they were preceded by non-neutralized ones. The results show that Korean listeners exploit the coda-neutralization process when processing their native spoken language.


Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern (음소 음향학적 변화 패턴을 이용한 한국어 음성신호의 연속 모음 분할)

  • Park, Chang-Mok;Wang, Gi-Nam
    • Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.801-804 / 2001
  • This article addresses the automatic segmentation of two adjacent vowels in speech signals. Every type of transition between adjacent vowels can be characterized by its spectrogram. First, voiced speech is extracted by histogram analysis of a vowel indicator composed of wavelet low-pass components. Second, given the phonetic transcription and the transition-pattern spectrogram, the voiced-speech portion containing consecutive vowels is automatically segmented by template matching. The cross-correlation function is adopted as the template matching method, and a modified correlation coefficient is calculated for every frame. The largest value in the modified correlation coefficient series indicates the boundary between the two consecutive vowel sounds. The experiment covers 154 vowel transition sets. The 154 spectrogram templates were gathered from 154 words (PRW Speech DB), and 161 test words (PBW Speech DB) uttered by 5 speakers were tested. The experimental results show the validity of the method.
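
The boundary search in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: each frame is a list of spectral magnitudes, a plain Pearson correlation stands in for the paper's modified correlation coefficient (whose definition the abstract does not give), and the names `correlation` and `find_boundary` are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length sequences (assumed stand-in)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant window carries no pattern to match
    return cov / math.sqrt(var_a * var_b)

def find_boundary(frames, template):
    """Slide the template over the frames and return (start index, score) of the best match."""
    width = len(template)
    flat_template = [v for frame in template for v in frame]
    best_index, best_score = 0, -2.0  # correlation is never below -1
    for start in range(len(frames) - width + 1):
        window = [v for frame in frames[start:start + width] for v in frame]
        score = correlation(window, flat_template)
        if score > best_score:
            best_index, best_score = start, score
    return best_index, best_score

# Toy input: three frames of one "vowel" followed by three frames of another.
frames = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
template = [[1, 0], [0, 1]]  # hypothetical transition pattern
boundary_index, boundary_score = find_boundary(frames, template)
```

In this toy input the best-matching window starts at the last frame of the first vowel, so the transition falls between that frame and the next one.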


BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

  • Sanggeon Yun;Seungshik Kang;Hyeokman Kim
    • Journal of Information Processing Systems / v.19 no.5 / pp.641-651 / 2023
  • Malicious hate speech and gender bias comments are common in online communities, causing social problems in our society. Gender bias and hate speech detection has been investigated. However, it is difficult because there are diverse ways to express them in words. To solve this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored bidirectional encoder representations from transformers (BERT)-based deep learning models utilizing hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our model in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an F1-score of 0.7711 was achieved using an ensemble of the Soongsil-BERT and KcELECTRA models. The general bias task included the gender bias task, and the ensemble model achieved the best F1-score of 0.7166.
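
A logits ensemble of the kind mentioned above can be illustrated with a short sketch. It assumes each model outputs one logit per class and combines them by a weighted sum before taking the arg max. The weights and the function names `ensemble_logits` and `predict` are hypothetical, and the label-distribution adjustment described in the abstract is not modeled here.

```python
def ensemble_logits(logits_per_model, weights=None):
    """Combine per-class logits from several models by a weighted sum."""
    if weights is None:
        # Assumption: equal weighting when no weights are supplied.
        weights = [1.0 / len(logits_per_model)] * len(logits_per_model)
    num_classes = len(logits_per_model[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, logits_per_model))
        for c in range(num_classes)
    ]

def predict(logits_per_model, weights=None):
    """Return the index of the class with the largest combined logit."""
    combined = ensemble_logits(logits_per_model, weights)
    return max(range(len(combined)), key=combined.__getitem__)

# Two hypothetical models disagree; the combined logits decide the label.
model_a = [2.0, 1.0]
model_b = [0.0, 4.0]
label_equal = predict([model_a, model_b])
label_weighted = predict([model_a, model_b], weights=[0.9, 0.1])
```

Combining logits rather than hard labels lets a confident model outvote an uncertain one, which is one common motivation for this style of ensemble.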

Improvement of Speech Recognition System using Entropy Rejection (앤트로피 거절을 활용한 음성인식 시스템의 성능 향상)

  • 송점동
    • The Journal of Information Technology / v.2 no.2 / pp.139-144 / 1999
  • This paper studies the use of entropy information about the additional words in a post-processing step to improve the accuracy of a speech recognition system. With the existing likelihood ratio test method, the performance of the speech recognition system varies with the speech data, and the probability of recognition errors increases because the likelihood values of the additional words are similar. In contrast, a post-processing method that rejects a candidate, among words other than the recognizable ones, whose entropy value is lower than the entropy value of each additional word yields an accurate speech recognition system whose discrimination is independent of the speech data. In the experiments, at a false alarm rate of 20 percent, the entropy-based post-processing method improved recognition performance by up to 3.6 percent over the likelihood ratio method.


A Study on Objective Speech Quality Measure under CDMA Telephone Networks Environment (CDMA 통신망에서의 객관적 음질 평가 척도에 관한 연구)

  • 김광수;김민정;석수영;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.53-58 / 2001
  • To develop an objective speech quality measure for CDMA telephone network environments, this paper first investigates recently developed measures. Those measures perform poorly in CDMA telephone networks. To solve this problem, a new objective speech quality measure that adopts a noise masking threshold is proposed and studied. For better performance, a scaled noise masking threshold calculated for speech signals is employed instead of one for conventional tone signals. To verify the effectiveness of the proposed method, performance comparison experiments were carried out on CDMA telephone network speech databases; the proposed method showed improved performance compared with existing measures.


A Robust Non-Speech Rejection Algorithm

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea / v.17 no.1E / pp.10-13 / 1998
  • We propose a robust non-speech rejection algorithm using three types of pitch-related parameters: (1) pitch range, (2) difference of the successive pitch range, and (3) the number of successive pitches satisfying constraints related to the previous two parameters. The acceptance rate of the speech commands was 95% for a speech database with a signal-to-noise ratio (SNR) of -2.8 dB that consisted of 2440 utterances. The rejection rate of the non-speech sounds was 100% while the acceptance rate of the speech commands was 97% in an office environment.
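
The three pitch-related parameters above suggest a decision rule along the following lines. This is a minimal sketch, not the paper's algorithm: the thresholds (`min_pitch`, `max_pitch`, `max_jump`, `min_run`) and the function name `is_speech` are assumptions chosen for illustration, and the input is assumed to be one pitch estimate in Hz per frame.

```python
def is_speech(pitches, min_pitch=60.0, max_pitch=400.0, max_jump=20.0, min_run=5):
    """Accept a segment when enough consecutive pitch values satisfy both constraints.

    All threshold defaults are hypothetical, not values from the paper.
    """
    run = 0          # (3) number of successive pitches satisfying the constraints
    previous = None  # last in-range pitch, used for the difference check
    for pitch in pitches:
        in_range = min_pitch <= pitch <= max_pitch                       # (1) pitch range
        smooth = previous is None or abs(pitch - previous) <= max_jump   # (2) successive difference
        if in_range and smooth:
            run += 1
            if run >= min_run:
                return True
        else:
            # An in-range pitch after a large jump starts a new run of length 1.
            run = 1 if in_range else 0
        previous = pitch if in_range else None
    return False

voiced = is_speech([120.0, 122.0, 125.0, 123.0, 121.0])
erratic = is_speech([120.0, 300.0, 90.0, 350.0, 100.0, 380.0])
unvoiced = is_speech([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

The intuition is that speech tends to produce a smooth pitch track over several frames, whereas many non-speech sounds yield no pitch or an erratic one.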


Multiple Acoustic Cues for Stop Recognition

  • Yun, Weon-Hee
    • Proceedings of the KSPS conference / 2003.10a / pp.3-16 / 2003
  • ㆍAcoustic characteristics of stops in speech with contextual variability ㆍPossibility of stop recognition by a post-processing technique ㆍFurther work - Speech database - Modification of decoder - Automatic segmentation of acoustic parameters


Implementation of Korean TTS System based on Natural Language Processing (자연어 처리 기반 한국어 TTS 시스템 구현)

  • Kim Byeongchang;Lee Gary Geunbae
    • MALSORI / no.46 / pp.51-64 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from texts using natural language processing. Robust preprocessing for non-Korean characters is also required. In this paper, we analyzed Korean texts using a morphological analyzer, part-of-speech tagger and syntactic chunker. We present a new grapheme-to-phoneme conversion method for Korean using a hybrid method with a phonetic pattern dictionary and CCV (consonant vowel) LTS (letter to sound) rules, for unlimited vocabulary Korean TTS. We constructed a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, so we adopted tree-based error correction to overcome these training data limitations.


The Research of Reducing the Fixed Codebook Search Time of G.723.1 MP-MLQ (G.723.1 MP-MLQ 고정 코드북 검색 시간 단축에 관한 연구)

  • 김정진;장경아;목진덕;배명진;홍성훈;성유나
    • Proceedings of the IEEK Conference / 1999.11a / pp.1131-1134 / 1999
  • In general, CELP-type vocoders provide good speech quality at around 4.8 kbps. Among them, G.723.1, developed for Internet phone and videoconferencing, includes two vocoders: 5.3 kbps ACELP and 6.3 kbps MP-MLQ. Since the 6.3 kbps MP-MLQ requires a large amount of computation for the fixed codebook search, real-time processing is difficult to achieve. To address this problem, this paper proposes a new method that reduces the codebook search time by up to about 50%. We first decide the grid bit, then search the codebook. The grid bit is selected by comparing the DC-removed original speech with synthetic speech synthesized using only the odd or only the even pulses of the target vector. As a result, we reduced the total processing time of G.723.1 MP-MLQ by up to about 26.08%. In the objective quality test, a segSNR of 11.19 dB was obtained, and in the subjective quality test there was almost no speech degradation.
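
For readers unfamiliar with the objective measure quoted above, segmental SNR (segSNR) is commonly computed as the mean of per-frame SNR values in dB. The sketch below follows that common definition and is not taken from the paper. The default frame length of 240 samples (one 30 ms G.723.1 frame at 8 kHz) is an assumption about the segment size, and `segmental_snr` is a hypothetical name.

```python
import math

def segmental_snr(reference, decoded, frame_len=240):
    """Mean per-frame SNR in dB between a reference signal and its decoded version.

    frame_len=240 is an assumed segment size, not a value stated in the abstract.
    """
    usable = min(len(reference), len(decoded))
    snrs = []
    for start in range(0, usable - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal == 0 or noise == 0:
            continue  # skip silent or perfectly reconstructed frames
        snrs.append(10 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("nan")

# Toy signals: a constant reference and a decoded copy with a 10% amplitude error.
reference = [1.0] * 480
decoded = [0.9] * 480
toy_segsnr = segmental_snr(reference, decoded)
```

Averaging in the dB domain per frame, rather than over the whole signal, keeps loud segments from dominating the score, which is why segSNR is often reported for speech coders.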
