• 제목/요약/키워드: Speech Processing

검색결과 959건 처리시간 0.029초

A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

  • Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
    • The Journal of the Acoustical Society of Korea
    • /
    • 제20권4E호
    • /
    • pp.31-38
    • /
    • 2001
  • A large-scale synthesis database for a unit selection based synthesis method usually retains redundant synthesis unit instances, which are useless to the synthetic speech quality. In this paper, to eliminate those instances from the synthesis database, we proposed a new pruning method called weighted vector quantization (WVQ). The WVQ reflects relative importance of each synthesis unit instance when clustering the similar instances using vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through the objective and subjective evaluations of the synthetic speech quality: one to simply limit maximum number of instance, and the other based on normal VQ-based clustering. The proposed method showed the best performance under 50% reduction rates. Over 50% of reduction rates, the synthetic speech quality is not seriously but perceptibly degraded. Using the proposed method, the synthesis database can be efficiently reduced without serious degradation of the synthetic speech quality.

  • PDF

A Closed-Form Solution of Linear Spectral Transformation for Robust Speech Recognition

  • Kim, Dong-Hyun;Yook, Dong-Suk
    • ETRI Journal
    • /
    • 제31권4호
    • /
    • pp.454-456
    • /
    • 2009
  • The maximum likelihood linear spectral transformation (ML-LST) using a numerical iteration method has been previously proposed for robust speech recognition. The numerical iteration method is not appropriate for real-time applications due to its computational complexity. In order to reduce the computational cost, the objective function of the ML-LST is approximated and a closed-form solution is proposed in this paper. It is shown experimentally that the proposed closed-form solution for the ML-LST can provide rapid speaker and environment adaptation for robust speech recognition.

음운 현상과 연속 발화에서의 단어 인지 - 종성중화 작용을 중심으로 (Phonological Process and Word Recognition in Continuous Speech: Evidence from Coda-neutralization)

  • 김선미;남기춘
    • 말소리와 음성과학
    • /
    • 제2권2호
    • /
    • pp.17-25
    • /
    • 2010
  • This study explores whether Koreans exploit their native coda-neutralization process when recognizing words in Korean continuous speech. According to the phonological rules in Korean, coda-neutralization process must come before the liaison process, as long as the latter(i.e. liaison process) occurs between 'words', which results in liaison-consonants being coda-neutralized ones such as /b/, /d/, or /g/, rather than non-neutralized ones like /p/, /t/, /k/, /ʧ/, /ʤ/, or /s/. Consequently, if Korean listeners use their native coda-neutralization rules when processing speech input, word recognition will be hampered when non-neutralized consonants precede vowel-initial targets. Word-spotting and word-monitoring tasks were conducted in Experiment 1 and 2, respectively. In both experiments, listeners recognized words faster and more accurately when vowel-initial target words were preceded by coda-neutralized consonants than when preceded by coda non-neutralized ones. The results show that Korean listeners exploit the coda-neutralization process when processing their native spoken language.

  • PDF

음소 음향학적 변화 패턴을 이용한 한국어 음성신호의 연속 모음 분할 (Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern)

  • 박창목;왕지남
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2001년도 추계학술발표논문집 (상)
    • /
    • pp.801-804
    • /
    • 2001
  • This article is concerned with automatic segmentation of two adjacent vowels for speech signals. All kinds of transition case of adjacent vowels can be characterized by spectrogram. Firstly the voiced-speech is extracted by the histogram analysis of vowel indicator which consists of wavelet low pass components. Secondly given phonetic transcription and transition pattern spectrogram, the voiced-speech portion which has consecutive vowels automatically segmented by the template matching. The cross-correlation function is adapted as a template matching method and the modified correlation coefficient is calculated for all frames. The largest value on the modified correlation coefficient series indicates the boundary of two consecutive vowel sounds. The experiment is performed for 154 vowel transition sets. The 154 spectrogram templates are gathered from 154 words(PRW Speech DB) and the 161 test words(PBW Speech DB) which are uttered by 5 speakers were tested. The experimental result shows the validity of the method.

  • PDF

BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

  • Sanggeon Yun;Seungshik Kang;Hyeokman Kim
    • Journal of Information Processing Systems
    • /
    • 제19권5호
    • /
    • pp.641-651
    • /
    • 2023
  • Malicious hate speech and gender bias comments are common in online communities, causing social problems in our society. Gender bias and hate speech detection has been investigated. However, it is difficult because there are diverse ways to express them in words. To solve this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored bidirectional encoder representations from transformers (BERT)-based deep learning models utilizing hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our model in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an F1-score of 0.7711 was achieved using an ensemble of the Soongsil-BERT and KcELECTRA models. The general bias task included the gender bias task, and the ensemble model achieved the best F1-score of 0.7166.

앤트로피 거절을 활용한 음성인식 시스템의 성능 향상 (Improvement of Speech Recognition System using Entropy Rejection)

  • 송점동
    • 정보학연구
    • /
    • 제2권2호
    • /
    • pp.139-144
    • /
    • 1999
  • 본 논문은 음성인식 시스템에서 정확도를 높이기 위해 후처리 단계에서 후보 단어들의 엔트로피 정보를 이용하였다. 기존의 우도비 검출방법은 음성 데이터에 따라 음성인식 시스템의 성능이 변하고 N개의 후보단어들의 우도값이 비슷하여 오인식 발생확률이 높았다. 그러나 본 눈문에서는 각 후보 단어들의 엔트로피 값보다 인식대상 단어 외의 단어들의 엔트로피 값이 상대적으로 낮은 후보를 거절하는 후처리 방법을 사용하여 음성 데이터에 독립적이면서도 변별력을 높인 정확한 음성인식 시스템을 얻을 수 있었다. 실험 결과 본 논문에서 제안하는 엔트로피에 의한 후처리 방법은 우도비에 의한 방법보다 인식 시스템의 성능을 false alarm이 20%일 때 최대 3.6% 향상시킬 수 있었다.

  • PDF

CDMA 통신망에서의 객관적 음질 평가 척도에 관한 연구 (A Study on Objective Speech Quality Measure under CDMA Telephone Networks Environment)

  • 김광수;김민정;석수영;정호열;정현열
    • 융합신호처리학회논문지
    • /
    • 제2권4호
    • /
    • pp.53-58
    • /
    • 2001
  • 이동전화망을 위한 신뢰성 높은 객관적 통화품질평가 척도개발을 위하여 Bark Spectral Distortion Perceptual Speech Quality Measure의 성능을 분석하여 이 척도들을 실제 환경에서 수집된 음성 데이터에 대해서 실험한 결과, 성능의 저하가 나타났다. 본 논문에서는 인간의 심리음향학적 특성인 마스킹을 적용하는 방안을 제안하여 그 유효성을 실험으로 확인하였다. 이때, masking threshold 계산에 tone 신호를 사용하기 때문에 음성신호에 대하여 계산할 경우 문제점이 있을 수 있으므로 scaling 을 적용하는 새로이 제안하였다. 디지털 이동통신망에서 수집된 음성 데이터에 대한 성능평가 결과, 기존의 척도들 보다 더 높은 성능을 보임을 확인하였다.

  • PDF

A Robust Non-Speech Rejection Algorithm

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea
    • /
    • 제17권1E호
    • /
    • pp.10-13
    • /
    • 1998
  • We propose a robust non-speech rejection algorithm using the three types of pitch-related parameters. The robust non-speech rejection algorithm utilizes three kinds of pitch parameters : (1) pitch range, (2) difference of the successive pitch range, and (3) the number of successive pitches satisfying constraints related with the previous two parameters. The acceptance rate of the speech commands was 95% for -2.8dB signal-to-noise ratio (SNR) speech database that consisted of 2440 utterances. The rejection rate of the non-speech sounds was 100% while the acceptance rate of the speech commands was 97% in an office environment.

  • PDF

Multiple Acoustic Cues for Stop Recognition

  • Yun, Weon-Hee
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 10월 학술대회지
    • /
    • pp.3-16
    • /
    • 2003
  • ㆍAcoustic characteristics of stops in speech with contextual variability ㆍPosibility of stop recognition by post processing technique ㆍFurther work - Speech database - Modification of decoder - automatic segmentation of acoustic parameters

  • PDF

자연어 처리 기반 한국어 TTS 시스템 구현 (Implementation of Korean TTS System based on Natural Language Processing)

  • 김병창;이근배
    • 대한음성학회지:말소리
    • /
    • 제46호
    • /
    • pp.51-64
    • /
    • 2003
  • In order to produce high quality synthesized speech, it is very important to get an accurate grapheme-to-phoneme conversion and prosody model from texts using natural language processing. Robust preprocessing for non-Korean characters should also be required. In this paper, we analyzed Korean texts using a morphological analyzer, part-of-speech tagger and syntactic chunker. We present a new grapheme-to-phoneme conversion method for Korean using a hybrid method with a phonetic pattern dictionary and CCV (consonant vowel) LTS (letter to sound) rules, for unlimited vocabulary Korean TTS. We constructed a prosody model using a probabilistic method and decision tree-based method. The probabilistic method atone usually suffers from performance degradation due to inherent data sparseness problems. So we adopted tree-based error correction to overcome these training data limitations.

  • PDF