Search | Korea Science

A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
- The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.31-38 / 2001
A large-scale synthesis database for unit-selection-based synthesis usually retains redundant synthesis unit instances, which contribute nothing to the synthetic speech quality. In this paper, to eliminate those instances from the synthesis database, we propose a new pruning method called weighted vector quantization (WVQ). WVQ reflects the relative importance of each synthesis unit instance when clustering similar instances with the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of the synthetic speech quality: one that simply limits the maximum number of instances, and one based on ordinary VQ-based clustering. The proposed method showed the best performance at reduction rates below 50%. At reduction rates above 50%, the synthetic speech quality is degraded perceptibly, though not seriously. Using the proposed method, the synthesis database can be reduced efficiently without serious degradation of the synthetic speech quality.
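The abstract does not give the WVQ update rule, so the following is only a minimal sketch of one plausible reading: a k-means style VQ in which each centroid is the weighted mean of its members, followed by pruning that keeps the instance nearest to each centroid. The function names, the weight semantics (for example, how often an instance is selected), and the initialization from the highest-weight instances are all assumptions, not the authors' method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_vq(vectors, weights, k, iters=20):
    """Cluster vectors into k cells; centroids are weighted means (assumed rule)."""
    # Assumption: seed the codebook with the k highest-weight instances.
    order = sorted(range(len(vectors)), key=lambda i: -weights[i])
    centroids = [list(vectors[i]) for i in order[:k]]
    assign = [0] * len(vectors)
    dim = len(vectors[0])
    for _ in range(iters):
        # Assignment step: nearest centroid for every instance.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
        # Update step: weighted mean, so important instances pull the centroid.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            total = sum(weights[i] for i in members)
            if total > 0:
                centroids[c] = [
                    sum(weights[i] * vectors[i][d] for i in members) / total
                    for d in range(dim)
                ]
    return centroids, assign


def prune(vectors, weights, k):
    """Return indices of the instances kept: one representative per cluster."""
    centroids, assign = weighted_vq(vectors, weights, k)
    kept = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            kept.append(min(members, key=lambda i: sq_dist(vectors[i], centroids[c])))
    return sorted(kept)
```

With `k` set to half the number of instances of a unit type, this corresponds to a 50% reduction rate for that unit; because the centroids are weighted, the representative kept in each cell tends to be close to the heavily weighted instances.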

A Closed-Form Solution of Linear Spectral Transformation for Robust Speech Recognition

Kim, Dong-Hyun;Yook, Dong-Suk
- ETRI Journal / v.31 no.4 / pp.454-456 / 2009
The maximum likelihood linear spectral transformation (ML-LST) using a numerical iteration method has been previously proposed for robust speech recognition. The numerical iteration method is not appropriate for real-time applications due to its computational complexity. In order to reduce the computational cost, the objective function of the ML-LST is approximated and a closed-form solution is proposed in this paper. It is shown experimentally that the proposed closed-form solution for the ML-LST can provide rapid speaker and environment adaptation for robust speech recognition.
https://doi.org/10.4218/etrij.09.0209.0012

Phonological Process and Word Recognition in Continuous Speech: Evidence from Coda-neutralization (음운 현상과 연속 발화에서의 단어 인지 - 종성중화 작용을 중심으로)

Kim, Sun-Mi;Nam, Ki-Chun
- Phonetics and Speech Sciences / v.2 no.2 / pp.17-25 / 2010
This study explores whether Koreans exploit their native coda-neutralization process when recognizing words in Korean continuous speech. According to the phonological rules of Korean, the coda-neutralization process must precede the liaison process whenever liaison occurs between words. As a result, liaison consonants are coda-neutralized ones such as /b/, /d/, or /g/, rather than non-neutralized ones like /p/, /t/, /k/, /ʧ/, /ʤ/, or /s/. Consequently, if Korean listeners use their native coda-neutralization rules when processing speech input, word recognition will be hampered when non-neutralized consonants precede vowel-initial targets. Word-spotting and word-monitoring tasks were conducted in Experiments 1 and 2, respectively. In both experiments, listeners recognized vowel-initial target words faster and more accurately when they were preceded by coda-neutralized consonants than when they were preceded by non-neutralized ones. The results show that Korean listeners exploit the coda-neutralization process when processing their native spoken language.

Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern (음소 음향학적 변화 패턴을 이용한 한국어 음성신호의 연속 모음 분할)

Park, Chang-Mok;Wang, Gi-Nam
- Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.801-804 / 2001
This article is concerned with the automatic segmentation of two adjacent vowels in speech signals. Every type of transition between adjacent vowels can be characterized by its spectrogram. First, the voiced speech is extracted by histogram analysis of a vowel indicator that consists of wavelet low-pass components. Second, given the phonetic transcription and the transition pattern spectrogram, the voiced-speech portion that contains consecutive vowels is automatically segmented by template matching. The cross-correlation function is adopted as the template matching method, and a modified correlation coefficient is calculated for all frames. The largest value in the modified correlation coefficient series indicates the boundary between the two consecutive vowel sounds. The experiment covers 154 vowel transition sets. The 154 spectrogram templates were gathered from 154 words (PRW Speech DB), and 161 test words (PBW Speech DB) uttered by 5 speakers were tested. The experimental results show the validity of the method.
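The template-matching step can be illustrated roughly as follows. This is a sketch only: the abstract does not define the "modified correlation coefficient", so a plain Pearson correlation over flattened spectrogram windows stands in for it, and the function names and the choice to report the middle frame of the best-matching window as the boundary are assumptions.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation (stand-in for the paper's modified coefficient)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0 else 0.0


def find_boundary(spectrogram, template):
    """Slide a transition template over the frames and return (boundary, scores).

    spectrogram and template are lists of frames; each frame is a list of
    spectral magnitudes. The template is assumed to be centred on the
    vowel-to-vowel transition, so the boundary is the middle of the best window.
    """
    t_len = len(template)
    flat_template = [x for frame in template for x in frame]
    scores = []
    for start in range(len(spectrogram) - t_len + 1):
        window = [x for frame in spectrogram[start:start + t_len] for x in frame]
        scores.append(pearson(window, flat_template))
    best_start = max(range(len(scores)), key=scores.__getitem__)
    return best_start + t_len // 2, scores
```

For a toy signal made of five frames of one vowel followed by five frames of another, a four-frame template holding two frames of each vowel peaks where the window straddles the change, and the reported boundary is the first frame of the second vowel.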

BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

Sanggeon Yun;Seungshik Kang;Hyeokman Kim
- Journal of Information Processing Systems / v.19 no.5 / pp.641-651 / 2023
Malicious hate speech and gender bias comments are common in online communities, causing social problems in our society. Gender bias and hate speech detection has been investigated. However, it is difficult because there are diverse ways to express them in words. To solve this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored bidirectional encoder representations from transformers (BERT)-based deep learning models utilizing hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our model in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an F1-score of 0.7711 was achieved using an ensemble of the Soongsil-BERT and KcELECTRA models. The general bias task included the gender bias task, and the ensemble model achieved the best F1-score of 0.7166.
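As a rough illustration of a logits ensemble, the sketch below averages the per-class logits of several classifiers and optionally shifts them by the log of a label distribution before taking the argmax. The abstract does not state how the label distribution enters the ensemble, so the log-prior adjustment, the `prior_weight` parameter, and the function names are assumptions rather than the authors' formulation.

```python
import math


def ensemble_logits(logits_per_model, label_prior=None, prior_weight=1.0):
    """Average logits across models; optionally add a weighted log label prior."""
    n_models = len(logits_per_model)
    n_classes = len(logits_per_model[0])
    combined = [
        sum(model[c] for model in logits_per_model) / n_models
        for c in range(n_classes)
    ]
    if label_prior is not None:
        # Assumed adjustment: bias each class by the log of its label frequency.
        combined = [
            z + prior_weight * math.log(p) for z, p in zip(combined, label_prior)
        ]
    return combined


def predict(logits_per_model, label_prior=None, prior_weight=1.0):
    """Return the index of the class with the highest combined logit."""
    combined = ensemble_logits(logits_per_model, label_prior, prior_weight)
    return max(range(len(combined)), key=combined.__getitem__)
```

Averaging logits rather than hard votes lets a confident model outweigh an uncertain one, which is one common motivation for this kind of ensemble.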
https://doi.org/10.3745/JIPS.04.0287

Improvement of Speech Recognition System using Entropy Rejection (앤트로피 거절을 활용한 음성인식 시스템의 성능 향상)

송점동
- The Journal of Information Technology / v.2 no.2 / pp.139-144 / 1999
This paper studies the use of entropy information about additional words in a post-processing step to improve the accuracy of a speech recognition system. With the existing likelihood-ratio detection method, the efficiency of the recognition system varies with the speech data, and the probability of misrecognition increases because the additional words have similar likelihood values. By contrast, a post-processing method that rejects any candidate, among words other than those that could be recognized, whose entropy value is lower than the entropy value of each additional word yields an accurate recognition system whose discrimination is higher and independent of the speech data. In the experiment, at a false alarm rate of 20 percent, this entropy-based post-processing improved recognition performance by up to 3.6 percent over the likelihood-ratio method.

A Study on Objective Speech Quality Measure under CDMA Telephone Networks Environment (CDMA 통신망에서의 객관적 음질 평가 척도에 관한 연구)

김광수;김민정;석수영;정호열;정현열
- Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.53-58 / 2001
To develop an objective speech quality measure for CDMA telephone network environments, this paper first investigates recently developed measures, which turn out to perform poorly in CDMA telephone networks. To solve this problem, a new objective speech quality measure that adopts a noise masking threshold is proposed and studied. To achieve better performance, a scaled noise masking threshold is calculated for speech signals instead of for conventional tone signals. To verify the effectiveness of the proposed method, performance comparison experiments are carried out on CDMA telephone network speech databases; the results show that the proposed method outperforms existing measures.

A Robust Non-Speech Rejection Algorithm

Ahn, Young-Mok
- The Journal of the Acoustical Society of Korea / v.17 no.1E / pp.10-13 / 1998
We propose a robust non-speech rejection algorithm that uses three kinds of pitch-related parameters: (1) the pitch range, (2) the difference between successive pitch values, and (3) the number of successive pitches that satisfy the constraints on the previous two parameters. The acceptance rate for speech commands was 95% on a speech database of 2440 utterances at a signal-to-noise ratio (SNR) of -2.8 dB. In an office environment, the rejection rate for non-speech sounds was 100% while the acceptance rate for speech commands was 97%.
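The three pitch parameters suggest a simple decision rule, sketched below under stated assumptions: a frame counts as valid when its pitch lies in an allowed range and differs from the previous valid pitch by at most a maximum step, and an input is accepted as speech when enough valid frames occur in a row. The thresholds (`lo`, `hi`, `max_delta`, `min_run`) and the function names are hypothetical; the abstract does not give the actual values or the pitch estimator.

```python
def longest_valid_pitch_run(pitches, lo=60.0, hi=400.0, max_delta=20.0):
    """Length of the longest run of frames meeting the range and delta constraints.

    pitches holds one pitch estimate in Hz per frame, with 0 for unvoiced frames.
    """
    best = run = 0
    prev = None  # previous in-range pitch, or None after an out-of-range frame
    for p in pitches:
        in_range = lo <= p <= hi
        if in_range and (prev is None or abs(p - prev) <= max_delta):
            run += 1  # constraints (1) and (2) hold: extend the run
        elif in_range:
            run = 1  # in range but the jump is too large: start a new run
        else:
            run = 0  # unvoiced or out of range: break the run
        prev = p if in_range else None
        best = max(best, run)
    return best


def is_speech(pitches, min_run=5, **constraints):
    """Accept as speech when constraint (3), the run length, is satisfied."""
    return longest_valid_pitch_run(pitches, **constraints) >= min_run
```

Smoothly varying pitch tracks, as in voiced speech, produce long runs, whereas sounds with erratic or absent pitch rarely sustain one, which is the intuition behind rejecting on run length.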

Multiple Acoustic Cues for Stop Recognition

Yun, Weon-Hee
- Proceedings of the KSPS conference / 2003.10a / pp.3-16 / 2003
ㆍ Acoustic characteristics of stops in speech with contextual variability ㆍ Possibility of stop recognition by a post-processing technique ㆍ Further work: speech database, modification of the decoder, automatic segmentation of acoustic parameters

Implementation of Korean TTS System based on Natural Language Processing (자연어 처리 기반 한국어 TTS 시스템 구현)

Kim Byeongchang;Lee Gary Geunbae
- MALSORI / no.46 / pp.51-64 / 2003
To produce high-quality synthesized speech, it is very important to obtain an accurate grapheme-to-phoneme conversion and prosody model from texts using natural language processing. Robust preprocessing for non-Korean characters is also required. In this paper, we analyzed Korean texts using a morphological analyzer, part-of-speech tagger and syntactic chunker. We present a new grapheme-to-phoneme conversion method for Korean using a hybrid method with a phonetic pattern dictionary and CCV (consonant vowel) LTS (letter to sound) rules, for unlimited vocabulary Korean TTS. We constructed a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems, so we adopted tree-based error correction to overcome these training data limitations.
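The hybrid idea of consulting a pattern dictionary first and falling back to letter-to-sound rules can be sketched as below. This is a toy illustration, not the paper's system: the single dictionary entry, the single liaison rule (a simple final consonant moves to the onset of a following vowel-initial syllable), and all names are assumptions chosen for the example.

```python
BASE = 0xAC00  # first precomposed Hangul syllable in Unicode
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

# Hypothetical pattern dictionary: pronunciations the toy rule gets wrong.
EXCEPTIONS = {"같이": "가치"}  # palatalization, not plain liaison


def decompose(ch):
    """Split a Hangul syllable into (initial, vowel, final) indices, else None."""
    code = ord(ch) - BASE
    if not 0 <= code < 11172:
        return None
    return code // 588, (code % 588) // 28, code % 28


def compose(cho, jung, jong):
    """Build a Hangul syllable from (initial, vowel, final) indices."""
    return chr(BASE + (cho * 21 + jung) * 28 + jong)


def apply_liaison(word):
    """Toy LTS rule: move a simple final consonant onto a following empty onset."""
    syllables = [decompose(c) for c in word]
    for i in range(len(word) - 1):
        cur, nxt = syllables[i], syllables[i + 1]
        if cur is None or nxt is None:
            continue
        final = JONG[cur[2]]
        # Skip empty and compound finals, and ㅇ/ㅎ, which behave differently.
        if final and final in CHO and final not in "ㅇㅎ" and CHO[nxt[0]] == "ㅇ":
            syllables[i] = (cur[0], cur[1], 0)
            syllables[i + 1] = (CHO.index(final), nxt[1], nxt[2])
    return "".join(compose(*s) if s else c for s, c in zip(syllables, word))


def g2p(word):
    """Dictionary lookup first, rule-based conversion as the fallback."""
    return EXCEPTIONS.get(word) or apply_liaison(word)
```

Here `g2p("먹어")` falls through to the rule and gives "머거", while `g2p("같이")` is answered by the dictionary, since the rule alone would produce "가티".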
