Search | Korea Science

Improvement of Bit Rate applying the Speaking Rate and PSOLA Technique of Speech in CELP Vocoder (음성신호의 발성율과 PSOLA기법을 적용한 음성 보코더 전송률 개선에 관한 연구)

장경아;서지호;배명진
- Proceedings of the IEEK Conference
- 2003.11a
- pp.45-48
- 2003
In general, speech coding methods are classified into three categories: waveform coding, source coding, and hybrid coding. Fast speech can be encoded with less information than slow speech. With respect to speaking rate, the low-frequency band matters more to listeners than the high-frequency band. Speech vocoding techniques are developing toward low bit rate, low complexity, and high sound quality. CELP-type vocoders provide very good sound quality at low bit rates, but they do not take the speaking rate into account. If each frame is encoded according to its speaking rate, the bit rate can be reduced below that of a conventional vocoder. We propose a technique that estimates the speaking rate and applies PSOLA to frames with a slow speaking rate. In simulation, the bit rate was reduced by about 300 bps.
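The abstract does not say how PSOLA is applied, so what follows is only a generic sketch of time-domain PSOLA (TD-PSOLA) time compression, the kind of operation that shortens a slow frame while keeping its pitch. It assumes a voiced segment with a single fixed, known pitch period and precomputed pitch marks. The function name `psola_time_scale` and its parameters are hypothetical, and a real coder would also need pitch tracking and variable periods.

```python
import numpy as np


def psola_time_scale(x, pitch_marks, period, alpha):
    """Time-scale x by `alpha` (alpha < 1 shortens it) with TD-PSOLA.

    Assumes a voiced segment with one fixed pitch period (in samples) and
    analysis pitch marks spaced one period apart.
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(pitch_marks)
    seg_len = 2 * period + 1
    # Symmetric Hann window: copies overlapping at a hop of `period` sum to 1.
    win = np.hanning(seg_len)
    out_len = int(round(len(x) * alpha))
    y = np.zeros(out_len + seg_len)

    t = int(marks[0])  # first synthesis pitch mark
    while t < out_len:
        # Map the synthesis instant back to the analysis time axis and reuse
        # the nearest analysis mark (segments are skipped when alpha < 1).
        m = int(marks[np.argmin(np.abs(marks - t / alpha))])
        start = t - period
        if start >= 0 and m - period >= 0 and m + period < len(x):
            y[start:start + seg_len] += x[m - period:m + period + 1] * win
        # Synthesis marks keep the original spacing, so the pitch is unchanged.
        t += period
    return y[:out_len]


# Usage: halve the duration of a synthetic "voiced" tone.
period = 80                                   # e.g. 100 Hz pitch at 8 kHz
x = np.sin(2 * np.pi * np.arange(8000) / period)
marks = np.arange(period, len(x) - period, period)
y = psola_time_scale(x, marks, period, alpha=0.5)
```

Because the synthesis marks stay one pitch period apart while analysis periods are skipped, the duration shrinks by `alpha` but the pitch is preserved; fewer frames then need to be coded.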

Speech Fluency Characteristics of Adults in Their Manhood and Senescence (장.노년기 성인의 유창성 특성 연구)

Jeon, Hee-Sook;Kim, Hyo-Jung;Shin, Myung-Sun;Chang, Hyun-Jin
- The Journal of the Korea Contents Association
- v.11 no.3
- pp.318-326
- 2011
With the growth of the senior population, the number of middle-aged and older adults with neurogenic disorders is also increasing, so foundational research on speech fluency is needed to support the rehabilitation of adults with neurogenic language disorders. Accordingly, this study compares speech fluency characteristics by age and sex in normal adults in their 50s to 70s. Language samples were collected from a total of 90 adults, 30 (15 men and 15 women) in each of the 50s, 60s, and 70s age groups, and their speech rate and disfluency frequency were compared. First, adults in their 70s spoke more slowly than those in their 50s or 60s, and there was no sex difference in speech rate in any of the three age groups. Second, normal disfluency and total disfluency did not differ among the 50s, 60s, and 70s groups, nor did they differ by sex within the age groups. Third, there was no correlation between speech rate and disfluency frequency across all age groups.
https://doi.org/10.5392/JKCA.2011.11.3.318
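As a sketch of the two measures compared above, the code below computes speech rate and disfluency frequency for each language sample and the correlation between them. The abstract states neither the units nor the statistic; syllables per minute, disfluencies per 100 syllables, and Pearson's r are assumptions here, and `Sample` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Sample:
    syllables: int       # syllables produced in one language sample
    duration_s: float    # speaking time in seconds
    disfluencies: int    # counted disfluency events


def speech_rate_spm(s: Sample) -> float:
    """Speech rate in syllables per minute (assumed unit)."""
    return s.syllables / (s.duration_s / 60.0)


def disfluency_per_100(s: Sample) -> float:
    """Disfluency frequency per 100 syllables (assumed unit)."""
    return 100.0 * s.disfluencies / s.syllables


def rate_disfluency_r(samples: List[Sample]) -> float:
    """Pearson correlation between speech rate and disfluency frequency."""
    rates = [speech_rate_spm(s) for s in samples]
    disf = [disfluency_per_100(s) for s in samples]
    return float(np.corrcoef(rates, disf)[0, 1])
```

Normalizing disfluencies by syllable count keeps the frequency measure independent of sample length, so it can be correlated with speech rate without a built-in length effect.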

Comparing the effects of letter-based and syllable-based speaking rates on the pronunciation assessment of Korean speakers of English (철자 기반과 음절 기반 속도가 한국인 영어 학습자의 발음 평가에 미치는 영향 비교)

Hyunsong Chung
- Phonetics and Speech Sciences
- v.15 no.4
- pp.1-10
- 2023
This study investigated the relative effectiveness of letter-based versus syllable-based measures of speech rate and articulation rate in predicting the articulation score, prosody fluency, and rating sum using "English speech data of Koreans for education" from AI Hub. We extracted and analyzed 900 utterances from the training data, including three balanced age groups (13, 19, and 26 years old). The study built three models that best predicted the pronunciation assessment scores using linear mixed-effects regression and compared the predicted scores with the actual scores from the validation data (n=180). The correlation coefficients between them were also calculated. The findings revealed that syllable-based measures of speech and articulation rates were more effective than letter-based measures in all three pronunciation assessment categories. The correlation coefficients between the predicted and actual scores ranged from .65 to .68, indicating the models' good predictive power. However, it remains inconclusive whether speech rate or articulation rate is more effective.
https://doi.org/10.13064/KSSS.2023.15.4.001
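The letter-based and syllable-based measures differ only in the counting unit, and speech rate differs from articulation rate only in whether pauses stay in the denominator. A minimal sketch, assuming speech rate = units / total utterance duration and articulation rate = units / (duration minus silent-pause time). The syllable count is supplied by hand because automatic English syllabification is outside this example; the function names and the durations are hypothetical.

```python
from typing import Tuple


def count_letters(transcript: str) -> int:
    """Letter count: alphabetic characters only (spaces and punctuation ignored)."""
    return sum(ch.isalpha() for ch in transcript)


def rates(units: int, total_s: float, pause_s: float) -> Tuple[float, float]:
    """Return (speech rate, articulation rate) in units per second.

    Speech rate divides by the whole utterance duration; articulation rate
    removes silent-pause time from the denominator.
    """
    speech_rate = units / total_s
    articulation_rate = units / (total_s - pause_s)
    return speech_rate, articulation_rate


transcript = "I like apples"
letters = count_letters(transcript)   # 11
syllables = 4                         # I / like / ap-ples, counted by hand
letter_based = rates(letters, total_s=2.0, pause_s=0.4)
syllable_based = rates(syllables, total_s=2.0, pause_s=0.4)
print(letter_based, syllable_based)
```

Letter counts depend on English spelling rather than on what is pronounced, which is one plausible reason a syllable-based rate tracks pronunciation scores more closely.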

Korean Speech Recognition Based on Syllable (음절을 기반으로한 한국어 음성인식)

Lee, Young-Ho;Jeong, Hong
- Journal of the Korean Institute of Telematics and Electronics B
- v.31B no.1
- pp.11-22
- 1994
For conventional word-based systems, it is very difficult to enlarge the vocabulary. To cope with this problem, more fundamental units of speech, such as syllables and phonemes, must be used. Korean speech consists of initial consonants, middle vowels, and final consonants, and syllables can easily be obtained from it. In this paper, we present a speech recognition system that takes advantage of the syllable characteristics peculiar to Korean. The recognizer is based on a Time Delay Neural Network (TDNN). To handle the large number of recognition units, the system consists of separate neural networks for initial consonants, middle vowels, and final consonants. The system first recognizes the initial consonants, middle vowels, and final consonants, and then uses these results to recognize isolated words. In experiments, we obtained recognition rates of 85.12% on 2,735 initial-consonant samples, 86.95% on 3,110 middle-vowel samples, and 90.58% on 1,615 final-consonant samples, and 71.2% on 250 isolated words.
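The initial/middle/final structure the system relies on is also how precomposed Hangul is laid out in Unicode: each syllable block from U+AC00 to U+D7A3 equals 0xAC00 + (initial × 21 + middle) × 28 + final, with 19 initial consonants, 21 middle vowels, and 28 final slots (including "none"). The sketch below uses that formula to split a syllable into its three parts and to recombine the outputs of three separate per-position classifiers. It works on text rather than audio, and `decompose` and `compose` are hypothetical helpers, not part of the paper's TDNN system.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 middle vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals incl. none
BASE = 0xAC00  # code point of the first Hangul syllable block


def decompose(syllable: str) -> tuple:
    """Split one Hangul syllable into (initial, middle, final) jamo."""
    code = ord(syllable) - BASE
    if not 0 <= code < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return CHO[code // (21 * 28)], JUNG[(code % (21 * 28)) // 28], JONG[code % 28]


def compose(cho: str, jung: str, jong: str = "") -> str:
    """Rebuild a syllable from the three recognized components."""
    return chr(BASE + (CHO.index(cho) * 21 + JUNG.index(jung)) * 28 + JONG.index(jong))


print(decompose("한"))             # ('ㅎ', 'ㅏ', 'ㄴ')
print(compose("ㅎ", "ㅏ", "ㄴ"))   # 한
```

Splitting recognition this way means three small classifiers (19, 21, and 28 classes) cover all 11,172 possible syllable blocks.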

A Study of Energy Parameter without Windowing Influence in Speech Signal (윈도우의 영향이 제거된 에너지 파라미터에 관한 연구)

조태수;신동성;배명진
- Proceedings of the IEEK Conference
- 2001.06d
- pp.277-280
- 2001
Preprocessing is a very important stage in speech signal processing: it affects the compression rate in speech coding, the recognition rate in speech recognition, and so on. In this paper, we propose a method that minimizes the influence of windowing by using the pitch period and start points. The proposed method can be used for voiced-sound detection and word labeling.
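The abstract gives no formulas, so this is only an illustration of why tying the analysis frame to the pitch period removes window-dependent fluctuation in short-time energy. For a periodic signal, the energy of a frame exactly one pitch period long is the same wherever the frame starts, whereas a fixed-length frame that is not a multiple of the period yields start-dependent values. The pitch period is assumed to be known, and `frame_energy` and `pitch_sync_energy` are hypothetical names.

```python
import numpy as np


def frame_energy(x, start, length):
    """Short-time energy of a rectangular frame x[start:start + length]."""
    seg = np.asarray(x[start:start + length], dtype=float)
    return float(np.sum(seg ** 2))


def pitch_sync_energy(x, start, period):
    """Energy over exactly one pitch period beginning at a pitch start point."""
    return frame_energy(x, start, period)


period = 80                                   # assumed pitch period in samples
x = np.sin(2 * np.pi * np.arange(1000) / period)

fixed = [frame_energy(x, s, 100) for s in range(period)]
synced = [pitch_sync_energy(x, s, period) for s in range(period)]
print(max(fixed) - min(fixed))    # fixed 100-sample frames: varies with start
print(max(synced) - min(synced))  # one-period frames: essentially zero
```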

Speech Recognition for twenty questions game (스무고개 게임을 위한 음성인식)

노용완;윤재선;홍광석
- Proceedings of the IEEK Conference
- 2002.06d
- pp.203-206
- 2002
In this paper, we present a sentence speech recognizer for a twenty questions game. The proposed approach to speaker-independent sentence speech recognition consists of two steps: extracting the number of syllables in each eojeol (a space-delimited Korean word unit) to reduce the set of candidates, and applying a knowledge-based language model for sentence recognition. For the twenty questions game, we implemented the recognizer with 956 sentences and 1,095 eojeols. In our experiments, the sentence recognition rate was 87% and the eojeol recognition rate was 90.15%.
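The first step, reducing candidates by syllable count, can be sketched as a simple filter. In the paper the syllable count of each eojeol is extracted from the speech signal; here it is passed in as a number, and the vocabulary, the `tolerance` parameter, and the function names are hypothetical.

```python
from typing import List


def hangul_syllable_count(eojeol: str) -> int:
    """Count precomposed Hangul syllables (U+AC00..U+D7A3) in an eojeol."""
    return sum(0xAC00 <= ord(ch) <= 0xD7A3 for ch in eojeol)


def reduce_candidates(vocabulary: List[str], detected_count: int,
                      tolerance: int = 0) -> List[str]:
    """Keep eojeols whose syllable count is within `tolerance` of the
    count estimated from the speech signal."""
    return [w for w in vocabulary
            if abs(hangul_syllable_count(w) - detected_count) <= tolerance]


vocab = ["사과", "바나나", "수박", "딸기", "포도"]   # hypothetical eojeol list
print(reduce_candidates(vocab, 3))                   # ['바나나']
```

A nonzero `tolerance` would trade a smaller reduction for robustness to syllable-count estimation errors.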

Complexity Reduction Algorithm of Speech Coder(EVRC) for CDMA Digital Cellular System

Min, So-Yeon
- Journal of Korea Multimedia Society
- v.10 no.12
- pp.1551-1558
- 2007
Speech coders for mobile telecommunication are evaluated mainly in terms of channel capacity, noise immunity, encryption, complexity, and encoding delay. This study presents a complexity reduction algorithm for CDMA (Code Division Multiple Access) mobile telecommunication systems that keeps the existing advantages of communication quality and low transmission rate. The objective is to reduce computational complexity by controlling the frequency band non-uniformly when converting the LPC (Linear Predictive Coding) coefficients used in the EVRC (Enhanced Variable-Rate Coder, IS-127) speech coder into LSP (Line Spectrum Pairs) parameters. Experiments showed that the computational complexity of the speech coder using the proposed algorithm was 45% lower on average than that of the existing EVRC speech coder. In addition, the LSP parameter values, the synthesized speech signal, and the spectrogram test results were the same as those of the existing method.
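The LPC-to-LSP conversion that this algorithm speeds up can be sketched as follows. With A(z) = 1 + a1 z^-1 + ... + ap z^-p (even p), the LSPs are the angles of the unit-circle roots of P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1), excluding the trivial roots at z = -1 and z = +1. This sketch simply calls `np.roots`; practical codecs instead search for these roots over a frequency grid, and the non-uniform band control described in the abstract is not reproduced here. The function name and example coefficients are hypothetical.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] (even p, A(z) = 1 + sum a_k z^-k)
    to line spectral frequencies in radians, sorted ascending."""
    a = np.asarray(a, dtype=float)
    ext = np.append(a, 0.0)          # coefficients of A(z), padded to length p + 2
    p_poly = ext + ext[::-1]         # symmetric P(z) = A(z) + z^-(p+1) A(z^-1)
    q_poly = ext - ext[::-1]         # antisymmetric Q(z) = A(z) - z^-(p+1) A(z^-1)
    lsf = []
    for poly in (p_poly, q_poly):
        # Multiplying by z^(p+1) turns z^-1 coefficients into a z-polynomial.
        for r in np.roots(poly):
            w = np.angle(r)
            # Keep one root per conjugate pair; drop the trivial roots at z = +/-1.
            if 1e-6 < w < np.pi - 1e-6:
                lsf.append(w)
    return np.sort(np.array(lsf))


print(lpc_to_lsf([1.0, -1.2, 0.5]))  # = [arccos(0.85), arccos(0.35)]
```

For a stable A(z), the roots of P and Q lie on the unit circle and interleave, which is what makes a frequency-grid search (and shortcuts to it) possible.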

Improved speech emotion recognition using histogram equalization and data augmentation techniques (히스토그램 등화와 데이터 증강 기법을 이용한 개선된 음성 감정 인식)

Heo, Woon-Haeng;Kwon, Oh-Wook
- Phonetics and Speech Sciences
- v.9 no.2
- pp.77-83
- 2017
We propose a new method to reduce emotion recognition errors caused by variation in speaker characteristics and speech rate. First, to reduce variation in speaker characteristics, we adjust the features of a test speaker to fit the distribution of all the training data using the histogram equalization (HE) algorithm. Second, to deal with variation in speech rate, we augment the training data with speech generated at various speech rates. In computer experiments using EMO-DB, KRN-DB, and eNTERFACE-DB, the proposed method improved weighted accuracy by a relative 34.7%, 23.7%, and 28.1%, respectively.
https://doi.org/10.13064/KSSS.2017.9.2.077
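Histogram equalization of features is commonly implemented as quantile mapping: each test feature value is replaced by the training-data value at the same cumulative probability. The sketch below does this per feature dimension with NumPy. The abstract does not specify the features, the CDF estimate, or the interpolation, so those choices and the function name `histogram_equalize` are assumptions.

```python
import numpy as np


def histogram_equalize(test_feats, train_feats):
    """Map each feature dimension of test_feats onto the empirical
    distribution of train_feats (quantile mapping).

    test_feats: (n_test, d) array, train_feats: (n_train, d) array.
    """
    test = np.asarray(test_feats, dtype=float)
    train = np.asarray(train_feats, dtype=float)
    n = test.shape[0]
    out = np.empty_like(test)
    for j in range(test.shape[1]):
        # Empirical CDF value of each test frame within the test speaker's data.
        ranks = np.argsort(np.argsort(test[:, j]))
        cdf = (ranks + 0.5) / n
        # Inverse CDF of the pooled training data at the same probability.
        out[:, j] = np.quantile(train[:, j], cdf)
    return out


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 13))             # stand-in training features
test = rng.normal(2.0, 3.0, size=(200, 13))    # mismatched test speaker
eq = histogram_equalize(test, train)
```

Because only the ranks of the test features are used, any monotonic shift or scaling of a speaker's features yields the same equalized output, which is how speaker-dependent offsets are removed.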

Implementation of Wideband Waveform Interpolation Coder for TTS DB Compression (TTS DB 압축을 위한 광대역 파형보간 부호기 구현)

Yang, Hee-Sik;Hahn, Min-Soo
- MALSORI
- v.55
- pp.143-158
- 2005
An adequate compression algorithm is essential for a high-quality embedded TTS system. In this paper, after investigating many speech coders, we propose a waveform interpolation coder for TTS corpus compression. Unlike speech coders in communication systems, compression rate and quality are more important than other performance criteria in TTS DB compression. We therefore chose the waveform interpolation algorithm because it provides good speech quality at a high compression rate, at the cost of complexity. The implemented coder operates at 6 kbps with a quality degradation of 0.47. These results indicate that, with some further study, waveform interpolation is suitable for TTS DB compression.
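To put the 6 kbps figure in perspective, the arithmetic below compares it with uncompressed wideband PCM. The source does not state the database format or the frame length; 16 kHz sampling, 16-bit samples, and 20 ms frames are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def compression_ratio(sample_rate_hz: int, bits_per_sample: int,
                      coded_bps: float) -> float:
    """Uncompressed PCM bit rate divided by the coded bit rate."""
    return sample_rate_hz * bits_per_sample / coded_bps


def bits_per_frame(coded_bps: float, frame_ms: float) -> float:
    """Bit budget available for one frame of the given length."""
    return coded_bps * frame_ms / 1000.0


# Assumed wideband source: 16 kHz, 16-bit PCM = 256 kbps.
print(compression_ratio(16000, 16, 6000))  # about 42.7 : 1
print(bits_per_frame(6000, 20))            # 120.0 bits per assumed 20 ms frame
```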

A Study on Recognition of Korean Postpositions and Suffixes in Continuous Speech (한국어 연속음성에서의 조사 및 어미 인식에 관한 연구)

Song, Min-Suck;Lee, Ki-Young
- Speech Sciences
- v.6
- pp.181-195
- 1999
This study proposes a method of recognizing postpositions and suffixes in spoken Korean using prosodic information. We first detect grammatical boundaries automatically using the prosodic information of accentual phrases, and then recognize grammatical function words by tracking backward from the boundaries. The experiment uses 300 sentences of speech spoken in standard Korean by 10 men and 5 women, containing 1,080 accentual phrases and 11 postpositions and suffixes. Recognition rates were measured in two cases: 97.5% when only correctly detected boundaries were included, and 74.8% when all detected boundaries were included.
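The boundary-then-backward-tracking idea can be sketched on a symbol string: at each detected boundary, look for a function word that ends exactly there, trying longer entries first. In the paper the matching is done on speech; here the syllable string and boundary positions are given as inputs, and the small particle lexicon and the function name are hypothetical.

```python
from typing import List, Optional, Tuple


def recognize_function_words(syllables: str, boundaries: List[int],
                             lexicon: List[str]) -> List[Tuple[int, Optional[str]]]:
    """At each detected phrase boundary, look backward for a postposition or
    suffix that ends exactly at the boundary (longest match first)."""
    entries = sorted(lexicon, key=len, reverse=True)
    results = []
    prev = 0
    for b in boundaries:
        phrase = syllables[prev:b]
        match = None
        for w in entries:
            # Leave at least one syllable for the content word before it.
            if phrase.endswith(w) and len(w) < len(phrase):
                match = w
                break
        results.append((b, match))
        prev = b
    return results


# Hypothetical input: syllables from a recognizer, boundaries from prosody.
print(recognize_function_words("학교에서친구를만났다", [4, 7, 10],
                               ["에서", "에", "를", "을"]))
# [(4, '에서'), (7, '를'), (10, None)]
```

Anchoring the search at the boundary is why recognition depends so strongly on boundary detection accuracy, as the gap between the two reported rates suggests.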
