Search | Korea Science

Synthetic Speech Quality Improvement By Glottal parameter Interpolation - Preliminary study on open quotient interpolation in the speech corpus - (성대특성 보간에 의한 합성음의 음질향상 - 음성코퍼스 내 개구간 비 보간을 위한 기초연구 -)

Bae, Jae-Hyun;Oh, Yung-Hwa
- Proceedings of the KSPS conference / 2005.11a / pp.63-66 / 2005
For large-corpus-based TTS, the consistency of the speech corpus is very important, because inconsistent speech quality within the corpus can cause distortion at concatenation points. Because of this inconsistency, a large corpus must be tuned repeatedly. One cause of the inconsistency is that the sentences in the corpus differ in their glottal characteristics. In this paper, we adjust the glottal characteristics of the speech in the corpus to prevent this distortion, and we present the experimental results.
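The open quotient (OQ) named in the title is conventionally the fraction of each glottal cycle during which the glottis is open. As a minimal sketch, assuming glottal closure and opening instants are already available (for example from a separate detector), the code below computes per-cycle OQ and builds a linearly interpolated OQ target between two values, such as the OQ on either side of a concatenation point. The function names and the cycle convention are hypothetical; this is not the authors' adjustment procedure.

```python
import numpy as np

def open_quotients(gci, goi):
    """Per-cycle open quotient OQ_k = open-phase duration / period.

    Hypothetical convention: cycle k runs from closure gci[k] to closure
    gci[k+1]; the glottis opens at goi[k] inside that cycle, so the open
    phase lasts from goi[k] to gci[k+1].
    """
    gci = np.asarray(gci, dtype=float)
    goi = np.asarray(goi, dtype=float)
    periods = np.diff(gci)                      # T0 of each cycle
    open_phase = gci[1:] - goi[: len(periods)]  # opening -> next closure
    return open_phase / periods

def interpolate_oq(oq_start, oq_end, n_cycles):
    """Linearly interpolated OQ targets across n_cycles glottal cycles."""
    return np.linspace(oq_start, oq_end, n_cycles)
```

For example, closures at samples 0, 100, 200 with openings at 40 and 150 give OQ values of 0.6 and 0.5 for the two cycles.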

Recursive Estimation using the Hidden Filter Model for Enhancing Noisy Speech

Kang, Yeong-Tae
- The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.27-30 / 1996
A recursive estimation method is proposed for enhancing speech contaminated by white noise. The method is based on the Kalman filter with a time-varying parametric model of the clean speech signal, and a hidden filter model is used to model the clean speech. An SNR improvement of approximately 4-5 dB is achieved for input SNRs of 5 and 10 dB.
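The Kalman recursion behind this kind of enhancement can be sketched for a simpler case than the paper's: one fixed autoregressive (AR) speech model with known coefficients, observed in white noise. The hidden filter model and its time-varying parameters are not reproduced; the function name and the parameters `a` (AR coefficients), `q` (excitation variance), and `r` (noise variance) are illustrative assumptions.

```python
import numpy as np

def kalman_enhance(y, a, q, r):
    """Kalman-filter estimate of clean speech s from noisy y = s + v.

    Assumes (for illustration) a fixed AR model
        s[k] = a[0]*s[k-1] + ... + a[p-1]*s[k-p] + w[k],  w ~ N(0, q)
    and white observation noise v ~ N(0, r).
    """
    p = len(a)
    F = np.zeros((p, p))
    F[0, :] = a                   # first row: AR prediction of s[k]
    F[1:, :-1] = np.eye(p - 1)    # shift older samples down the state
    H = np.zeros((1, p))
    H[0, 0] = 1.0                 # we observe only the newest sample
    Q = np.zeros((p, p))
    Q[0, 0] = q                   # excitation enters the newest sample
    x = np.zeros(p)               # state estimate [s[k], s[k-1], ...]
    P = np.eye(p)                 # state error covariance
    s_hat = np.empty(len(y))
    for k, yk in enumerate(y):
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the noisy sample
        S = (H @ P @ H.T)[0, 0] + r
        K = (P @ H.T)[:, 0] / S
        x = x + K * (yk - x[0])
        P = P - np.outer(K, H @ P)
        s_hat[k] = x[0]
    return s_hat
```

With a correctly specified model, the filtered estimate has a lower mean squared error than the noisy input; in practice the AR coefficients would be re-estimated frame by frame.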

Evaluation for speech signal based on human sense and signal quality

Mekada, Yoshito;Hasegawa, Hiroshi;Kumagai, Takeshi;Kasuga, Masao
- Proceedings of the Korean Society of Broadcast Engineers Conference / 1997.06a / pp.13-18 / 1997
Every reproduced speech signal has its own particular signal properties, because it is encoded and decoded for communication through various media. In this paper, we examine the correlation between speech signal quality and sensory pleasure, with the aim of improving how pleasant these signals sound. In experiments, we evaluate the quality of speech signals transmitted through various media using psychological auditory tests and the physical features of the signals.

Recognition Performance Improvement for Noisy-speech by Parallel Model Compensation Adaptation Using Frequency-variant added with ML (최대우도를 부가한 주파수 변이 PMC 방법의 잡음 음성 인식 성능개선)

Choi, Sook-Nam;Chung, Hyun-Yeol
- Journal of Korea Multimedia Society / v.16 no.8 / pp.905-913 / 2013
Parallel Model Compensation using Frequency-variant (FV-PMC) is a noise-robust speech recognition method that classifies the noises expected to be mixed with the input speech into several groups, using the average frequency variant as a threshold value, and then performs recognition according to the classified group. It performs very well on noisy speech that the standard threshold classifies well. However, it lowers the average recognition rate for noisy speech that cannot be classified, because, as in conventional PMC, such speech is recognized with a model combined with the noiseless model. To solve this problem, this paper proposes an enhanced recognition method that prevents speech from going unclassified: maximum likelihood is used to refine the rating scale so that the noise groups, including the input noisy speech, can be classified into more specific groups, which improves the recognition rate. Recognition experiments on the Aurora 2.0 database showed improved results compared with the previous FV-PMC method.
https://doi.org/10.9717/kmms.2013.16.8.905
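PMC methods typically combine a clean-speech model with a noise model in the log-spectral domain. The sketch below shows the common log-add approximation for Gaussian mean vectors, plus a hypothetical maximum-likelihood choice among noise groups, each modeled as a diagonal Gaussian. It illustrates these two ideas under stated assumptions and is not the FV-PMC procedure or its frequency-variant threshold rule; all names are hypothetical.

```python
import numpy as np

def pmc_log_add(mu_speech, mu_noise, gain=1.0):
    """Log-add approximation for log-spectral mean vectors:
    mu_noisy = log(exp(mu_speech) + gain * exp(mu_noise))."""
    mu_speech = np.asarray(mu_speech, dtype=float)
    mu_noise = np.asarray(mu_noise, dtype=float)
    # np.logaddexp computes log(exp(x) + exp(y)) without overflow
    return np.logaddexp(mu_speech, np.log(gain) + mu_noise)

def select_noise_group(noise_frames, groups):
    """Return the name of the noise group whose diagonal Gaussian
    (mean, variance) gives the highest log-likelihood for the frames."""
    x = np.asarray(noise_frames, dtype=float)

    def log_likelihood(mean, var):
        mean = np.asarray(mean, dtype=float)
        var = np.asarray(var, dtype=float)
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

    return max(groups, key=lambda name: log_likelihood(*groups[name]))
```

Once a group is selected, its noise model would be combined with each clean-speech state mean via `pmc_log_add`.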

A Study on the Improvement of DTW with Speech Silence Detection (음성의 묵음구간 검출을 통한 DTW의 성능개선에 관한 연구)

Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
- Speech Sciences / v.10 no.4 / pp.117-124 / 2003
Speaker recognition is the technology that confirms a speaker's identity using the characteristics of his or her speech. It is divided into speaker identification and speaker verification: the former determines which speaker in a preregistered group is talking and recognizes the word, while the latter verifies a speaker who claims a particular identity. As services via the telephone network have become widespread, extracting speaker information from speech to confirm individual identity has become one of the most efficient technologies. Several problems, however, must be solved for real applications. First, a reliable method for rejecting impostors is needed, because recognition is not performed only for preregistered customers. Second, speech characteristics change over time, which severely degrades the recognition rate and inconveniences users as the number of required utterances increases. Third, characteristics shared among speakers cause misrecognition. In addition, silent segments included in the middle of speech reduce the identification rate. To address this, we propose improving the identification rate by removing silent segments before running the identification algorithm. The start and end points of speech are detected using the zero-crossing rate and the signal energy, and the DTW algorithm is then applied using both methods. As a result, the proposed method improves the recognition rate by about 3% compared with the conventional methods.
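The preprocessing described above can be sketched as frame-level energy and zero-crossing-rate (ZCR) endpoint detection followed by a basic DTW distance. The frame sizes, thresholds, and the rule combining energy and ZCR are assumptions chosen for illustration, not the paper's settings, and the functions are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (n_frames x frame_len)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def trim_silence(x, frame_len=256, hop=128, energy_ratio=0.05, zcr_thr=0.25):
    """Cut leading/trailing silence using frame energy and ZCR (assumed rule:
    a frame is speech if its energy is high, or if its ZCR is high and its
    energy is not negligible, which keeps weak unvoiced sounds)."""
    x = np.asarray(x, dtype=float)
    frames = frame_signal(x, frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    e_thr = energy_ratio * energy.max()
    active = (energy > e_thr) | ((zcr > zcr_thr) & (energy > 0.1 * e_thr))
    idx = np.flatnonzero(active)
    if idx.size == 0:
        return x[:0]
    start, end = idx[0] * hop, idx[-1] * hop + frame_len
    return x[start:end]

def dtw_distance(a, b):
    """Classic DTW with Euclidean local cost between feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

In a speaker-recognition pipeline, features (for example cepstra) would be extracted from the trimmed signal and compared to each enrolled template with `dtw_distance`; a time-stretched copy of a sequence, such as `[0, 0, 1, 2, 2]` against `[0, 1, 2]`, has distance 0.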

Speech Recognition Performance Improvement using Gamma-tone Feature Extraction Acoustic Model (감마톤 특징 추출 음향 모델을 이용한 음성 인식 성능 향상)

Ahn, Chan-Shik;Choi, Ki-Ho
- Journal of Digital Convergence / v.11 no.7 / pp.209-214 / 2013
To improve the performance of speech recognition systems, methods modeled on human listening ability have been incorporated into them. In noisy environments, the desired speech signal is selected by separating the speech signal from the noise. In practice, however, performance suffers because changes in the recognition environment caused by noise make speech detection inaccurate and cause a mismatch with the trained model. In this paper, we propose gammatone-based feature extraction and an acoustic-model-based learning model to improve speech recognition. In the proposed method, feature extraction based on auditory scene analysis, which models human auditory perception, is reflected in the training of the recognition models. In an evaluation in noisy environments, noise was removed from signals at -10 dB and -5 dB SNR, and SNR improvements of 3.12 dB and 2.04 dB, respectively, were confirmed.
https://doi.org/10.14400/JDPM.2013.11.7.209
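A gammatone filterbank is the usual front end for gammatone features. The sketch below builds 4th-order gammatone impulse responses, g(t) = t^(n-1) exp(-2*pi*b*ERB(fc)*t) cos(2*pi*fc*t), with Glasberg and Moore's ERB bandwidth, and computes one log energy per channel. The channel layout, impulse-response length, peak normalization, and single log energy per channel are illustrative assumptions; the paper's actual feature pipeline and acoustic model are not reproduced.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Sampled gammatone impulse response, normalized to unit peak."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_log_energies(x, fs, centre_freqs):
    """Filter x through each gammatone channel; return log mean power per channel."""
    feats = []
    for fc in centre_freqs:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        feats.append(np.log(np.mean(y ** 2) + 1e-12))  # floor avoids log(0)
    return np.array(feats)
```

Frame-wise versions of these energies, often followed by a DCT, are a common way to build gammatone cepstral features; a 1 kHz tone, for instance, yields its largest log energy in the channel centred at 1 kHz.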

Syllable-timing Interferes with Korean Learners' Speech of Stress-timed English

Lee, Ok-Hwa;Kim, Jong-Mi
- Speech Sciences / v.12 no.4 / pp.95-112 / 2005
We investigate Korean learners' speech timing in English before and after instruction, in comparison with native speech, in an attempt to resolve disagreements in the literature as to whether speech timing is measurable (Lehiste, 1977; Roach, 1982; Dauer, 1983 vs. Low et al., 2000; Yun, 2002; Jian, 2004). We measured the pair-wise variability between adjacent stressed and unstressed syllables within a foot, as well as that among adjacent feet, in approximately 555 English sentences read by 29 native speakers and 41 Korean learners at the intermediate proficiency level. The results show that, compared with native American English, Korean learner speech before instruction has significantly (p<.001) smaller pair-wise variability between adjacent stressed and unstressed syllables within a foot, and significantly (p=.01) larger variability among adjacent feet within the utterance. After instruction, learner speech showed significant (p=.01) improvement in the pair-wise variability of syllable sequences toward native values. The variability among adjacent feet decreased progressively from learner speech before instruction, to learner speech after instruction, to native speech (p=.03). We thus conclude that the speech-timing difference between Korean English and American English is measurable in terms of the duration of stressed and unstressed syllables, and that American English is stress-timed whereas Korean English shows syllable-timing interference.
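Pair-wise variability is commonly quantified with a Pairwise Variability Index (PVI). The sketch computes the raw PVI (mean absolute difference between successive durations) and the normalized nPVI, which divides each difference by the mean of the pair and scales by 100. Whether the authors used exactly this normalization, and how they segmented syllables and feet, is an assumption here.

```python
def rpvi(durations):
    """Raw PVI: mean absolute difference between successive durations."""
    d = list(durations)
    if len(d) < 2:
        raise ValueError("need at least two durations")
    return sum(abs(a - b) for a, b in zip(d, d[1:])) / (len(d) - 1)

def npvi(durations):
    """Normalized PVI: 100 * mean of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)."""
    d = list(durations)
    if len(d) < 2:
        raise ValueError("need at least two durations")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(d, d[1:])]
    return 100 * sum(terms) / len(terms)
```

For example, a 100 ms stressed syllable followed by a 50 ms unstressed one gives an nPVI of about 66.7, whereas equal durations give 0; larger values indicate the stronger durational contrast that stress timing predicts.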

Result of Voice Analysis after Laryngeal Microsurgery for Vocal Polyp in Elderly (노인에서 성대 용종의 후두 미세수술 후 음성검사 결과)

Choi, Jeong-Im;Yeo, Jang-Ok;Jin, Sung-Min;Lee, Sang-Hyuk
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.22 no.1 / pp.47-51 / 2011
Background and Objectives: Vocal polyps are one of the most frequent benign laryngeal diseases. They are usually found at the midpoint of the vocal fold, are mainly caused by vocal overuse, and are usually removed surgically. Age-related changes to speech are generally attributed to changes in the anatomy and physiology of the speech mechanism, and these changes increase the variability of the acoustic properties of speech with age. Still, not all studies of age-related changes in speech have taken into account the differences between the young group and the adult group after laryngeal microsurgery. The aim of this investigation was to compare the improvement in acoustic measures between young and elderly patients with vocal polyps before and after laryngeal microsurgery. Materials and Method: We retrospectively reviewed 128 patients who underwent laryngeal microsurgery for vocal polyps from 2008 through 2011. The 105 patients under age 60 were classified as the adult group (AG) and the remaining 23 patients as the elderly group (EG). Speech in the AG and EG was evaluated before and after surgery to identify age-group differences in fundamental frequency (F0), jitter, shimmer, and maximum phonation time (MPT). Results: There were no significant differences between the two groups in the improvement of F0, jitter, shimmer, NHR, and MPT after surgery, suggesting that surgery is about as effective in the elderly group as in the adult group. However, a comparison between the elderly group and the young group (under age 40) showed significant differences in the improvement of jitter and shimmer. Conclusion: In general, the present results showed significant improvement in vocal quality after phonosurgery for vocal polyps in both the elderly and adult groups.
However, comparing the improvement between the elderly and young groups revealed significant differences in jitter and shimmer. Therefore, age-related changes of the vocal folds should be considered when planning treatment for elderly patients.
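Jitter and shimmer are cycle-to-cycle perturbation measures. The sketch computes the common "local" variants: the mean absolute difference between consecutive periods (jitter) or consecutive peak amplitudes (shimmer), divided by their mean and expressed in percent, as defined for example in Praat. The analysis software and exact measures used in this study are not stated, so these definitions are an assumption, and the function names are illustrative.

```python
import numpy as np

def local_perturbation(values):
    """100 * mean |v_i - v_{i+1}| / mean(v), in percent."""
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("need at least two cycles")
    return 100.0 * np.mean(np.abs(np.diff(v))) / np.mean(v)

def jitter_local(periods):
    """Local jitter (%) from a sequence of glottal period durations."""
    return local_perturbation(periods)

def shimmer_local(peak_amplitudes):
    """Local shimmer (%) from a sequence of per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)
```

For instance, periods alternating between 10 ms and 12 ms give a local jitter of 2/11, about 18.2%, while a perfectly regular voice gives 0%.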

Performance Evaluation of Speech Onset Representation Characteristic of Cochlear Implants Speech Processor using Spike Train Decoding (Spike Train Decoding에 기반한 인공와우 어음처리기의 음성시작점 정보 전달특성 평가)

Kim, Doo-Hee;Kim, Jin-Ho;Kim, Kyung-Hwan
- Journal of Biomedical Engineering Research / v.28 no.5 / pp.694-702 / 2007
The adaptation effect originating from the chemical synapse between the inner hair cell and the auditory nerve helps represent temporal cues of incoming speech, such as speech onset, accurately. It is therefore expected that modifying conventional cochlear implant (CI) speech processing strategies to incorporate the adaptation effect will considerably improve speech perception performance, for example consonant perception scores. Our purpose in this paper was to evaluate a new CI speech processing strategy that incorporates the adaptation effect by observing auditory nerve responses. By classifying the presence or absence of speech from the auditory nerve responses (i.e., spike trains), we quantitatively compared the speech onset detection performance of the conventional and improved strategies, and we verified that the adaptation effect improves the representation of speech onset.
https://doi.org/10.9718/JBER.2007.28.5.694

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase (지식베이스를 이용한 임베디드용 연속음성인식의 어휘 적용률 개선)

Kim, Kwang-Ho;Lim, Min-Kyu;Kim, Ji-Hwan
- MALSORI / v.68 / pp.115-126 / 2008
In this paper, we propose a method that uses a knowledgebase to improve vocabulary coverage for embedded continuous speech recognition (CSR). A CSR vocabulary is normally derived from a word frequency list, so its coverage depends on the corpus. In previous research, we presented an improved way of generating vocabularies using a part-of-speech (POS) tagged corpus: we analyzed all words paired with 101 of the 152 POS tags and decided on a set of words that must be included in vocabularies of any size. For the other 51 POS tags (e.g., nouns and verbs), however, inclusion of the words paired with those tags is still based on word frequency counted on a corpus. In this paper, we propose a corpus-independent word inclusion method for noun-, verb-, and named entity (NE)-related POS tags using a knowledgebase. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. We categorize verbs by lemma and analyze the relative importance of each lemma from pre-analyzed verb statistics. We determine the inclusion order of NEs through Google search. The proposed method shows better coverage on a test short message service (SMS) text corpus.
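Vocabulary coverage is usually measured as the share of running words in a test corpus that the vocabulary contains. The sketch contrasts a purely frequency-based vocabulary with a hypothetical POS-aware variant that always includes words carrying selected tags, loosely mirroring the earlier POS-based approach summarized above. The knowledgebase and Google-search ranking are not reproduced; the function names and tag set are assumptions.

```python
from collections import Counter

def frequency_vocabulary(train_tokens, size):
    """Top-`size` most frequent words of a training corpus."""
    return {w for w, _ in Counter(train_tokens).most_common(size)}

def pos_aware_vocabulary(tagged_train, size, always_include_tags):
    """Hypothetical variant: words with selected POS tags are always kept;
    the remaining slots are filled by frequency."""
    forced = {w for w, tag in tagged_train if tag in always_include_tags}
    counts = Counter(w for w, _ in tagged_train if w not in forced)
    remaining = max(0, size - len(forced))
    return forced | {w for w, _ in counts.most_common(remaining)}

def coverage(vocab, test_tokens):
    """Fraction of test tokens that are in the vocabulary."""
    if not test_tokens:
        return 0.0
    return sum(t in vocab for t in test_tokens) / len(test_tokens)
```

For example, a vocabulary of {"a", "b"} covers half of the test tokens ["a", "c", "b", "d"].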