• Title/Abstract/Keywords: Speech improvement


Synthetic Speech Quality Improvement by Glottal Parameter Interpolation - Preliminary Study on Open Quotient Interpolation in the Speech Corpus -

  • 배재현;오영환
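    • Proceedings of the Korean Society of Phonetic Sciences and Speech Technology Conference / Proceedings of the 2005 Fall Conference of the Korean Society of Phonetic Sciences and Speech Technology / pp.63-66 / 2005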
  • For large-corpus-based TTS, the consistency of the speech corpus is very important, because inconsistent speech quality within the corpus may cause distortion at the concatenation points. Because of this inconsistency, a large corpus must be tuned repeatedly. One of the reasons for the inconsistency is that the speech sentences in the corpus differ in their glottal characteristics. In this paper, we adjust the glottal characteristics of the speech in the corpus to prevent this distortion, and we present the experimental results.


Recursive Estimation using the Hidden Filter Model for Enhancing Noisy Speech

  • Kang, Yeong-Tae
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 3E / pp.27-30 / 1996
  • A recursive estimation method for enhancing speech contaminated by white noise is proposed. The method is based on the Kalman filter with a time-varying parametric model of the clean speech signal, and hidden filter models are used to model that signal. An approximate improvement of 4-5 dB in SNR is achieved at input SNRs of 5 and 10 dB.


Evaluation for speech signal based on human sense and signal quality

  • Mekada, Yoshito;Hasegawa, Hiroshi;Kumagai, Takeshi;Kasuga, Masao
    • Proceedings of the Korean Institute of Broadcast and Media Engineers Conference / The Korean Society of Broadcast Engineers, 1997 Proceedings International Workshop on New Video Media Technology / pp.13-18 / 1997
  • Each reproduced speech signal has its own signal properties, owing to the encoding and decoding applied for communication over various media. In this paper, we examine the correlation between speech signal quality and sensory pleasure, with the aim of improving how the signal is perceived. In experiments, we evaluate the quality of speech signals delivered through various media using a psychological auditory test and the physical features of these signals.


Recognition Performance Improvement for Noisy-speech by Parallel Model Compensation Adaptation Using Frequency-variant added with ML

  • 최숙남;정현열
    • Journal of Korea Multimedia Society / Vol. 16, No. 8 / pp.905-913 / 2013
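  • The PMC method using frequency variation (Parallel Model Compensation Using Frequency-variant, FV-PMC), designed for noise-robust speech recognition, classifies the noises expected to be mixed into the input speech at recognition time into several noise groups, using the average frequency variance as a threshold, and then performs recognition for each noise group. This method performs very well on noisy speech that the reference threshold classifies cleanly. For unclassified noisy speech, however, it performs recognition by combining with the noise-free model, as in the conventional PMC method, which lowers the average recognition rate. To solve this problem, this paper proposes an improved FV-PMC recognition method that replaces the average-frequency threshold used previously with an added maximum-likelihood criterion. This prevents unclassified cases, raises the rate at which the noise in the input noisy speech is assigned to a noise group, and thereby improves the recognition rate. Recognition experiments on the Aurora 2.0 database confirmed improved results over the existing FV-PMC method.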

A Study on the Improvement of DTW with Speech Silence Detection

  • 김종국;조왕래;배명진
    • Speech Sciences / Vol. 10, No. 4 / pp.117-124 / 2003
  • Speaker recognition is a technology that confirms a speaker's identity from the characteristics of their speech. It is divided into speaker identification and speaker verification: the former picks out the speaker from a preregistered group and recognizes the word, and the latter verifies a speaker's claimed identity. Extracting speaker information from speech to confirm individual identity has become one of the most efficient technologies as services over the telephone network have spread. Several problems must be solved for real applications, however. First, a reliable method is needed to reject impostors, because recognition requests do not come only from preregistered customers. Second, the characteristics of speech change over time, which severely degrades the recognition rate and inconveniences users as the number of required utterances increases. Third, characteristics shared among speakers cause wrong recognition results. In addition, silent parts in the middle of the speech lower the identification rate. In this paper, we propose improving the identification rate by removing the silent parts before running the identification algorithm. Speech regions are detected using the zero crossing rate and the signal energy, which locate the start and end points of the speech, and the DTW algorithm is then applied. As a result, the proposed method improves the recognition rate by about 3% compared with conventional methods.
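The abstract above does not give implementation details, so the following is only a minimal sketch of the general idea: drop frames judged silent by short-time energy and zero crossing rate, then compare the remaining frames with DTW. The frame size, thresholds, the per-frame energy feature, and all function names are assumptions made for illustration, not the authors' settings.

```python
import math

def frames(x, frame_len, hop):
    """Split a sample list into frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zcr(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def speech_frames(x, frame_len=80, hop=80, e_thresh=1e-3, z_thresh=0.3):
    """Keep frames with enough energy, or weak frames with many zero crossings.

    The thresholds are hypothetical; real data would need tuning.
    """
    kept = []
    for f in frames(x, frame_len, hop):
        e = energy(f)
        if e > e_thresh or (e > e_thresh / 10 and zcr(f) > z_thresh):
            kept.append(f)
    return kept

def dtw_distance(a, b):
    """Classic DTW between two 1-D feature sequences with absolute-difference cost."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def tone(n, freq=200.0, sr=8000.0, amp=0.5):
    """Synthetic stand-in for voiced speech."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]

# Reference utterance, and the same content with leading, trailing and interior silence.
reference = tone(800)
padded = [0.0] * 400 + tone(400) + [0.0] * 240 + tone(400) + [0.0] * 400

ref_feat = [energy(f) for f in frames(reference, 80, 80)]
raw = dtw_distance(ref_feat, [energy(f) for f in frames(padded, 80, 80)])
trimmed = dtw_distance(ref_feat, [energy(f) for f in speech_frames(padded)])
print(raw, trimmed)
```

In this toy case the DTW distance with silence removed is far smaller than the distance computed on the untrimmed signal, which is the effect the abstract relies on. A real system would use spectral features rather than frame energy alone.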


Speech Recognition Performance Improvement using Gamma-tone Feature Extraction Acoustic Model

  • 안찬식;최기호
    • Journal of Digital Convergence / Vol. 11, No. 7 / pp.209-214 / 2013
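  • To improve recognition performance, speech recognition systems have incorporated human auditory ability and are built to separate speech from noise in noisy environments so that only the desired speech signal is selected. In practice, however, performance degrades for two reasons: speech detection becomes inaccurate under the noise that comes with changes in the recognition environment, and the trained model does not match. This paper therefore proposes extracting features with a gammatone filter and a training model that uses an acoustic model, in order to improve speech recognition. The proposed method reflects human auditory perception by extracting features through auditory scene analysis, and it improves recognition performance by using an acoustic model in the model training process. For evaluation, noise removal was performed on signals at -10 dB and -5 dB in a noisy environment and the SNR was measured, confirming improvements of 3.12 dB and 2.04 dB, respectively.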

Syllable-timing Interferes with Korean Learners' Speech of Stress-timed English

  • Lee, Ok-Hwa;Kim, Jong-Mi
    • Speech Sciences / Vol. 12, No. 4 / pp.95-112 / 2005
  • We investigate Korean learners' speech-timing of English before and after instruction in comparison with native speech, in an attempt to resolve disagreements in the literature as to whether speech-timing is measurable (Lehiste, 1977; Roach, 1982; Dauer, 1983 vs. Low et al., 2000; Yun, 2002; Jian, 2004). We measured the pair-wise variability between adjacent stressed and unstressed syllables within a foot, as well as that among adjacent feet, in approximately 555 English sentences read by 29 native speakers and 41 Korean learners at the intermediate proficiency level. The results show that, compared with native American English, Korean learner speech before instruction has significantly (p<.001) smaller pair-wise variability between adjacent stressed and unstressed syllables within a foot, and significantly (p=.01) larger variability among adjacent feet within the utterance. Learner speech after instruction showed significant (p=.01) improvement in the pair-wise variability of syllable sequences toward native speech values. The variability among adjacent feet was progressively smaller across learner speech before instruction, learner speech after instruction, and native speech (p=.03). We thus conclude that the speech-timing difference between Korean English and American English is measurable in terms of the duration of stressed and unstressed syllables, and that the latter is stress-timed while the former shows interference from syllable timing.
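The abstract above does not state the exact variability formula. One widely used form is the normalized pairwise variability index (nPVI), in which each difference between adjacent durations is divided by the mean of the pair; the sketch below assumes that form. The function name and the duration values are made up for illustration and are not data from the study.

```python
def npvi(durations):
    """Normalized pairwise variability index over successive durations.

    nPVI = 100 * mean(|d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2))
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:])]
    return 100 * sum(terms) / len(terms)

# Hypothetical syllable durations in milliseconds.
stress_timed_like = [200, 100, 200, 100]    # strong long/short alternation
syllable_timed_like = [150, 140, 150, 140]  # nearly equal syllables

print(npvi(stress_timed_like))
print(npvi(syllable_timed_like))
```

Under this measure, a sequence with strong alternation between long and short syllables scores higher than one with nearly equal syllables, which matches the direction of the contrast the abstract reports between native and learner speech.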


Result of Voice Analysis after Laryngeal Microsurgery for Vocal Polyp in Elderly

  • 최정임;여장옥;진성민;이상혁
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 22, No. 1 / pp.47-51 / 2011
  • Background and Objectives: Vocal polyps are one of the most frequent benign laryngeal diseases. They are usually found at the midpoint of the vocal fold and are mainly caused by vocal overuse. Vocal polyps are usually removed surgically. Generally, age-related changes in speech are attributed to changes in the anatomy and physiology of the speech mechanism. These changes result in increased variability in the acoustic properties of speech with age. Still, not all studies of age-related changes in speech have taken differences between the young group and adult group after laryngeal microsurgery into account. The aim of this investigation was to compare the improvement in acoustic analysis between young patients and elderly patients with vocal polyps, before and after laryngeal microsurgery. Materials and Method: One hundred and twenty-eight patients who underwent laryngeal microsurgery for vocal polyps from 2008 through 2011 were reviewed retrospectively. 105 of the 128 patients, those under age 60, were classified as the adult group (AG), and the remaining 23 patients as the elderly group (EG). The speech of AG and EG was evaluated before and after surgery to identify differences by age group across measures of fundamental frequency (F0), Jitter, Shimmer and Maximum phonation time (MPT). Results: There were no significant differences between the two groups in improvement of F0, Jitter, Shimmer, NHR, and MPT before and after surgery. The findings suggest that the elderly group compares quite well with the adult group in effectiveness of surgery. However, in the comparison between the elderly group and the young group (age under 40), there was a significant difference in improvement of Jitter and Shimmer. Conclusion: In general, the results of the present research showed significant improvement in vocal quality after phonosurgery for vocal polyps in both the elderly and adult groups. However, in the comparison of improvement between the elderly group and the young group, there were significant differences in improvement of jitter and shimmer. Therefore, in treatment planning for the elderly group, we should consider age-related changes of the vocal cord.


Performance Evaluation of Speech Onset Representation Characteristic of Cochlear Implants Speech Processor using Spike Train Decoding

  • 김두희;김진호;김경환
    • Journal of Biomedical Engineering Research (The Korean Society of Medical and Biological Engineering) / Vol. 28, No. 5 / pp.694-702 / 2007
  • The adaptation effect originating from the chemical synapse between the auditory nerve and the inner hair cell helps represent temporal cues of incoming speech, such as speech onset, accurately. It is therefore expected that modifying conventional cochlear implant (CI) speech processing strategies to incorporate the adaptation effect will considerably improve speech perception performance, such as the consonant perception score. Our purpose in this paper was to evaluate our new CI speech processing strategy, which incorporates the adaptation effect, by observing auditory nerve responses. By classifying the presence or absence of speech from the auditory nerve responses, i.e. spike trains, we could quantitatively compare the speech onset detection performance of the conventional and improved strategies. We verified that the adaptation effect improves the speech onset representation characteristics.

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Knowledgebase

  • 김광호;임민규;김지환
    • MALSORI (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / Vol. 68 / pp.115-126 / 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a knowledgebase. A vocabulary in CSR is normally derived from a word frequency list, so the vocabulary coverage depends on the corpus. In previous research, we presented an improved way of generating a vocabulary using a part-of-speech (POS) tagged corpus. We analyzed all words paired with 101 of the 152 POS tags and decided on a set of words that have to be included in vocabularies of any size. However, for the other 51 POS tags (e.g. nouns, verbs), the inclusion of words paired with such POS tags is still based on word frequencies counted on a corpus. In this paper, we propose a corpus-independent word inclusion method for noun-, verb-, and named entity (NE)-related POS tags using a knowledgebase. For noun-related POS tags, we generate synonym groups and analyze their relative importance using Google search. Then, we categorize verbs by lemma and analyze the relative importance of each lemma from pre-analyzed statistics for verbs. We determine the inclusion order of NEs through Google search. The proposed method shows better coverage for the test short message service (SMS) text corpus.
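To make the baseline in the abstract above concrete, the sketch below builds a vocabulary from a word frequency list and measures its coverage of a test text, taken here as the fraction of test tokens found in the vocabulary. This illustrates only the corpus-dependent baseline, not the proposed knowledgebase method. The function names and toy sentences are assumptions for illustration.

```python
from collections import Counter

def build_vocabulary(training_tokens, size):
    """Frequency-list vocabulary: the `size` most frequent training words."""
    counts = Counter(training_tokens)
    return {word for word, _ in counts.most_common(size)}

def coverage(vocabulary, test_tokens):
    """Fraction of test tokens that appear in the vocabulary."""
    if not test_tokens:
        raise ValueError("test corpus is empty")
    return sum(1 for t in test_tokens if t in vocabulary) / len(test_tokens)

# Toy corpora (hypothetical).
train = "the cat sat on the mat the cat".split()
test = "the dog sat on the cat".split()

vocab = build_vocabulary(train, 2)
print(vocab, coverage(vocab, test))
```

Because the vocabulary comes only from training-corpus counts, a word that is rare or absent there is left out regardless of how useful it is, which is the corpus dependence the abstract sets out to reduce.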
