• Title/Abstract/Keywords: Speech characteristics

970 search results; processing time 0.024 s

Noisy Speech Recognition using Probabilistic Spectral Subtraction

  • 지상문;오영환
    • The Journal of the Acoustical Society of Korea / Vol. 16, No. 6 / pp.94-99 / 1997
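  • This paper proposes a probabilistic spectral subtraction method for speech recognition in noisy environments that uses the probabilistic characteristics of noise together with speech models. Conventional spectral subtraction removes noise by subtracting a noise estimate, obtained from intervals where no speech is present, from the noisy speech. It therefore cannot remove noise effectively when the estimated noise differs in character from the noise actually contained in the noisy speech fed to the recognizer. To compensate for this drawback, the proposed method removes noise from noisy speech using noise prototypes of several different shapes. Because the probabilistic characteristics of the noise are represented by multiple noise prototypes, spectral subtraction is carried out probabilistically on the input speech and outputs multiple noise-removed spectra; at recognition time, the recognizer uses the spectrum that is optimal according to speech models trained on speech recorded in a quiet environment. In addition, static parameters and dynamic feature parameters are considered together to minimize the influence of noise, which enables more effective noise handling. To verify the validity of the proposed method experimentally, it was applied to speech recognition in a noisy environment. In experiments on 50 isolated words at an SNR of 10 dB, the recognition rate was 72.75% without noise processing, 80.25% with conventional spectral subtraction, and 86.25% with the proposed method, confirming that it is an effective noise-processing method.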

  • PDF
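
The idea in the abstract above, subtracting several noise prototypes and keeping the result that best fits a clean-speech model, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the spectral floor, and the use of squared distance to a single clean reference spectrum (standing in for the likelihood under a trained speech model) are all assumptions.

```python
def spectral_subtract(noisy, noise, floor=0.01):
    """Subtract a noise magnitude spectrum bin by bin.

    Each bin is floored at `floor * noisy_bin` so that it never goes
    negative (the floor value is an assumption for illustration).
    """
    return [max(x - n, floor * x) for x, n in zip(noisy, noise)]


def subtract_with_prototypes(noisy, prototypes, floor=0.01):
    """Return one noise-removed candidate spectrum per noise prototype."""
    return [spectral_subtract(noisy, p, floor) for p in prototypes]


def pick_best(candidates, clean_reference):
    """Return the index of the candidate closest to a clean reference.

    Squared Euclidean distance is a hypothetical stand-in for scoring
    each candidate against a speech model trained on clean speech.
    """
    def distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(candidate, clean_reference))

    return min(range(len(candidates)), key=lambda i: distance(candidates[i]))


noisy = [5.0, 3.0, 4.0]
prototypes = [[1.0, 1.0, 1.0], [4.0, 0.5, 0.5]]
candidates = subtract_with_prototypes(noisy, prototypes)
best = pick_best(candidates, clean_reference=[1.0, 2.5, 3.5])
```

In a real recognizer the selection would be made per frame inside decoding rather than against one fixed reference vector.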

Speech Fluency Characteristics of Adults in Their Manhood and Senescence

  • 전희숙;김효정;신명선;장현진
    • The Journal of the Korea Contents Association / Vol. 11, No. 3 / pp.318-326 / 2011
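  • As the elderly population grows, the number of middle-aged and older adults with neurological deficits is increasing, so basic research on speech fluency is needed to support the rehabilitation of adults with neurogenic speech and language disorders. This study therefore collected speech samples from normal adults in their 50s to 70s without neurogenic language disorders and compared the characteristics of speech fluency by age and sex. Speech samples were collected from 90 adults in total, 30 (15 men and 15 women) in each of the three age groups (50s, 60s, and 70s), and speech rate, disfluency frequency, and related measures were compared. First, adults in their 70s spoke more slowly than adults in their 50s and 60s, and there was no sex difference in speech rate in any of the three age groups. Second, there was no difference among adults in their 50s, 60s, and 70s in the frequency of normal disfluencies or of total disfluencies, and no sex difference within any age group. Third, speech rate was not correlated with disfluency frequency in any age group.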

Speech sound and personality impression

  • 이은영;유혜옥
    • Phonetics and Speech Sciences / Vol. 9, No. 4 / pp.59-67 / 2017
  • Regardless of their intention, listeners tend to assess speakers' personalities based on the sounds of the speech they hear. Assessment criteria, however, have not been fully investigated to indicate whether there is any relationship between the acoustic cues of produced speech sounds and perceived personality impression. If properly investigated, the potential relationship between the two will provide crucial insights into human communication and, further, into human-computer interaction. Since human communication is characterized by simultaneity and complexity, this investigation aims to identify the minimum essential factors linking the sounds of speech and perceived personality impression. The purpose of this study, therefore, is to identify significant associations between speech sounds and the listeners' perceived personality impression of the speaker. Twenty-eight subjects participated in the experiment, and eight acoustic parameters were extracted from the recorded speech using Praat. The subjects also completed the NEO Five-Factor Inventory so that their personality traits could be measured. The results of the experiment show that four major factors (duration average, pitch difference value, pitch average, and intensity average) play crucial roles in defining the significant relationship.

The realization of English rhythm by Busan Korean speakers

  • Choe, Wook Kyung
    • Phonetics and Speech Sciences / Vol. 11, No. 4 / pp.81-87 / 2019
  • The purpose of the current study is to investigate the realization of speech rhythm in English as spoken by Korean learners of English. The study particularly aims to examine the rhythm metrics of English read speech by learners who speak Busan, or South Kyungsang, Korean. Twenty-four learners whose L1 is Busan Korean and eight native speakers of English read a passage in which five sentences were segmented and labeled as vocalic and intervocalic intervals. Various rhythm metrics such as %V, Varcos, and Pairwise Variability Indexes (PVIs) were calculated. The results show that Korean learners read English sentences with significantly more vocalic and consonantal intervals at a slower speech rate than native English speakers. The analyses of rhythm metrics revealed that when the speech rate was not normalized, Korean learners' English showed more variability in the length of consonantal and vocalic intervals. However, speech-rate-normalized rhythm metrics for vocalic intervals indicated that Korean learners transferred the rhythmic structure of their L1 (a syllable-timed language) into their L2 speech (a stress-timed language). Overall, the results suggest that Korean learners' English reflects the rhythmic characteristics of their L1. The study also speculates on the effect of the learners' L1 dialect on the realization of L2 speech rhythm.
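
The rhythm metrics named in the abstract above can be computed from lists of interval durations. The sketch below uses the commonly cited definitions of %V, Varco, and the raw and normalized PVI; the function names, the choice of population standard deviation, and the toy durations are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))


def varco(durations):
    """Varco: standard deviation divided by the mean, times 100.

    Dividing by the mean makes the measure less sensitive to speech rate.
    Population standard deviation is assumed here.
    """
    return 100.0 * pstdev(durations) / mean(durations)


def raw_pvi(durations):
    """rPVI: mean absolute difference between successive intervals."""
    pairs = list(zip(durations, durations[1:]))
    return mean(abs(a - b) for a, b in pairs)


def normalized_pvi(durations):
    """nPVI: like rPVI, but each difference is divided by the pair's mean."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * mean(abs(a - b) / ((a + b) / 2.0) for a, b in pairs)


# Toy durations in seconds (hypothetical data).
vocalic = [0.1, 0.2, 0.1, 0.2]
consonantal = [0.1, 0.1, 0.1, 0.1]

metrics = {
    "%V": percent_v(vocalic, consonantal),
    "VarcoV": varco(vocalic),
    "VarcoC": varco(consonantal),
    "rPVI-V": raw_pvi(vocalic),
    "nPVI-V": normalized_pvi(vocalic),
}
```

Varco and nPVI are the rate-normalized measures, while the raw PVI and plain standard deviations grow when speech is slower, which is why the abstract reports different pictures with and without normalization.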

Comparison of Speech Onset Detection Characteristics of Adaptation Algorithms for Cochlear Implant Speech Processor

  • 최성진;김진호;김경환
    • The Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research / Vol. 29, No. 1 / pp.25-31 / 2008
  • It is well known that temporal information about input speech, such as speech onset, is better represented in the response of the auditory nerve because of the adaptation effect that occurs at the auditory nerve synapse. In addition, the performance of a cochlear implant speech processor can be improved by exploiting the adaptation effect. In this paper, we observed how a recently proposed adaptation algorithm emphasizes speech onset, analyzed how its performance changes as its parameters vary, and compared it with transient emphasis spectral maxima (TESM), a typical earlier strategy. With respect to false peaks, i.e., peaks generated anywhere other than at speech onset, the proposed model generated far fewer false peaks than TESM, and speech onset was more distinguishable under noise.

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

  • Eng, Goh Kia;Ahmad, Abdul Manan
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, ICCAS 2004 / pp.1291-1295 / 2004
  • This paper proposes a new approach to endpoint detection of isolated speech that significantly improves endpoint detection performance. The proposed algorithm relies on the root mean square (RMS) energy, the zero-crossing rate, and the spectral characteristics of the speech signal, where a Euclidean distance measure on cepstral coefficients is used to detect the endpoints of isolated speech accurately. The algorithm offers better performance than the traditional energy-based algorithm. The vocabulary for the experiment consists of the English digits from one to nine, and the experiments were conducted on 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of the algorithm is low, since the cepstral coefficients are reused later for feature extraction in the speech recognition procedure.

  • PDF
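
The time-domain features mentioned in the abstract above, RMS energy and zero-crossing rate, can be illustrated with a simple frame-based detector. This is only a sketch under stated assumptions: the thresholds, the decision rule, and the function names are hypothetical, and the cepstral-distance level of the paper's three-level algorithm is not reproduced here.

```python
import math


def frame_features(samples, frame_len):
    """Return a list of (rms_energy, zero_crossing_rate) per frame.

    Frames are non-overlapping; a trailing partial frame is dropped.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((rms, crossings / (frame_len - 1)))
    return features


def detect_endpoints(samples, frame_len, rms_threshold, zcr_threshold):
    """Return (first, last) indices of frames flagged as speech, or None.

    Hypothetical rule: a frame counts as speech if its RMS energy is above
    the threshold, or if it has a high zero-crossing rate together with at
    least half the threshold energy (to catch weak unvoiced sounds).
    """
    active = [
        i
        for i, (rms, zcr) in enumerate(frame_features(samples, frame_len))
        if rms > rms_threshold
        or (zcr > zcr_threshold and rms > rms_threshold / 2)
    ]
    if not active:
        return None
    return active[0], active[-1]


# Toy signal: silence, a burst of alternating samples, then silence again.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
endpoints = detect_endpoints(signal, frame_len=4,
                             rms_threshold=0.1, zcr_threshold=0.3)
```

In practice the thresholds would be estimated from the background noise at the start of each recording rather than fixed in advance.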

Performance Improvement of Adaptive Noise Cancellation Using a Speech Detector

  • Park, Jang-Sik
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 2E / pp.39-44 / 1996
  • The performance of a two-channel adaptive noise canceller is often degraded by weight perturbation caused by the speech signal. In this paper, an adaptive noise canceller is proposed that employs a speech detector and two adaptation algorithms, which are switched according to the output of the speech detector. When a highly correlated speech signal is detected, the tap weights of the adaptive filter are adapted by the sign algorithm. On the other hand, the weights are adapted by the NLMS algorithm when silence is detected or when the characteristics of the noise propagation channel change. The speech detector uses the power ratio of the input and the output of an adaptive linear prediction-error filter. According to computer simulations, the proposed method yields better performance than conventional ones.

  • PDF
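
The switching scheme described in the abstract above can be sketched as a single update step that chooses between the sign algorithm and NLMS. This is a rough illustration only: the step sizes, the function names, and the externally supplied `speech_detected` flag are assumptions, and the paper's detector based on a linear prediction-error filter is not implemented here.

```python
import random


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def anc_step(weights, ref_taps, primary, speech_detected,
             mu_nlms=0.5, mu_sign=0.001, eps=1e-8):
    """One update of a two-channel adaptive noise canceller.

    ref_taps holds the most recent reference (noise) samples, newest first.
    The returned error is the canceller output (primary minus noise estimate).
    """
    estimate = sum(w * x for w, x in zip(weights, ref_taps))
    error = primary - estimate
    if speech_detected:
        # Sign algorithm: small, bounded steps while speech is present.
        new_weights = [w + mu_sign * sign(error) * x
                       for w, x in zip(weights, ref_taps)]
    else:
        # NLMS: step normalized by the reference power for fast tracking.
        power = sum(x * x for x in ref_taps) + eps
        new_weights = [w + mu_nlms * error * x / power
                       for w, x in zip(weights, ref_taps)]
    return new_weights, error


def run_canceller(primary, reference, speech_flags, num_taps):
    """Run the canceller over whole signals; return final weights and output."""
    weights = [0.0] * num_taps
    taps = [0.0] * num_taps
    output = []
    for d, x, flag in zip(primary, reference, speech_flags):
        taps = [x] + taps[:-1]
        weights, error = anc_step(weights, taps, d, flag)
        output.append(error)
    return weights, output


# Toy check with silence only: the primary input is the reference noise
# passed through a hypothetical two-tap channel, so NLMS should recover it.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(500)]
channel = [0.5, -0.3]
primary = [channel[0] * noise[n] + channel[1] * (noise[n - 1] if n else 0.0)
           for n in range(500)]
final_weights, residual = run_canceller(primary, noise, [False] * 500, 2)
```

The sign branch limits how far the weights can drift per sample while speech is present, which is the motivation the abstract gives for switching away from NLMS during speech.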

Improving Speech and Swallowing Functions in Patients with Stroke

  • 권미선
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 27, No. 1 / pp.11-13 / 2016
  • Dysphagia can occur in up to 90% of patients after a cerebrovascular accident (CVA), and most of these patients show speech problems as well as dysphagia. The term swallowing covers the entire process of deglutition, from the placement of food in the mouth until the food enters the esophagus through the oral and pharyngeal cavities. Swallowing shares common anatomic structures and physiological characteristics with speech in many respects. Therefore, speech-language pathologists can help people with swallowing disorders. Herein, the approaches and rationales for improving speech and swallowing functions in patients with stroke need to be discussed depending on the lesion sites of the brain.

  • PDF

Improving Speech/Music Discrimination Parameter Using Time-Averaged MFCC

  • 최무열;김형순
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 64 / pp.155-169 / 2007
  • Discrimination between speech and music is important in many multimedia applications. In our previous work, focusing on the spectral change characteristics of speech and music, we presented a method using the mean of minimum cepstral distances (MMCD), and it showed a very high discrimination performance. In this paper, to further improve the performance, we propose to employ time-averaged MFCC in computing the MMCD. Our experimental results show that the proposed method enhances the discrimination between speech and music. Moreover, the proposed method overcomes the weakness of the conventional MMCD method whose performance is relatively sensitive to the choice of the frame interval to compute the MMCD.

  • PDF

A Study on Exceptional Pronunciations For Automatic Korean Pronunciation Generator

  • 김선희
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 48 / pp.57-67 / 2003
  • This paper presents a systematic description of exceptional pronunciations for automatic Korean pronunciation generation. An automatic pronunciation generator for Korean is an essential part of a Korean speech recognition system and a TTS (Text-To-Speech) system. It is composed of a set of regular rules and an exceptional pronunciation dictionary. The exceptional pronunciation dictionary is created by extracting the words that have exceptional pronunciations, based on the characteristics of such words identified through phonological research and a systematic analysis of the entries of Korean dictionaries. The method thus contributes to improving the performance of automatic pronunciation generators for Korean, as well as that of Korean speech recognition and TTS systems.

  • PDF