• Title/Abstract/Keywords: Connected speech

연결숫자음 전화음성 인식에서의 오인식 유형 분석 (Analysis of Error Patterns in Korean Connected Digit Telephone Speech Recognition)

  • 김민성;정성윤;손종목;배건성;김상훈
    • 대한음성학회: Conference Proceedings / 대한음성학회 May 2003 Conference / pp. 115-118 / 2003
  • Channel distortion and coarticulation effects make connected-digit telephone speech difficult to recognize and degrade recognition performance in the telephone environment. In this paper, as basic research toward improving the recognition performance of Korean connected-digit telephone speech, recognition error patterns are investigated and analyzed. The telephone digit speech database released by SITEC is used with the HTK system for the recognition experiments. DWFBA and MRTCN are used for feature extraction and channel compensation, respectively. The experimental results are discussed along with our findings.

3-5세 일반아동의 말소리에 대한 융합적 분석: 단어와 자발화를 중심으로 (Convergent Analysis on the Speech Sound of Typically Developing Children Aged 3 to 5 : Focused on Word Level and Connected Speech Level)

  • 김윤주;박현주
    • 한국융합학회논문지 (Journal of the Korea Convergence Society) / Vol. 9, No. 6 / pp. 125-132 / 2018
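  • This study examined the speech sound production characteristics of preschool children, and related assessment issues, through word-level and spontaneous speech assessments. The Assessment of Phonology and Articulation for Children (APAC) was administered to 72 typically developing children aged 3 to 5 (24 per age group), and the study analyzed differences in percentage of consonants correct (PCC) and intelligibility by age and sex, the correlation between PCC and intelligibility, and speech sound error patterns by consonant place and manner of articulation. PCC and intelligibility increased with age but did not differ by sex; the correlation between them was statistically significant in the 5-year-old group; and the speech sound error patterns also differed between the two assessments. Because children's speech sound production varies with the linguistic unit, these results show that spontaneous speech assessment should accompany word-level assessment to gauge children's speech sound abilities properly. This suggests that the current criteria, which determine the grade of a language disorder from word-level PCC alone, need to be re-examined and studied further.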

훈련데이터 기반의 temporal filter를 적용한 4연숫자 전화음성 인식 (Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter)

  • 정성윤;배건성
    • 대한음성학회지: 말소리 (Malsori) / No. 53 / pp. 93-102 / 2005
  • The performance of a speech recognition system generally degrades in the telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task, telephone speech recognition. Three different temporal filtering methods are presented, with recognition results for Korean connected-digit telephone speech. The filter coefficients are derived from cepstral-domain feature vectors using principal component analysis. According to the experimental results, the proposed temporal filtering method performs slightly better than the previous ones.
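
As a rough illustration of the data-driven temporal filtering described above, the Python sketch below applies principal component analysis to fixed-length windows of cepstral time trajectories and uses the first principal component as an FIR impulse response. It is a sketch under stated assumptions, not the authors' method: one filter shared by all coefficients, a filter length of 11 frames, and the names `design_pca_temporal_filter` and `apply_temporal_filter` are hypothetical choices, and the paper's three filter variants are not reproduced.

```python
import numpy as np


def design_pca_temporal_filter(utterances, filter_len=11):
    """Derive FIR temporal-filter taps from training cepstra via PCA (hypothetical helper).

    utterances: list of arrays shaped (num_frames, num_ceps).
    Every window of `filter_len` consecutive frames of every coefficient's
    time trajectory is one sample vector; the covariance eigenvector with
    the largest eigenvalue becomes the filter impulse response.
    """
    segments = []
    for feats in utterances:
        for k in range(feats.shape[1]):
            traj = feats[:, k]
            for t in range(len(traj) - filter_len + 1):
                segments.append(traj[t:t + filter_len])
    segments = np.asarray(segments)
    cov = np.cov(segments, rowvar=False)      # (filter_len, filter_len), mean-centred
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues come back in ascending order
    taps = eigvecs[:, -1]                     # first principal component, unit norm
    if taps.sum() < 0:                        # eigenvector sign is arbitrary
        taps = -taps
    return taps


def apply_temporal_filter(feats, taps):
    """Filter each cepstral trajectory along time; output keeps the input length."""
    return np.stack(
        [np.convolve(feats[:, k], taps, mode="same") for k in range(feats.shape[1])],
        axis=1,
    )


rng = np.random.default_rng(0)
train = [rng.standard_normal((200, 13)) for _ in range(5)]  # placeholder features, not real data
taps = design_pca_temporal_filter(train)
filtered = apply_temporal_filter(train[0], taps)
print(taps.shape, filtered.shape)  # (11,) (200, 13)
```

Flipping the taps to a non-negative sum only fixes the arbitrary eigenvector sign; `mode="same"` keeps the number of frames unchanged so the filtered features can replace the originals frame for frame.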

내전형연축성 발성장애 음성에 대한 켑스트럼과 스펙트럼 분석 (Cepstral and spectral analysis of voices with adductor spasmodic dysphonia)

  • 심희정;정훈;최병흔;허정화;고도흥
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 8, No. 2 / pp. 73-80 / 2016
  • The purpose of this study was to analyze perceptual and spectral/cepstral measurements in patients with adductor spasmodic dysphonia (ADSD). Sixty gender- and age-matched participants (30 with ADSD and 30 controls) were recorded reading a sentence and sustaining the vowel /a/. The recordings were analyzed acoustically by measuring CPP, the L/H ratio, mean CPP F0, and CSID, and auditory-perceptual ratings were obtained using GRBAS. The main results were as follows: (a) CSID was significantly higher for connected speech than for the sustained vowel; (b) the G, R, and S ratings were significantly higher for connected speech than for the sustained vowel; (c) the spectral/cepstral parameters were significantly correlated with the perceptual parameters; and (d) ROC analysis showed that a CSID threshold of 13.491 classified ADSD well, with 86.7% sensitivity and 96.7% specificity. Spectral and cepstral analysis of connected speech is especially meaningful in cases where perceptual analysis and clinical evaluation alone are insufficient.
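
The sensitivity and specificity in item (d) come from counting classifications at a single cutoff. A small Python sketch, assuming that CSID values at or above the cutoff are labeled ADSD (higher CSID indicates more severe dysphonia) and using a hypothetical helper name; if all 30 patients and 30 controls entered the ROC analysis, the reported 86.7% and 96.7% correspond to 26 of 30 and 29 of 30, which the last line checks arithmetically.

```python
CSID_CUTOFF = 13.491  # threshold reported in the abstract


def sensitivity_specificity(scores, labels, threshold=CSID_CUTOFF):
    """Classify score >= threshold as positive (ADSD) and compare with labels.

    labels: True for patients, False for controls.
    Returns (sensitivity, specificity) as fractions.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Arithmetic check of the reported percentages (26/30 and 29/30 are inferred, not reported).
print(f"{26 / 30:.1%} {29 / 30:.1%}")  # 86.7% 96.7%
```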

음성장애와 샘플유형에 따른 GRBAS 측정치 및 shimmer 비교 (Differences in GRBAS scales and shimmer according to vocal sample types in people with vocal disorders)

  • 신유정;홍기환;심현섭
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 3, No. 3 / pp. 149-155 / 2011
  • The purpose of the present study was to identify differences in GRBAS ratings between vocal sample types (sustained vowels and connected speech) for specific laryngeal conditions (vocal nodules, vocal polyps, and vocal paralysis), and the relationship between GRBAS ratings and Shimmer values for each sample type. A total of 60 voice samples from 30 patients (10 with vocal nodules, 10 with vocal polyps, and 10 with vocal paralysis) were examined, and MDVP (Multi-dimensional Voice Program) was used to measure Shimmer. Three listeners rated both sample types, presented in random order, on the GRBAS scale. Three-way ANOVA, one-way ANOVA, and paired t-tests were used. The results were as follows: 1) GRBAS ratings varied with sample type; listeners tended to judge voice quality as better when they listened to connected speech rather than sustained vowels. 2) The G score of GRBAS and Shimmer were positively correlated, with statistical significance. These results show that 1) voice specialists should consider the sample type when evaluating the severity of a voice problem and 2) the G score could be a simple and clear measure.
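
For readers unfamiliar with the Shimmer measure, the Python sketch below computes local shimmer as it is commonly defined (for example, Praat's "shimmer (local)"): the mean absolute difference between the amplitudes of consecutive glottal cycles, divided by the mean amplitude, times 100. It assumes the per-cycle peak-to-peak amplitudes are already extracted; MDVP's own implementation, including cycle detection, may differ, and the function name is hypothetical.

```python
def local_shimmer_percent(cycle_amplitudes):
    """Local shimmer (%) from per-cycle peak-to-peak amplitudes.

    Mean absolute difference between consecutive cycles' amplitudes,
    divided by the mean amplitude, times 100.
    """
    amps = list(cycle_amplitudes)
    if len(amps) < 2:
        raise ValueError("need at least two cycles")
    mean_abs_diff = sum(abs(a - b) for a, b in zip(amps, amps[1:])) / (len(amps) - 1)
    mean_amp = sum(amps) / len(amps)
    return 100.0 * mean_abs_diff / mean_amp


# Toy amplitudes alternating by 0.1: 100 * 0.1 / 1.05
print(round(local_shimmer_percent([1.0, 1.1, 1.0, 1.1]), 2))  # 9.52
```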

한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석 및 인식실험 (Analysis of Feature Parameter Variation for Korean Digit Telephone Speech according to Channel Distortion and Recognition Experiment)

  • 정성윤;손종목;김민성;배건성
    • 대한음성학회지: 말소리 (Malsori) / No. 43 / pp. 179-188 / 2002
  • Improving the recognition performance of connected-digit telephone speech remains an open problem. As a basic study toward this goal, this paper analyzes the variation of feature parameters of Korean digit telephone speech caused by channel distortion. MFCC is used as the feature parameter for both analysis and recognition. To analyze the call-dependent effect of telephone channel distortion, MFCCs are first extracted from the connected-digit telephone speech for each phoneme in the Korean digits. CMN, RTCN, and RASTA are then applied to the MFCCs as channel compensation techniques. Using the MFCC, MFCC+CMN, MFCC+RTCN, and MFCC+RASTA feature parameters, the variances of the phonemes are analyzed and recognition experiments are carried out for each case. The experimental results are discussed along with our findings.
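
Of the compensation techniques listed above, CMN is the simplest to illustrate: a time-invariant channel multiplies the speech spectrum, which becomes an additive offset in the cepstral (log-spectral) domain, so subtracting each coefficient's utterance mean removes it. A minimal NumPy sketch, assuming the MFCCs of one call are already in a (frames x coefficients) array and the channel is perfectly constant; the random arrays are placeholders, and RTCN and RASTA are not shown.

```python
import numpy as np


def cepstral_mean_normalization(mfcc):
    """Subtract the per-utterance mean of each cepstral coefficient.

    mfcc: array shaped (num_frames, num_coeffs) for one utterance (call).
    A fixed channel adds the same cepstral offset to every frame, so it
    cancels in the mean-subtracted features.
    """
    return mfcc - mfcc.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 13))   # placeholder for clean-speech MFCCs
channel = rng.standard_normal(13)        # constant cepstral offset of one call
received = clean + channel
print(np.allclose(cepstral_mean_normalization(received),
                  cepstral_mean_normalization(clean)))  # True
```

Real telephone channels are not perfectly constant and CMN also removes the speaker's long-term spectral mean, which is one reason the paper compares it with other methods.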

훈련데이터 기반의 temporal filter를 적용한 한국어 4연숫자 전화음성의 인식실험 (Recognition experiment of Korean connected digit telephone speech using the temporal filter based on training speech data)

  • 정성윤;김민성;손종목;배건성;강점자
    • 대한음성학회: Conference Proceedings / 대한음성학회 October 2003 Conference / pp. 149-152 / 2003
  • In this paper, data-driven temporal filter methods [1] are investigated for robust feature extraction. Principal component analysis is applied to the time trajectories of the feature sequences of training speech data to obtain appropriate temporal filters. We performed recognition experiments with data-driven temporal filters on the Korean connected-digit telephone speech database released by SITEC. The experimental results are discussed along with our findings.

연속음 처리를 위한 프랙탈 차원 방법 고찰 (Fractal Dimension Method for Connected-digit Recognition)

  • 김태식
    • 음성과학 (Speech Sciences) / Vol. 10, No. 2 / pp. 45-55 / 2003
  • A strange attractor can be used as a representation method for signal processing, and the fractal dimension is a well-known method for extracting features from an attractor. Although the method offers powerful capabilities for speech processing, it has a drawback that must be resolved first: the fractal dimension method normally requires a sufficiently long raw signal. In connected-digit recognition, however, processing is normally syllable- or semi-syllable-based, and in this case there is no evidence as to whether enough data are available to extract the characteristics of the attractor. This paper discusses the relationship between the length of the signal and the computed fractal dimension, and also discusses an efficient way to apply the method to connected-digit recognition.
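
The dependence of a fractal-dimension estimate on signal length can be explored with the Grassberger-Procaccia correlation dimension, one common estimator for a reconstructed attractor; the abstract does not say which estimator the author used, so this is a swapped-in illustration. The NumPy sketch below assumes a time-delay embedding with hypothetical parameters `dim` and `tau`, and takes the scaling region between the 1st and 10th percentiles of pairwise distances, which is itself an assumption. Running it on windows of different lengths shows how the estimate changes with the amount of data.

```python
import numpy as np


def delay_embed(x, dim, tau):
    """Time-delay embedding: row i is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    n = len(x) - (dim - 1) * tau
    if n < 2:
        raise ValueError("signal too short for this embedding")
    return np.stack([x[j * tau: j * tau + n] for j in range(dim)], axis=1)


def correlation_dimension(x, dim=2, tau=1, low_pct=1.0, high_pct=10.0, n_radii=10):
    """Grassberger-Procaccia estimate: slope of log C(r) against log r.

    C(r) is the fraction of distinct point pairs closer than r. Radii are
    spaced geometrically between two percentiles of the pair distances.
    """
    pts = delay_embed(np.asarray(x, dtype=float), dim, tau)
    diffs = pts[:, None, :] - pts[None, :, :]          # O(N^2) memory
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_d = dists[np.triu_indices(len(pts), k=1)]     # each pair counted once
    r_lo, r_hi = np.percentile(pair_d, [low_pct, high_pct])
    if r_lo <= 0:
        raise ValueError("repeated points; choose a higher low_pct")
    radii = np.geomspace(r_lo, r_hi, n_radii)
    c = np.array([(pair_d < r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return float(slope)


# Hypothetical demo: a sinusoid traces a closed curve (dimension 1) in the
# embedding space; compare estimates from short and long windows.
t = np.arange(1000)
x = np.sin(2 * np.pi * 0.013 * t)
for n in (100, 300, 1000):
    print(n, round(correlation_dimension(x[:n], dim=2, tau=19), 2))
```

The pairwise-distance matrix grows quadratically with the number of points, so this direct version suits syllable-sized windows like those discussed in the paper, not long recordings.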

구강 개방 상태에 따른 말 명료도 및 말 용인도 특성 (Characteristics of speech intelligibility and speech acceptability connected with mouth opening condition)

  • 송윤경
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 3, No. 3 / pp. 141-148 / 2011
  • Many factors affect speech intelligibility and speech acceptability. Structural anomalies and neuromotor pathologies are known causes of abnormal speech sounds, and minor variations in the oral mechanism also play a role. Speaking with restricted mouth opening, whether because of a therapeutic procedure or a habitual speech pattern, might affect the quality of speech sounds. This study therefore compared the speech intelligibility and speech acceptability of 24 recorded words produced under two conditions (restricted and normal mouth opening), as rated by 30 adults with normal hearing. Speech intelligibility and speech acceptability were significantly lower in the restricted mouth opening condition, and in that condition speech acceptability was significantly lower than speech intelligibility. Speech acceptability in the restricted condition was significantly lower especially for open vowels. These findings indicate that mouth opening can affect vowel shape and can adversely affect speech intelligibility and speech acceptability.

채널보상기법 및 특징파라미터에 따른 한국어 연속숫자음 전화음성의 인식성능 비교 (Comparison of the recognition performance of Korean connected digit telephone speech depending on channel compensation methods and feature parameters)

  • 정성윤;김민성;손종목;배건성;김상훈
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference / pp. 201-204 / 2002
  • As a preliminary study toward improving the recognition performance of connected-digit telephone speech, we investigate feature parameters as well as channel compensation methods for telephone speech. CMN and RTCN are examined for telephone channel compensation, and MFCC, DWFBA, SSC, and their delta features are examined as feature parameters. Recognition experiments on a database we collected show that, at the feature level, DWFBA outperforms MFCC, and that for channel compensation RTCN outperforms CMN. The DWFBA+Delta_Mel-SSC feature achieves the highest recognition rate.
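
Delta features are regression-based time derivatives of the static features. The sketch below uses the regression formula documented in the HTK Book, d_t = sum over k = 1..K of k * (c_{t+k} - c_{t-k}), divided by 2 * sum of k^2, replicating the first and last frames at the edges. The window half-width K = 2 and the function name are assumptions; the paper does not state its delta configuration.

```python
import numpy as np


def delta(features, half_width=2):
    """Regression deltas along time for a (num_frames, num_coeffs) array.

    d_t = sum_{k=1..K} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..K} k^2),
    with the first/last frame repeated to pad the edges.
    """
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    n = features.shape[0]
    num = np.zeros(features.shape, dtype=float)
    for k in range(1, half_width + 1):
        # rows half_width + k ... correspond to c_{t+k}; rows half_width - k ... to c_{t-k}
        num += k * (padded[half_width + k: half_width + k + n]
                    - padded[half_width - k: half_width - k + n])
    return num / (2 * sum(k * k for k in range(1, half_width + 1)))


ramp = np.arange(10, dtype=float).reshape(-1, 1)  # one coefficient rising by 1 per frame
# Interior frames give 1.0; edge frames are damped by the replicated padding.
print(delta(ramp).ravel())
```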