• Title/Abstract/Keywords: speech rates

Search results: 271 items (processing time: 0.022 s)

Correlation analysis of linguistic factors in non-native Korean speech and proficiency evaluation

  • 양승희;정민화
    • Phonetics and Speech Sciences / Vol. 9, No. 3 / pp. 49-56 / 2017
  • Much research attention has been directed at identifying how native speakers perceive non-native speakers' oral proficiency. To investigate the generalizability of previous findings, this study examined segmental, phonological, accentual, and temporal correlates of native speakers' evaluations of L2 Korean speech produced by learners of various proficiency levels and nationalities. Our experimental results show that proficiency ratings by native speakers correlate significantly not only with rate of speech but also with segmental accuracy. Segmental errors have the highest correlation with the proficiency of L2 Korean speech. We further verified this finding with substitution, deletion, and insertion error rates. Although phonological accuracy was expected to be highly correlated with the proficiency score, it was the least influential measure. Another new finding of this study is that the role of pitch and accent has so far been underemphasized in studies of non-native Korean speech perception. This work will serve as the groundwork for developing an automatic assessment module in a Korean CAPT system.
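The kind of correlation analysis the abstract describes can be sketched as follows. This is a minimal illustration, assuming a Pearson coefficient; the abstract does not state which coefficient the authors used, and the per-learner values below are hypothetical, not taken from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, one value per learner (not from the paper).
proficiency = [1.5, 2.0, 3.0, 3.5, 4.5]            # native-speaker ratings
speech_rate = [2.1, 2.6, 3.0, 3.8, 4.2]            # e.g. syllables per second
segmental_error_rate = [0.30, 0.26, 0.18, 0.12, 0.05]

r_rate = pearson_r(proficiency, speech_rate)
r_error = pearson_r(proficiency, segmental_error_rate)
print(f"rate vs. rating: {r_rate:.2f}, segmental errors vs. rating: {r_error:.2f}")
```

A negative coefficient for an error rate is expected here: more errors go with lower ratings, so the strength of the relationship is read from the absolute value.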

A Study on the Development of the Real-Time G.723.1 Speech Codec Using a Fixed-Point DSP (ADSP-2181)

  • 박정재;정익주
    • Speech Sciences / Vol. 3 / pp. 177-186 / 1998
  • This paper describes the procedure for implementing G.723.1, a real-time speech codec developed by DSP Group and standardized by ITU-T, on the ADSP-2181 fixed-point DSP. The codec supports two bit rates, 5.3 and 6.3 kbit/s. We implemented only the 6.3 kbit/s rate, using fixed-point 32-bit precision. According to the experimental results, the computational load is about 55 MIPS, and the quality is similar to that of a PC simulation using floating-point arithmetic. We propose a method for using a fixed-point DSP and a procedure for developing real-time speech codecs on DSPs, and we developed a G.723.1 speech codec for the ADSP-2181.
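To illustrate what fixed-point arithmetic involves, here is a generic Q15 sketch in Python. It is not the paper's ADSP-2181 code, and the abstract does not describe how its 32-bit precision scheme works; the function names and the Q15 format are assumptions chosen for illustration.

```python
Q15_ONE = 1 << 15                      # 1.0 is represented as 32768
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))


def float_to_q15(x):
    """Convert a float in [-1, 1) to Q15, saturating out-of-range input."""
    return saturate16(int(round(x * Q15_ONE)))


def q15_to_float(q):
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values: wide product, round, shift back to Q15."""
    product = a * b                    # Q30 result, fits in 32 bits
    return saturate16((product + (1 << 14)) >> 15)


def q15_dot(xs, ys):
    """Dot product with a wide accumulator, rounded to Q15 once at the end."""
    acc = sum(x * y for x, y in zip(xs, ys))   # Q30 accumulator
    return saturate16((acc + (1 << 14)) >> 15)


half = float_to_q15(0.5)
print(q15_to_float(q15_mul(half, half)))       # 0.5 * 0.5 in Q15
```

Keeping the accumulator wide and rounding once at the end, as in `q15_dot`, loses less precision than rounding after every multiplication.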


Some Clinical Aspects of Dysarthria

  • 김현기;김완호;서정환;홍기환;신효근;고도흥
    • Speech Sciences / Vol. 3 / pp. 38-49 / 1998
  • Dysarthrias are neuromotor disorders caused by weakness of neuromotor control. They are classified into six types on the basis of Mayo Clinic research: flaccid, spastic, ataxic, hypokinetic, hyperkinetic, and mixed. Five dysarthria types are investigated in this study. MRI, EMG, and neuropathological tests are essential diagnostic procedures. Visi-Pitch, spectrography, and CSL are used to assess dysarthric speech. Maximum phonation time, diadochokinetic rate, voice onset time (VOT), and substitution rate are the speech evaluation parameters. Maximum phonation time and diadochokinetic rate are lowest in spastic and ataxic dysarthria. Spastic dysarthria shows glottalized consonant substitutions, whereas flaccid, ataxic, and hypokinetic dysarthrias show aspirated consonant substitutions. VOT is longest for hypokinetic dysarthria and shortest for ataxic dysarthria. Jitter is higher than in the control group. Speech evaluation using experimental phonetic instruments helps establish international standardization of speech evaluation for speech disorders.
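Two of the parameters named above can be computed from simple measurements, as sketched below. The abstract does not say which jitter definition was used; this sketch assumes the common "local jitter" measure (mean absolute difference of consecutive pitch periods divided by the mean period), and all input values are hypothetical.

```python
def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods over the mean period, in %."""
    if len(periods) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


def diadochokinetic_rate(syllable_count, duration_s):
    """Syllable repetitions per second, e.g. for a repeated /pa-ta-ka/ task."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return syllable_count / duration_s


# Hypothetical pitch periods in milliseconds.
print(round(local_jitter_percent([10.0, 10.2, 10.0, 10.2]), 2))
print(diadochokinetic_rate(30, 5.0))
```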


An SVM-based physical fatigue diagnostic model using speech features

  • 김태훈;권철홍
    • Phonetics and Speech Sciences / Vol. 8, No. 2 / pp. 17-22 / 2016
  • This paper devises a model for diagnosing physical fatigue from speech features. It presents a machine learning method based on an SVM algorithm that uses several kinds of feature parameters: significant speech parameters, questionnaire responses, and bio-signal parameters obtained before and after a fatigue-inducing experiment. The proposed model achieved performance rates of 95%, 100%, and 90%, respectively, with the three types of fatigue-related parameters. These results suggest that the proposed method can serve as a physical fatigue diagnostic model and that fatigue can be easily diagnosed with speech technology.

Measurements of Speaking Rate and Fluency in Stuttering Adults

  • 신문자
    • Speech Sciences / Vol. 7, No. 3 / pp. 273-284 / 2000
  • The purpose of this study was to investigate speech rate and fluency in adults who stutter. It suggests that a measurement guideline for speech rate and fluency be used to collect clinically meaningful data. Subjects were 10 adults who stutter (mean age = 25;8). Syllables were used as the unit of measurement for analyzing speech duration. The mean rate was 241 SPM (syllables per minute) for reading and 196 SPM for spontaneous speaking. Fluency was also measured in both conditions. The correlation between speech rate and fluency was high (r = 0.92). A strong positive correlation was also found between different investigators' measurements of speech rate and fluency.
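The syllables-per-minute unit used above is a simple ratio. The sketch below shows the arithmetic; the syllable counts and durations are hypothetical values chosen to reproduce the reported means, not data from the study, and the abstract does not say whether pauses or disfluent segments were excluded from the duration.

```python
def syllables_per_minute(syllable_count, duration_s):
    """Speech rate in SPM from a syllable count and a duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return syllable_count * 60.0 / duration_s


# Hypothetical samples: 482 syllables read in 120 s, 98 syllables spoken in 30 s.
reading_spm = syllables_per_minute(482, 120)
speaking_spm = syllables_per_minute(98, 30)
print(reading_spm, speaking_spm)
```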


Improved speech emotion recognition using histogram equalization and data augmentation techniques

  • 허운행;권오욱
    • Phonetics and Speech Sciences / Vol. 9, No. 2 / pp. 77-83 / 2017
  • We propose a new method to reduce emotion recognition errors caused by variation in speaker characteristics and speech rate. First, to reduce variation in speaker characteristics, we adjust the features of a test speaker to fit the distribution of all training data using the histogram equalization (HE) algorithm. Second, to deal with variation in speech rate, we augment the training data with speech generated at various speech rates. In computer experiments using EMO-DB, KRN-DB, and eNTERFACE-DB, the proposed method yields relative improvements in weighted accuracy of 34.7%, 23.7%, and 28.1%, respectively.
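The histogram equalization step can be sketched as empirical CDF matching on a single feature dimension: each test value is replaced by the training value at the same quantile. This is a minimal sketch under that assumption; the abstract does not specify the authors' exact HE variant, and the feature values are hypothetical.

```python
from bisect import bisect_right


def histogram_equalize(test_values, train_values):
    """Map each test value to the training value at the same empirical quantile."""
    sorted_test = sorted(test_values)
    sorted_train = sorted(train_values)
    n_test, n_train = len(sorted_test), len(sorted_train)
    equalized = []
    for v in test_values:
        # Empirical CDF of v within the test speaker's data, in (0, 1].
        cdf = bisect_right(sorted_test, v) / n_test
        # Inverse training CDF: index of the training value at that quantile.
        idx = min(n_train - 1, max(0, int(round(cdf * n_train)) - 1))
        equalized.append(sorted_train[idx])
    return equalized


# Hypothetical one-dimensional feature values.
test_speaker = [10, 30, 20, 40]
training_data = [1, 2, 3, 4]
print(histogram_equalize(test_speaker, training_data))
```

Because the mapping is monotone, the rank order of the test speaker's values is preserved; only their distribution changes.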

RECOGNIZING SIX EMOTIONAL STATES USING SPEECH SIGNALS

  • Kang, Bong-Seok;Han, Chul-Hee;Youn, Dae-Hee;Lee, Chungyong
    • Korean Society for Emotion and Sensibility: Conference Proceedings / Proceedings of the 2000 Spring Conference of KOSES and International Sensibility Ergonomics Symposium / pp. 366-369 / 2000
  • This paper examines three algorithms for recognizing a speaker's emotion from speech signals. The target emotions are happiness, sadness, anger, fear, boredom, and the neutral state. MLB (Maximum-Likelihood Bayes), NN (Nearest Neighbor), and HMM (Hidden Markov Model) algorithms are used as the pattern matching techniques. In all cases, pitch and energy are used as the features. The feature vectors for MLB and NN are composed of pitch mean, pitch standard deviation, energy mean, energy standard deviation, etc. For HMM, vectors of delta pitch with delta-delta pitch and delta energy with delta-delta energy are used. We recorded a corpus of emotional speech data and performed a subjective evaluation of the data. The subjective recognition result was 56% and was compared with the classifiers' recognition rates. The MLB, NN, and HMM classifiers achieved recognition rates of 68.9%, 69.3%, and 89.1%, respectively, for speaker-dependent, context-independent classification.
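The NN approach with pitch and energy statistics can be sketched as below. This is an illustrative sketch only: it uses just the four statistics the abstract names (the "etc." features are unknown), assumes Euclidean distance, and uses hypothetical pitch and energy contours rather than the paper's corpus.

```python
from math import sqrt
from statistics import mean, pstdev


def utterance_features(pitch, energy):
    """Feature vector: pitch mean, pitch std, energy mean, energy std."""
    return (mean(pitch), pstdev(pitch), mean(energy), pstdev(energy))


def nearest_neighbor(query, training):
    """Return the label of the training vector closest to the query.

    training is a list of (feature_vector, label) pairs.
    """
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    _, label = min(training, key=lambda item: distance(query, item[0]))
    return label


# Hypothetical per-frame pitch (Hz) and energy contours.
training = [
    (utterance_features([220, 260, 300], [0.8, 0.9, 1.0]), "anger"),
    (utterance_features([110, 115, 120], [0.20, 0.25, 0.30]), "sadness"),
]
query = utterance_features([210, 250, 290], [0.7, 0.9, 0.9])
predicted = nearest_neighbor(query, training)
print(predicted)
```

In this unnormalized form the pitch terms dominate the distance because they are on a much larger scale than energy; scaling each feature before comparison is a common remedy.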


An acoustical analysis of speech of different speaking rates and genders using intonation curve stylization of English

  • 이서배
    • Phonetics and Speech Sciences / Vol. 6, No. 4 / pp. 79-90 / 2014
  • An intonation curve stylization was used for an acoustical analysis of English speech. For the analysis, acoustical feature values were extracted from 1,848 utterances produced at normal and fast speech rates by 28 native speakers of English (12 women and 16 men). Men are found to speak faster than women at the normal speech rate, but no difference is found between genders at the fast speech rate. Analysis of pitch point features shows that fast speech has greater Pt (pitch point movement time), Pr (pitch point pitch range), and Pd (pitch point distance) but smaller Ps (pitch point slope) than normal speech. Men show greater Pt, Pr, and Pd than women. Analysis of sentence-level features reveals that fast speech has smaller Sr (sentence-level pitch range), Sd (sentence duration), and Max (maximum pitch) but greater Ss (sentence slope) than normal speech. Women show greater Sr, Ss, Sp (pitch difference between the first pitch point and the last), Sd, MaxNr (normalized Max), and MinNr (normalized Min) than men. As speech rate increases, women speak with greater Ss and Sr than men.

Investigating an Automatic Method in Summarizing a Video Speech Using User-Assigned Tags

  • 김현희
    • Journal of the Korean Society for Library and Information Science / Vol. 46, No. 1 / pp. 163-181 / 2012
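  • To construct a speech summarization algorithm, this study analyzed the effectiveness of three techniques that can be applied without complex analysis of a large speech text: user-assigned tags, sentence position, and removal of sentence redundancy. Based on the results of this analysis, the study then aims to construct and evaluate a speech summarization method and to propose an efficient approach to speech summarization. The proposed method was built on a modified Maximum Marginal Relevance model that uses tag and title keyword information and can apply weights for sentence position while minimizing redundancy. Its performance was compared with that of the Extractor system, which performs relatively complex lexical processing using word frequency and word position information from the speech text. The comparison showed that the proposed method achieved higher average precision than the Extractor system, with a statistically significant difference, and higher average recall, although that difference was not statistically significant.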
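The general idea of Maximum Marginal Relevance with a sentence-position weight can be sketched as follows. The abstract does not give the authors' modified formula, so the Jaccard similarity, the linear position bonus, and the parameters `lam` and `position_weight` are all assumptions made for illustration; the sentences and keywords are hypothetical.

```python
def tokenize(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Set overlap similarity in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr_summary(sentences, keywords, k=2, lam=0.7, position_weight=0.1):
    """Greedily pick k sentences that match the keywords but not each other."""
    query = {w.lower() for w in keywords}      # tag and title keywords
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    selected, candidates = [], list(range(n))

    def score(i):
        # Assumed position bonus: earlier sentences score slightly higher.
        relevance = jaccard(tokens[i], query) + position_weight * (1 - i / n)
        redundancy = max((jaccard(tokens[i], tokens[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]   # keep original order


sentences = [
    "speech rate affects perceived fluency",
    "speech rate affects perceived fluency strongly",
    "tags from users describe the video topic",
    "thank you all for coming today",
]
summary = mmr_summary(sentences, ["speech", "rate", "tags", "video"], k=2)
print(summary)
```

The second sentence matches the keywords almost as well as the first, but the redundancy penalty pushes it below the third sentence, which is the behavior MMR is designed to produce.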

Design of a Variable Bit Rate Speech Coder Based on One-dimensional SPIHT

  • 나훈;정대권
    • The Journal of the Acoustical Society of Korea / Vol. 22, No. 6 / pp. 443-451 / 2003
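  • A codebook-based CELP coder models the excitation signal according to the coding bit rate pre-assigned to the codebook and then synthesizes the speech signal using the codebook. A single coder therefore cannot support an arbitrary range of bit rates. The variable bit rate coder proposed in this paper uses the wavelet transform and one-dimensional SPIHT to encode the excitation signal according to the number of bits allocated to the current frame. Unlike a CELP coder, it does not need to model the excitation signal (or codebook) in a few specific forms, and it can encode the excitation signal at various bit rates according to the user's requirements even without accurate pitch information. Because no codebook exists, the coder has low complexity, and a speech quality comparison with the CELP-based G.729 and G.723.1 coders shows equal or better results.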