• Title/Summary/Keyword: speech parameter


A Study of Normal Nasalance and Velopharyngeal Port Activity in the Speech of Korean Adults (정상 성인의 비음도와 비인강 활성도에 관한 연구)

  • Leem Dae-Ho;Shin Hyo-Keun;Baek Jin-A.;Kim Hyun-Gi;Kwon Min-Su
    • Korean Journal of Cleft Lip And Palate / v.7 no.2 / pp.123-132 / 2004
  • The purpose of this study was to obtain normative nasalance scores for adult speakers of Korean. Additional objectives were to determine whether speaker sex affects the nasalance score and whether the nasalance score correlates significantly with the nasalance slope score. The subjects were 75 healthy young Korean adults with normal oral and velopharyngeal resource and function. They had no history of speech problems, were judged to have normal speech and resonance at the time of testing, and had no upper respiratory tract infections or allergies at the time of testing. The Nasometer II 6400 was used to obtain nasalance scores and nasalance slope scores for /a/, /i/, /e/, /o/, /u/, /ja/, /je/, /wi/, /p'ap'i/ and /sasi/. The nasalance and nasalance slope data were analyzed statistically. The mean nasalance score of female speakers was significantly higher than that of male speakers for /a/, /i/, /wi/, /p'ap'i/ and /sasi/ (p < 0.10). The mean nasalance score was highest for /i/ and lowest for /o/. In this study, we could not find a relationship between the nasalance score and the closing slope score. However, the mean nasalance score correlated negatively with the opening slope score at ie/ and /;ai, and positively at /sasi/. These normative nasalance scores for normal young adults speaking Korean provide important reference information for Korean cleft palate teams. In future studies of velopharyngeal activity with the Nasometer, the opening slope score may serve as an important parameter.

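As a rough illustration of the quantity reported in the abstract above, a nasalance score is commonly defined as nasal acoustic energy as a percentage of combined nasal and oral energy. The sketch below assumes that definition. The function name and the sample energy values are hypothetical and are not taken from the study or from the Nasometer software.

```python
def nasalance_score(nasal_energy, oral_energy):
    """Return nasal energy as a percentage of total (nasal + oral) energy."""
    total = nasal_energy + oral_energy
    if total <= 0.0:
        raise ValueError("total acoustic energy must be positive")
    return 100.0 * nasal_energy / total


# Hypothetical energies in arbitrary units: one quarter of the energy is nasal.
print(nasalance_score(1.0, 3.0))
```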

The Utility of Perturbation, Non-linear dynamic, and Cepstrum measures of dysphonia according to Signal Typing (음성 신호 분류에 따른 장애 음성의 변동률 분석, 비선형 동적 분석, 캡스트럼 분석의 유용성)

  • Choi, Seong Hee;Choi, Chul-Hee
    • Phonetics and Speech Sciences / v.6 no.3 / pp.63-72 / 2014
  • The current study assessed the utility of the acoustic analyses most commonly used in routine clinical voice assessment, including perturbation analysis, nonlinear dynamic analysis, and spectral/cepstral analysis, based on signal typing of dysphonic voices, and investigated the clinical applicability of these acoustic analysis methods. A total of 70 dysphonic voice samples were classified by signal type using narrowband spectrograms. The traditional parameters %jitter, %shimmer, and signal-to-noise ratio (SNR) were calculated with TF32. The correlation dimension (D2), a nonlinear dynamic parameter, and spectral/cepstral measures including mean CPP, CPP_sd, CPPf0, CPPf0_sd, L/H ratio, and L/H ratio_sd were calculated with ADSV (Analysis of Dysphonia in Speech and Voice™). Auditory-perceptual analysis was performed by two blinded speech-language pathologists using GRBAS. The results showed that the nearly periodic Type 1 signals were all from functional dysphonia, whereas Type 4 signals comprised neurogenic and organic voice disorders. Only Type 1 voice signals were reliable for perturbation analysis in this study. Significant differences related to signal type were found in all acoustic and auditory-perceptual measures. SNR, CPP, and L/H ratio values for Type 4 were significantly lower than those of the other voice signals, and significantly higher %jitter and %shimmer were observed in Type 4 voice signals (p<.001). Additionally, D2 values increased significantly with signal type, representing more complex and nonlinear patterns. Nevertheless, D2 could not be obtained for voice signals with a high noise component associated with breathiness. In particular, CPP was more sensitive to the voice qualities 'G', 'R', and 'B' than any other acoustic measure. Thus, spectral and cepstral analyses may be applied to more severely dysphonic voices such as Type 4 signals, and CPP can be a more accurate and predictive acoustic marker for measuring voice quality and severity in dysphonia.
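The perturbation measures named in this abstract can be sketched as follows, assuming the common "local" definition: the mean absolute difference between consecutive cycles, as a percentage of the mean cycle value. TF32 may use a different formula. The function name and sample values are hypothetical.

```python
def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean.

    Pass glottal cycle periods to get a %jitter-style value, or cycle peak
    amplitudes to get a %shimmer-style value.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical cycle periods in milliseconds and peak amplitudes.
periods_ms = [10.0, 10.1, 9.9, 10.0, 10.2]
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]
print("jitter-like %:", percent_perturbation(periods_ms))
print("shimmer-like %:", percent_perturbation(amplitudes))
```

A perfectly periodic signal gives 0 on both measures. This kind of measure depends on reliable cycle boundaries, which fits the abstract's finding that only nearly periodic Type 1 signals were reliable for perturbation analysis.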

A Study on the Correlation between Body-Size and MDVP Parameters in the Normal Male and Female Korean Population (정상 한국인의 성별 체형정보와 MDVP 변수간의 상관관계 연구)

  • Kang, Jae-Hwan;Yoo, Jong-Hyang;Kim, Jong-Yeol
    • Speech Sciences / v.15 no.4 / pp.107-119 / 2008
  • This paper investigates the correlation of 12 MDVP measurements with the age, sex and body size of a sample of healthy subjects. To extract pitch and the 12 MDVP parameters efficiently and to display the correlation of each parameter easily, we developed a speech analysis program using C/C++ and the MFC development tool. The sample group consisted of 205 males and 343 females aged 9-81. We collected recordings of the vowel /a/ and 8 body-size measurements from each subject. The body-size values were taken at 8 different torso positions. We analyzed the matched voice samples and body-size measurements with the developed speech analysis program and SPSS. The results show a characteristic age-F0 pattern: F0 of male subjects decreases rapidly after the mutational period and then remains stable with age, whereas F0 of female subjects changes slowly across the whole age range. Among the MDVP parameters, the age-STD relationship in males and the age-sPPQ relationship in females are especially similar to the age-F0 relationship. In the male group, sPPQ (0.316%), Jitt (0.04%), Shim (0.25%) and APQ (0.28%) increase with age after the mutational period. In the female group, Jitt (0.042%) and sPPQ (0.219%) also increase with age. Height, weight and BMI correlate only weakly with the MDVP parameters, with correlation coefficients below 0.25 for both the male and female groups. The correlation between the 8 body-size measurements and the MDVP parameters was statistically insignificant: the largest correlation coefficients were between M8-8 and F0 (-0.394%) for males and for M8-6 and M8-7 (-0.368%, -0.364%) for females.


A study on the new hybrid recurrent TDNN-HMM architecture for speech recognition (음성인식을 위한 새로운 혼성 recurrent TDNN-HMM 구조에 관한 연구)

  • Jang, Chun-Seo
    • The KIPS Transactions:PartB / v.8B no.6 / pp.699-704 / 2001
  • In this paper, a new hybrid modular recurrent TDNN (time-delay neural network)-HMM (hidden Markov model) architecture for speech recognition is studied. In a TDNN, the recognition rate can be increased if the signal window is extended. To obtain this effect in the neural network, a high-level memory generated through feedback within the first hidden layer of the neural network unit is used. To increase the ability to deal with the temporal structure of phonemic features, the input layer of the network is divided into multiple states in time sequence, with a feature detector for each state. To expand the network from a small recognition task to a full speech recognition system, a modular construction method is also used. Furthermore, the neural network and HMM are integrated by feeding output vectors from the neural network to the HMM, and a new parameter smoothing method that can be applied to this hybrid system is suggested.


How to Express Emotion: Role of Prosody and Voice Quality Parameters (감정 표현 방법: 운율과 음질의 역할)

  • Lee, Sang-Min;Lee, Ho-Joon
    • Journal of the Korea Society of Computer and Information / v.19 no.11 / pp.159-166 / 2014
  • In this paper, we examine the role of emotional acoustic cues, including both prosody and voice quality parameters, in modifying the sense of a word. To extract the prosody and voice quality parameters, we used 60 pieces of speech data spoken by six speakers in five different emotional states. We analyzed eight different emotional acoustic cues and used a discriminant analysis technique to find the dominant sequence of acoustic cues. As a result, we found that anger is closely related to intensity level and 2nd formant bandwidth range; joy is related to the position of the 2nd and 3rd formant values and to intensity level; sadness is strongly related only to prosody cues such as intensity level and pitch level; and fear is related to pitch level and to the 2nd formant value with its bandwidth range. These findings can be used as a guideline for fine-tuning an emotional spoken language generation system, because these distinct sequences of acoustic cues reveal the subtle characteristics of each emotional state.

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.290-297 / 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient Text-To-Speech (TTS) database compression. The proposed algorithm combines removal of the unnecessary common bit-stream (CBS) and parameter delta coding with speaker-dependent Huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes that produce the maximum bit-rate reduction with negligible quality degradation. The proposed lossless algorithm, including CBS removal, reduces the bit-rate by 12.40% without quality degradation compared with the 12.65 kbps AMR-WB mode. The proposed lossy algorithm reduces the bit-rate by 20.00% with a PESQ degradation of 0.12.
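The lossless part of this scheme pairs delta coding with Huffman coding. A minimal sketch of why that pairing helps follows. It is a generic illustration on a made-up integer sequence, not the authors' AMR-WB parameter layout or their speaker-dependent code tables, and all names and values are hypothetical.

```python
import heapq
from collections import Counter


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols under this node).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # each merge adds one bit to these codes
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return lengths


def huffman_bits(symbols):
    """Total payload bits when symbols are coded with their own Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# Hypothetical slowly varying parameter track (e.g. a quantizer index per frame).
params = [40, 41, 41, 42, 43, 43, 44, 45, 45, 46, 47, 47]
deltas = delta_encode(params)
print("bits without delta coding:", huffman_bits(params))
print("bits with delta coding:   ", huffman_bits(deltas))
```

Delta coding concentrates the symbol distribution on a few small values, which is what lets the entropy coder assign short codes. The sketch counts payload bits only and ignores the cost of storing the code table.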

State-Dependent Feature-Parameter Weighting By the Contribution of the Feature Parameter to the Performance of Speech Recognition (음성인식에 있어서 특징 파라미터의 기여도에 기반한 상태별 특징 파라미터 가중)

  • 최환진
    • The Journal of the Acoustical Society of Korea / v.17 no.1 / pp.39-48 / 1998
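  • This paper proposes a method for improving recognition performance in hidden Markov model (HMM)-based speech recognition by reflecting, within the recognition system, the differences in how strongly each feature parameter affects recognition performance. To derive a weight for each feature parameter, the contribution of each feature parameter to the recognition rate in each state is first converted into a weight. This weight is then multiplied by the output probability of the corresponding feature parameter in that state to re-estimate the state-wise output probability. In experiments, the "variable weighting" method improved performance over the "fixed weighting" method by 3.3% for word recognition and by 5.3% for sentence recognition, showing that state-dependent weighting of feature parameters is effective for improving recognition performance.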


A Study on the Affinity Between Pairs of Korean Vowels Using the Dynamic Parameters of Vocal Tract (성도의 다이내믹 피라미터에 의한 한글 모음간의 근사도에 관한 연구)

  • 김중규;안수길
    • Journal of the Korean Institute of Telematics and Electronics / v.19 no.1 / pp.1-8 / 1982
  • Much research on the parametric representation of speech signals using the adaptive linear prediction method has been carried out over the past few years. In this paper, we used the LPC (Linear Predictive Coding) method to analyze the parameters of Korean vowels, and we used those parameters to study the affinity between every pair of Korean vowels. We found that pairs of Korean vowels with a greater phonetic affinity also have a greater affinity of vocal tract parameters than other pairs.

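The LPC analysis mentioned in this abstract can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is one standard way to estimate predictor coefficients and is assumed here, since the abstract does not say which variant the authors used. The function names and the test signal are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - lag] for t in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (coeffs, error), where the prediction of x[t] is
    sum(coeffs[k - 1] * x[t - k] for k in 1..order).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # signal is already perfectly predicted (or silent)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def lpc(signal, order):
    """LPC coefficients of a frame by the autocorrelation method."""
    return levinson_durbin(autocorrelation(signal, order), order)


# Synthetic test frame: x[t] = 0.9 * x[t - 1], so a first-order predictor
# with coefficient close to 0.9 should be recovered.
frame = [0.9 ** t for t in range(200)]
coeffs, residual_energy = lpc(frame, 2)
print(coeffs, residual_energy)
```

Comparing vowels would then mean computing such coefficient vectors per vowel and applying some distance between them. The affinity measure itself is not specified in the abstract, so it is left out of this sketch.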

Endpoint Detection of Speech Signal Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 끝점검출)

  • 석종원;배건성
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.57-64 / 1999
  • In this paper, we investigated a robust endpoint detection algorithm for noisy environments. A new feature parameter based on the discrete wavelet transform is proposed for word boundary detection of isolated utterances. The feature parameter is defined as the sum of the standard deviation of the wavelet coefficients in the third coarse scale and the weighted standard deviation of those in the first detailed scale. We then developed a new and robust endpoint detection algorithm using this feature in the wavelet domain. For the performance evaluation, we measured the detection accuracy and the average recognition error rate due to endpoint detection in an HMM-based recognition system across several signal-to-noise ratios and noise conditions.

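The feature described in this abstract can be sketched for a single frame as below. The abstract names neither the wavelet family nor the weight, so the Haar wavelet and the `detail_weight` value are assumptions made only for illustration, and the function names are hypothetical.

```python
import math
import statistics


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def endpoint_feature(frame, detail_weight=2.0):
    """Std of level-3 approximation plus weighted std of level-1 detail."""
    approx1, detail1 = haar_step(frame)
    approx2, _ = haar_step(approx1)
    approx3, _ = haar_step(approx2)
    return statistics.pstdev(approx3) + detail_weight * statistics.pstdev(detail1)


# Hypothetical 64-sample frames: silence versus a low-frequency tone.
silence = [0.0] * 64
tone = [math.sin(2.0 * math.pi * 2.0 * t / 64.0) for t in range(64)]
print(endpoint_feature(silence), endpoint_feature(tone))
```

An endpoint detector would compare this per-frame value against a threshold estimated from leading noise frames. That thresholding logic is not described in the abstract and is omitted here.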

Reconstruction Effect of the Spectral Entropy for the Voice Activity Detection (음성 활동 구간 검출을 위한 스펙트랄 엔트로피의 재구성 효과)

  • Kwon Ho-Min;Han Hag-Yong;Lee Kwang-Seok;Koh Si-Young;Hur Kang-In
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.25-28 / 2002
  • Voice activity detection is an important problem in speech recognition and communication. This paper introduces a feature parameter, reconstructed from the spectral entropy of information theory, for robust voice activity detection in noisy environments, and analyzes and compares its performance with that of the energy-based method of voice activity detection. In experiments, we confirmed that the spectral entropy is a more effective feature parameter than the energy method for robust voice activity detection in various noise environments.

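For orientation, the basic spectral entropy that this abstract builds on can be sketched as follows: the power spectrum of a frame is normalized to a probability distribution and its Shannon entropy is taken. This shows only the plain measure, not the authors' reconstructed feature, whose details are not given in the abstract. The function names and test frames are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # a silent frame carries no spectral distribution
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def frame_energy(frame):
    """Short-time energy, the baseline feature the abstract compares against."""
    return sum(x * x for x in frame)


# Hypothetical 64-sample frames: a pure tone (voiced-like) and white noise.
rng = random.Random(0)
tone = [math.sin(2.0 * math.pi * 4.0 * t / 64.0) for t in range(64)]
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
print("tone entropy: ", spectral_entropy(tone))
print("noise entropy:", spectral_entropy(noise))
```

A frame with harmonic structure concentrates power in a few bins and so has low entropy, while broadband noise spreads power and has high entropy. Unlike frame energy, this measure does not change when the frame is scaled by a constant gain, which is one reason entropy-based features are attractive in noise.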