• Title/Summary/Keyword: vocal feature


Vocal-cord Signal Study based on Phonological Feature for Vocal-cord Signal Isolated-Word recognizer (성대신호 명령어 인식기를 위한 음운자질에 기반한 성대신호 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Cho, Kwan-Hyun
    • 한국HCI학회:학술대회논문집 (Proceedings of the HCI Society of Korea Conference) / 2006.02a / pp.565-570 / 2006
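  • Speech is the most useful user interface in wearable environments. With current noise-removal technology, however, practical use of a speech recognizer in high-noise settings such as wearable environments is nearly impossible. This paper develops an isolated-word (command) recognizer that uses a vocal-cord (throat) microphone, which blocks environmental noise at the source. To this end, the vocal-cord signal is described on the basis of Korean phonological feature theory, and the validity of this approach is verified by analyzing the input signal. The analysis uses spectra and FFT results, and the MFCC algorithm is used to examine how the amount of information in the frequency domain affects recognition. Based on the analysis, the paper verifies that feature vectors used for voiced/unvoiced separation are useful as feature vectors for a vocal-cord signal command recognizer by developing such a recognizer with the ZCPA algorithm. In the experiments, ZCPA achieves a recognition rate 16% higher than MFCC.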
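The abstract above names ZCPA (zero-crossings with peak amplitudes) but gives no detail. As a rough illustration only, the sketch below shows the core idea for a single band: the interval between successive upward zero crossings gives a frequency estimate, and the log-compressed peak amplitude inside that interval is added to the matching frequency bin. The function name, the bin edges, and the omission of the band-pass filterbank that normally precedes this step are all assumptions made for the example, not details from the paper.

```python
import math

def zcpa_histogram(signal, sample_rate, bin_edges_hz):
    """Accumulate log-compressed peak amplitudes into frequency bins.

    Hypothetical single-band sketch; a full ZCPA front end would first
    split the signal with a filterbank and sum the per-band histograms.
    """
    histogram = [0.0] * (len(bin_edges_hz) - 1)
    # Indices where the signal crosses zero going upward.
    crossings = [i for i in range(1, len(signal))
                 if signal[i - 1] < 0.0 <= signal[i]]
    for start, end in zip(crossings, crossings[1:]):
        frequency = sample_rate / (end - start)  # inverse of the interval
        peak = max(signal[start:end])            # peak within the interval
        for b in range(len(histogram)):
            if bin_edges_hz[b] <= frequency < bin_edges_hz[b + 1]:
                histogram[b] += math.log(1.0 + peak)
                break
    return histogram

# Usage: a 200 Hz tone sampled at 8 kHz should fall in the 100-300 Hz bin.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
hist = zcpa_histogram(tone, sample_rate, [0, 100, 300, 1000, 4000])
```

Because the feature depends on zero-crossing timing rather than on the fine spectral envelope, it is plausible that it suits a band-limited throat-microphone signal, but that reading is an interpretation of the abstract, not a claim it makes.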


A Study on the Voice Conversion Algorithm with High Quality (고음질을 갖는 음색변경에 관한 연구)

  • 박형빈;배명진
    • Proceedings of the IEEK Conference / 2000.09a / pp.157-160 / 2000
  • Voice conversion has generally used VQ (Vector Quantization) to partition the spectral features and has been performed by adding an appropriate offset vector to the source speaker's spectral vector. However, because the transformed parameters are discrete, the target speaker's various characteristics are not represented. In this paper, these problems are solved by using LMR (Linear Multivariate Regression) instead of the mapping codebook that is determined from the relationship between the source and target speakers' vocal tract characteristics. We also propose a method for resolving the discontinuity caused by applying time- or pitch-scale-modified speech to parameters that were time-aligned using Dynamic Time Warping. To overcome these transitional discontinuities, the proposed algorithm first leaves the time and pitch scales unchanged and uses LMR to change the speaker's vocal tract characteristics in speech with unmodified time and pitch. Compared with existing methods based on VQ and LMR, the proposed algorithm yields much better voice quality.


Speech Parameters for the Robust Emotional Speech Recognition (감정에 강인한 음성 인식을 위한 음성 파라메터)

  • Kim, Weon-Goo
    • Journal of Institute of Control, Robotics and Systems / v.16 no.12 / pp.1137-1142 / 2010
  • This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust speech recognition system. For this purpose, the effect of emotion on a speech recognition system and robust speech parameters for such a system were studied using a speech database containing various emotions. Mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, and frequency-warped mel-cepstral coefficients were used as feature parameters, and CMS (Cepstral Mean Subtraction) was used as the signal bias removal technique. Experimental results showed that the HMM-based speaker-independent word recognizer using vocal tract length normalized mel-cepstral coefficients, their derivatives, and CMS for signal bias removal achieved the best performance, a word error rate of 0.78%. This corresponds to about a 50% reduction in word errors compared with the baseline system using mel-cepstral coefficients, their derivatives, and CMS.
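For readers unfamiliar with CMS, the following is a minimal sketch of cepstral mean subtraction as it is commonly defined: the per-coefficient mean over all frames of an utterance is subtracted from every frame, which cancels any constant offset in the cepstral domain. The function name and the toy two-coefficient frames are assumptions for illustration; the paper's own front end is not described in the abstract.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over time from each frame.

    `frames` is a list of equal-length cepstral vectors (one per frame).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames
             for d in range(n_coeffs)]
    return [[frame[d] - means[d] for d in range(n_coeffs)]
            for frame in frames]

# Usage: a constant offset added to every frame disappears after CMS.
clean = [[1.0, 2.0], [3.0, 6.0]]
shifted = [[11.0, 22.0], [13.0, 26.0]]  # same frames plus offset (10, 20)
normalized_clean = cepstral_mean_subtraction(clean)
normalized_shifted = cepstral_mean_subtraction(shifted)
```

Both inputs normalize to the same frames, which is why CMS is used to remove a stationary bias such as a fixed channel response.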

Effect of Radiation Therapy on Voice Parameters in Early Glottic Cancer and Normal Larynx (방사선 요법이 초기 성대암 및 정상 후두의 음성 지표에 미치는 영향)

  • 김민식;박한종;선동일;박영학;조승호
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.7 no.1 / pp.32-38 / 1996
  • Preserving the voice-producing mechanism is an important feature of managing laryngeal cancer by radiotherapy. However, radiation therapy has side effects such as mucositis, tissue edema, necrosis, and fibrosis, which could affect normal voice production. Several subjective studies that used questionnaires and auditory perceptual judgements of voice have been interpreted to mean that radiation results in a normal or near-normal voice. Objective evidence of the status of vocal function after radiation treatment, however, is still lacking. We analyzed the changes in voice parameters in a group of patients undergoing radiation therapy in order to determine the effect of radiation on voice quality. Acoustic and aerodynamic measures of vocal function were used to determine the characteristics of voice production. We found that voice parameters in early glottic cancer changed meaningfully compared with the normal larynx, with or without radiation, and that radiation therapy has a little effect on the normal larynx.


An Acoustic Investigation of Post-Obstruent Tensification Phenomena

  • Ahn, Hyun-Kee
    • Speech Sciences / v.11 no.4 / pp.223-232 / 2004
  • This study investigated and compared the acoustic characteristics of the Korean stop sound [k'] in three different phonological environments: the tensified lenis stop [k'] as observed in /prek+kaci/, the fortis stop /k'/ as in /pre+k'aci/, and the fortis stop /k'/ following an obstruent as in /prek+k'aci/. The specific research question was whether or not the tensified lenis stop shares all of its acoustic features with the other two kinds of fortis stops. The acoustic measures adopted in this study were H1*-H2*, VOT, length of stop closure, and $F_0$. The major finding was that the three stops showed no significant difference in any of the acoustic measures except the length of stop closure. The fortis stop /k'/ following an obstruent showed a significantly longer stop closure than the other two stops, which did not differ significantly from each other. Based on these phonetic results, this study argued that, for the proper phonological description of post-obstruent tensification, the phonological feature [slack vocal folds] of a lenis stop should be changed into [stiff vocal folds, constricted glottis], the features that fortis stops should have.


Study for Extraction of Stable Vocal Features and Definition of the Features (음성의 안정적 변수 추출 및 변수의 의미 연구)

  • Kim, Keun-Ho;Kim, Sang-Gil;Kang, Nam-Sik;Kim, Jong-Yeol
    • Korean Journal of Oriental Medicine / v.17 no.3 / pp.97-104 / 2011
  • Objectives: In this paper, we propose a method for selecting reliable variables from various vocal features, such as frequency derivative features, frequency band ratios, intensities of 5 vowels, and the intensity of a sentence, since some features are sensitive to variation in a subject's utterance. Methods: To obtain reliable voice variables, the coefficient of variation (CV) was used as the index of reliability. Since the distributions of a few features are not Gaussian but skewed to the right or left, we transformed those features by taking the log or square root. Moreover, the definitions of the variables suitable for representing vocal properties were explained and analyzed. Results: We first recorded the vowels and the sentence five times in both the morning and the afternoon of the same day, for a total of ten recordings from each of six subjects (three males and three females). We then analyzed the CVs of each subject's voice to obtain stable features with sufficient repeatability. Features with CVs below 20% for all six subjects were selected. As a result, 92 stable variables were extracted from the 222 features, which included all the transformed variables. Conclusions: In traditional Korean medicine or Western medicine, voice can be widely used to classify the four constitution types and to recognize a person's health condition by extracting meaningful features as physical quantities. Stable voice variables can therefore be useful in the u-Healthcare system of personalized medicine and for improving diagnostic accuracy.
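The selection rule in the abstract above (keep a feature only if its CV is below 20% for every subject) can be sketched as follows. The feature names, the toy measurements, and the use of the sample standard deviation are assumptions for illustration; the abstract does not say which standard deviation estimator the authors used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = standard deviation / |mean| (sample stdev assumed here)."""
    return stdev(values) / abs(mean(values))

def select_stable_features(recordings, threshold=0.20):
    """Keep features whose CV is below `threshold` for every subject.

    `recordings[feature][subject]` is a list of repeated measurements.
    """
    return [feature for feature, by_subject in recordings.items()
            if all(coefficient_of_variation(values) < threshold
                   for values in by_subject.values())]

# Usage with hypothetical data: two features, two subjects, five repeats.
recordings = {
    "pitch_hz": {"s1": [100, 102, 98, 101, 99],
                 "s2": [200, 205, 195, 202, 198]},
    "band_ratio": {"s1": [1.0, 2.0, 0.5, 3.0, 1.5],
                   "s2": [0.9, 1.1, 1.0, 1.0, 1.0]},
}
stable = select_stable_features(recordings)
```

Here `band_ratio` is rejected because one subject's repeats vary too much, even though the other subject's are stable. A feature whose mean is zero would need separate handling, since the CV is undefined in that case.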

On the Voiced-Voiceless Distinction in Stops of English

  • Kim, Dae-Won
    • Korean Journal of English Language and Linguistics / v.2 no.1 / pp.23-30 / 2002
  • Phonologically, the difference between the English stops /b, d, g/ and /p, t, k/ is carried by the presence or absence of vocal fold vibration throughout their oral closure phase. If phonology has its foundation in phonetics, there must be phonetic evidence for the voiced-voiceless distinction. This study aims to determine whether or not the voiced-voiceless distinction is acceptable or proper in English. The determination was based mainly on findings in the existing literature and in informal experiments. In conclusion, there is no phonetic evidence for the voiced-voiceless distinction in either production or perception. [voice] appears to be one of the potential phonetic correlates of the phonologically voiced stop. It is improper to use [voice] as an independent phonological marker, regardless of position (word-initial, intervocalic, word-final). A feature other than the voiced-voiceless feature must distinguish /b, d, g/ from /p, t, k/.


Voice Conversion Using Linear Multivariate Regression Model and LP-PSOLA Synthesis Method (선형다변회귀모델과 LP-PSOLA 합성방식을 이용한 음성변환)

  • 권홍석;배건성
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.15-23 / 2001
  • This paper presents a voice conversion technique that modifies the utterance of a source speaker so that it sounds as if it were spoken by a target speaker. Feature parameter conversion methods that transform vocal tract and prosodic characteristics between the source and target speakers are described. The vocal tract characteristics are transformed by modifying the LPC cepstral coefficients using Linear Multivariate Regression (LMR). Prosodic transformation is done by changing the average pitch period between speakers, and it is applied to the residual signal using the LP-PSOLA scheme. Experimental results show that speech transformed by the LMR and LP-PSOLA synthesis method contains many characteristics of the target speaker.
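As a rough sketch of what an LMR mapping involves, the code below fits a linear map with a bias term from source vectors to target vectors by ordinary least squares (normal equations solved with Gauss-Jordan elimination) and then applies it to a new vector. The function names and the two-dimensional toy data are assumptions; in the paper the vectors would be time-aligned LPC cepstral coefficients, and details such as per-class regression are not given in the abstract.

```python
def fit_lmr(source, target):
    """Least-squares fit of target ≈ [source, 1] · W (bias in last row)."""
    X = [row + [1.0] for row in source]  # append a bias term
    p, q = len(X[0]), len(target[0])
    # Normal equations (XᵀX) W = XᵀY, stored as one augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] +
         [sum(x[i] * y[k] for x, y in zip(X, target)) for k in range(q)]
         for i in range(p)]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [row[p:] for row in A]  # W has p rows and q columns

def convert(vector, W):
    """Map one source vector to the target space using the fitted W."""
    x = vector + [1.0]
    return [sum(x[i] * W[i][k] for i in range(len(x)))
            for k in range(len(W[0]))]

# Usage: toy data generated by y0 = 2*x0 + 1 and y1 = x1 - x0.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[1.0, 0.0], [3.0, -1.0], [1.0, 1.0], [3.0, 0.0]]
W = fit_lmr(source, target)
converted = convert([2.0, 3.0], W)
```

Unlike a VQ codebook, which can only output a finite set of offset vectors, the regression produces a continuous mapping. The sketch raises `ZeroDivisionError` if the training vectors are linearly dependent; a practical implementation would use a numerically robust solver.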


Study of Developing SOP for Extracting Stable Vocal Features for Accurate Diagnosis (음성의 안정적 변수 추출을 위한 SOP 개발 연구)

  • Kim, Keun-Ho;Jang, Jun-Su;Kim, Young-Su;Kim, Jong-Yeol
    • Journal of Physiology & Pathology in Korean Medicine / v.25 no.6 / pp.1108-1112 / 2011
  • In traditional Korean medicine or Western medicine, voice can be widely used to classify the four constitution types and to recognize a person's health condition by extracting meaningful features as physical quantities. In this paper, we propose a method for updating the standard operating procedure (SOP) for acquiring and recording voices so that stable vocal features can be extracted, since these features are sensitive to variation in a subject's utterance. We first obtained pitch frequencies from vowels and the sentence, and intensity from the sentence, as features, using voices acquired under various utterance conditions. We then computed the deviation ratios of the features from their median values for each utterance condition and selected the condition that minimized the ratio as the new SOP. As a result, considering both the deviation and qualitative requirements, we defined an SOP in which a subject, after practice, utters vowels with a length of 2s~1s and sentences with an interval of over 2s between them. Stable voice features obtained with the updated SOP support accurate diagnosis, and the procedure will be developed and simplified for use in the u-Healthcare system of personalized medicine.
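The abstract above does not define the deviation ratio precisely. One plausible reading, shown here purely as an assumption, is the mean absolute deviation from the median divided by the median, computed per utterance condition, with the condition giving the smallest ratio chosen as the SOP. The condition names and measurements are hypothetical.

```python
from statistics import median

def deviation_ratio(values):
    """Mean absolute deviation from the median, relative to the median.

    This definition is an assumption; the abstract gives no formula.
    """
    m = median(values)
    return sum(abs(v - m) for v in values) / (len(values) * abs(m))

def pick_condition(measurements):
    """Return the utterance condition with the smallest deviation ratio."""
    return min(measurements, key=lambda c: deviation_ratio(measurements[c]))

# Usage with hypothetical pitch measurements (Hz) under two conditions.
measurements = {
    "no_practice": [100, 120, 90, 130, 100],
    "after_practice": [100, 102, 99, 101, 100],
}
best = pick_condition(measurements)
```

With several features per condition, the per-feature ratios would need to be combined (for example by averaging) before choosing, which is again a design choice the abstract leaves open.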

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.681-686 / 2012
  • This paper studies speech parameters that are less affected by human emotion, with the aim of developing a robust emotional speech recognition system. For this purpose, the effect of emotion on a speech recognition system and robust speech parameters for such a system were studied using a speech database containing various emotions. Mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, root-cepstral coefficients, PLP coefficients, and mel-cepstral coefficients frequency-warped by the vocal tract length normalization method were used as feature parameters, and the CMS (Cepstral Mean Subtraction) and SBR (Signal Bias Removal) methods were used as signal bias removal techniques. Experimental results showed that the HMM-based speaker-independent word recognizer using frequency-warped RASTA mel-cepstral coefficients from the vocal tract length normalization method, their derivatives, and CMS for signal bias removal achieved the best performance.