• Title/Summary/Keyword: normal voices


A Study on Pitch Perception of Normal Korean (한국 성인 음성의 음도인식에 관한 연구)

  • Jeong, Ok-Ran;Kim, Hyung-Soon;Kim, Young-Tae;Sub, Jang-Su
    • Speech Sciences / v.1 / pp.315-323 / 1997
  • This study attempts to determine the fundamental frequency level of male and female voices that Koreans perceive as normal. Seventy-three college students majoring in Speech Pathology participated in the study on a voluntary basis. The subjects listened to a male voice with fundamental frequencies of 60, 80, 100, 120, 140, 160, 180, and 200 Hz, and a female voice with fundamental frequencies of 140, 160, 180, 200, 220, 240, 260, and 280 Hz. The PSOLA (Pitch Synchronous Overlap and Add) method and a harmonic modeling method of the speech signal were used to change pitch in 20 Hz steps. The voices were presented in random order to prevent listener bias. The results were as follows. First, 46.6% judged the male voice at 120 Hz as normal, 19.2% judged 140 Hz as normal, and another 19.2% judged 160 Hz as normal. Second, 50.7% perceived the female voice at 220 Hz as normal, while 32.9% and 30.1% chose 200 Hz and 240 Hz, respectively. Problems and recommendations for future investigation are discussed.


Listener's Age Estimation by Prosody Manipulation (운율 변조 양상에 따른 청자의 연령 지각)

  • Kim, Jiyoun;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.6 no.2 / pp.81-88 / 2014
  • The normal aging process affects speech production, and these changes are perceived by listeners. This study examined whether age perception by normal listeners changed under various conditions of prosodic manipulation, comparing the prosodic changes according to age and sex in adulthood. The older and younger voices were resynthesized by manipulating the speaking rate and pitch to shift the perceived age of the groups toward each other. Two-way repeated-measures ANOVAs were conducted to determine whether the type of resynthesized prosodic cue resulted in a significant shift in the perceived age of young and old voices. Manipulation of the speaking rate resulted in a significant shift in perceived age for both the older and younger groups. A significant shift in age estimates was not observed for the younger male group when pitch was manipulated. There were significant gender-by-age group interactions for prosodic manipulation type. Age-related changes in the prosodic properties of speech may ultimately influence speech perception.

Development of medical/electrical convergence software for classification between normal and pathological voices (장애 음성 판별을 위한 의료/전자 융복합 소프트웨어 개발)

  • Moon, Ji-Hye;Lee, JiYeoun
    • Journal of Digital Convergence / v.13 no.12 / pp.187-192 / 2015
  • Software that analyzes speech disorders would be highly applicable across various convergence fields. This paper implements a user-friendly program based on CART (classification and regression trees) analysis to distinguish between normal and pathological voices using a combination of acoustical and HOS (higher-order statistics) parameters. It represents a convergence between medical information and signal processing. The acoustical parameters are Jitter(%) and Shimmer(%). The proposed HOS parameters are the means and variances of skewness (MOS and VOS) and kurtosis (MOK and VOK). The database consists of 53 normal and 173 pathological voices distributed by Kay Elemetrics. When the acoustical and proposed parameters are used together to generate the decision tree, the average accuracy is 83.11%. Finally, we developed a program with a more user-friendly interface and frameworks.
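
As an illustration of the HOS parameters named in the abstract above, the following minimal Python sketch computes skewness and kurtosis frame by frame and then takes their means and variances. It is not the authors' implementation: the frame length, hop size, function names, and the use of non-excess kurtosis are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def hos_parameters(x, frame_len=400, hop=200):
    """Mean/variance of per-frame skewness (MOS, VOS) and kurtosis (MOK, VOK).

    frame_len and hop are hypothetical defaults chosen for illustration only.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    centered = frames - frames.mean(axis=1, keepdims=True)
    std = centered.std(axis=1)
    std[std == 0] = 1.0  # avoid division by zero on perfectly flat frames
    skew = (centered ** 3).mean(axis=1) / std ** 3
    kurt = (centered ** 4).mean(axis=1) / std ** 4  # non-excess kurtosis
    return {"MOS": skew.mean(), "VOS": skew.var(),
            "MOK": kurt.mean(), "VOK": kurt.var()}
```

The four values returned for each recording could then be combined with Jitter(%) and Shimmer(%) as the feature vector passed to a decision-tree classifier.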

Comparative Analysis of Performance of Established Pitch Estimation Methods in Sustained Vowel of Benign Vocal Fold Lesions (양성후두 질환의 지속모음을 대상으로 한 기존 피치 추정 방법들의 성능 비교 분석)

  • Jang, Seung-Jin;Kim, Hyo-Min;Choi, Seong-Hee;Park, Young-Cheol;Choi, Hong-Shik;Yoon, Young-Ro
    • Speech Sciences / v.14 no.4 / pp.179-200 / 2007
  • In voice pathology, various measurements calculated from pitch values have been proposed to characterize voice quality. However, those measurements are frequently inaccurate and unreliable because they are based on erroneous pitch values estimated from pathological voice data. To address this problem, we compared several pitch estimation methods in order to propose a better one for pathological voices. Using a database of 99 pathological and 30 normal voice samples, errors in pitch estimation were analyzed and compared between pathological and normal voice data and among the vowels produced by patients with benign vocal fold lesions. The results showed that gross pitch errors occurred in the pathological voice data. When the pathological voices were classified by the degree of aperiodicity in the speech signals, pitch errors were found to be closely related to the number of aperiodic segments. The autocorrelation approach was also found to be the most robust pitch estimation method for the pathological voice data. Further research on more severely pathological voice data is desirable in order to reduce pitch estimation errors.

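
The abstract above reports that the autocorrelation approach was the most robust of the compared methods. The following is a minimal, generic sketch of autocorrelation-based pitch estimation on a single frame. It is not the specific implementation evaluated in the paper: the function name and the 60 to 400 Hz search range are assumptions made for illustration.

```python
import numpy as np

def autocorr_pitch(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its autocorrelation.

    f0_min and f0_max are hypothetical search bounds, not values from the paper.
    """
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC offset before correlating
    # Full autocorrelation; keep the non-negative lags only.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(ac) - 1)
    # Search only the lags that correspond to plausible pitch periods.
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag
```

On strongly aperiodic frames the autocorrelation peak becomes weak or ambiguous, which is one way the gross pitch errors described in the abstract can arise.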

Automatic Detection of Intonational and Accentual Phrases in Korean Standard Continuous Speech (한국 표준어 연속음성에서의 억양구와 강세구 자동 검출)

  • Lee, Ki-Young;Song, Min-Suck
    • Speech Sciences / v.7 no.2 / pp.209-224 / 2000
  • This paper proposes a method for automatically detecting intonational and accentual phrases in standard Korean continuous speech. We use pauses longer than 150 msec to detect intonational phrases, and extract accentual phrases from the intonational phrases by analyzing syllables and pitch contours. The speech data for the experiment consist of seven male voices and two female voices reading the text of the fable 'the ant and the grasshopper' and a newspaper article 'manmulsang' at normal speed in standard Korean. The results of the experiment show that the detection rate is 95% on average for intonational phrases and 73% for accentual phrases. This detection rate implies that continuous speech can be segmented into smaller units (i.e., prosodic phrases) using prosodic information, so that the objects of speech recognition can be narrowed down to words or phrases in continuous speech.

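
The pause criterion described in the abstract above (pauses of roughly 150 msec or longer marking intonational phrase boundaries) can be sketched with a simple energy threshold. This is only an assumed illustration: the abstract does not say how pauses were measured, so the 10 ms frame size, the energy threshold, and the function name are hypothetical.

```python
import numpy as np

def find_pauses(x, sample_rate, min_pause_s=0.150, frame_s=0.010, energy_thresh=1e-4):
    """Return (start_s, end_s) spans where frame energy stays low for >= min_pause_s.

    frame_s and energy_thresh are hypothetical values chosen for illustration.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(frame_s * sample_rate))
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    silent = (frames ** 2).mean(axis=1) < energy_thresh
    pauses, start = [], None
    # The appended False acts as a sentinel that closes a trailing silent run.
    for i, is_silent in enumerate(np.append(silent, False)):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if (i - start) * frame_s >= min_pause_s:
                pauses.append((start * frame_s, i * frame_s))
            start = None
    return pauses
```

Each returned span is a candidate intonational phrase boundary; the accentual phrase step, which relies on syllable and pitch contour analysis, is not covered by this sketch.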

Acoustic and Stroboscopic Characteristics of Normal Person's Voices with Advancing Age (연령증가에 따른 정상 노인의 음향분석학적 특징)

  • 진성민;권기환;강현국
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.8 no.1 / pp.44-48 / 1997
  • Anatomic and physiological changes of the larynx with advancing age result in morphologic changes of the vocal folds and reduced control of the phonatory mechanism in elderly individuals, and are reflected in increased instability of fundamental frequency (Fo). The purpose of this study is to increase current understanding of the acoustic and stroboscopic characteristics of normal elderly persons' voices. First, phonated /a/ vowel productions by 40 normal adults (20 to 40 years, 20 men and 20 women) and 40 normal elderly persons (60 to 80 years, 20 men and 20 women) were analyzed using CSL (model 4300B) acoustic analysis software to obtain acoustic measures related to fundamental frequency stability and vocal resonance characteristics. Second, stroboscopic images of vocal fold behavior in all subjects were analyzed by experienced specialists. In the men, fundamental frequency variation (vFo) (p<0.01), jitter (p<0.05), and shimmer (p<0.05) for the older group were significantly higher than the values for the adult group. In the stroboscopic findings, vocal fold edema was a significant finding in aged men (15%). In the women, vFo (p<0.05), jitter (p<0.05), and noise-to-harmonic ratio (NHR) (p<0.05) for the older group were significantly higher than the values for the adult group, and first formant frequency (F1) (p<0.01) and second formant frequency (F2) (p<0.01) for the older group were significantly lower than the values for the adult group. In the stroboscopic findings, vocal fold atrophy was a significant finding in aged women (25%). Frequency stability, as reflected by vFo, jitter, shimmer, and NHR, decreases with advancing age in men and women, and spectral analysis of phonated /a/ vowel productions reveals a lowering of F1 and F2 with advancing age, especially in aged women. Change in the mass of the vocal folds, due to atrophy or edema, is considered to be the greatest factor in these acoustic changes.


Laryngeal Cancer Screening using Cepstral Parameters (켑스트럼 파라미터를 이용한 후두암 검진)

  • 이원범;전경명;권순복;전계록;김수미;김형순;양병곤;조철우;왕수건
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.14 no.2 / pp.110-116 / 2003
  • Background and Objectives: Laryngeal cancer discrimination using voice signals is a non-invasive method that can be carried out rapidly and simply without causing discomfort to patients. When appropriate analysis parameters and classifiers are developed, this method can be used effectively in various applications, including telemedicine. This study examines the voice analysis parameters used for laryngeal disease discrimination in order to help discriminate laryngeal diseases by voice signal analysis. The study also estimates the laryngeal cancer discrimination performance of the Gaussian mixture model (GMM) classifier based on statistical modelling of voice analysis parameters. Materials and Methods: The Multi-Dimensional Voice Program (MDVP) parameters, which have been widely used for the analysis of laryngeal cancer voices, sometimes fail to analyze the voice of a laryngeal cancer patient whose periodicity is severely disrupted. Accordingly, it is necessary to develop a new method that enables highly reliable analysis of voice signals that cannot be analyzed by the MDVP. To conduct the laryngeal cancer discrimination experiments, the authors used three types of voices collected at the Department of Otorhinolaryngology, Pusan National University Hospital: 50 voices of normal males, 50 voices of males with benign laryngeal diseases, and 105 voices of males with laryngeal cancer. In addition, the experiment included 11 voices of males with laryngeal cancer that cannot be analyzed by the MDVP. Only the monosyllabic vowel /a/ was used as voice data. Since there were only 11 voices of laryngeal cancer patients that cannot be analyzed by the MDVP, those voices were used only for discrimination. This study examined the linear predictive cepstral coefficients (LPCC) and the mel-frequency cepstral coefficients (MFCC), the two major cepstrum analysis methods in the area of acoustic recognition.
Results: The results showed that the mel frequency scaling process was effective in acoustic recognition but not useful for laryngeal cancer discrimination. Accordingly, the linear frequency cepstral coefficients (LFCC), which exclude the mel frequency scaling from the MFCC, were introduced. The LFCC showed better discrimination performance than the MFCC for laryngeal cancer. Conclusion: The parameters applied in this study could accurately discriminate even terminal laryngeal cancer in which periodicity is disturbed. Future studies on various classification algorithms and on parameters representing the pathophysiology of the vocal cords are expected to make it possible to discriminate benign laryngeal diseases as well as laryngeal cancer.

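
The difference between MFCC and LFCC described in the abstract above lies in how the filter-bank frequencies are spaced before the cepstrum is computed. The sketch below contrasts the two spacings. It is an assumption-laden illustration, not the authors' code: it uses one widely used mel-scale formula (2595 log10(1 + f/700)), and the function names and filter counts are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mel using a common formula (an assumed choice of mel scale)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def filter_centres(n_filters, f_max_hz, scale="mel"):
    """Centre frequencies (Hz) of n_filters bands between 0 and f_max_hz."""
    if scale == "mel":
        # MFCC-style: equal spacing on the mel axis, so bands crowd at low Hz.
        pts = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max_hz), n_filters + 2))
    else:
        # LFCC-style: equal spacing in Hz, with no mel warping.
        pts = np.linspace(0.0, f_max_hz, n_filters + 2)
    return pts[1:-1]
```

With mel spacing, most filters fall at low frequencies; linear spacing gives the higher frequencies equal resolution.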

An analysis of a statistical difference of acoustic Parameters' distribution between normal voice and pathological voice (병적 음성과 정상 음성의 음향학적 파라미터 분포에 대한 통계적 분석)

  • 김용주;권순복;김기련;신민철;조철우;왕수건
    • Proceedings of the IEEK Conference / 2001.06d / pp.249-252 / 2001
  • The voice is the most basic means of communication among humans. Even setting voice technologies aside, using the voice is important and convenient in everyday life. However, a speech recognition system cannot always expect a normal voice as its input signal. A pathological voice, that is, a voice affected by a problem in the larynx, as opposed to a normal voice, can also be a special case of input voice. Distortion of the speech signal by environmental effects such as noise or the transmission channel has been raised as a problem, but here we take up pathological voices caused by laryngeal disease, which is an intrinsic distortion factor in the voice. In this research, we examine the difference in the distribution of acoustic parameters between normal and pathological voices using a statistical method.


The Aerodynamic Analysis between Normal Voice and Esophageal Voice (정상인과 식도발성 음성에서의 공기역학적 비교 연구)

  • 박국진;최홍식;정형진;유신영;박준호;김한수
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.9 no.1 / pp.5-10 / 1998
  • Voice rehabilitation is a very important concern for laryngectomees. Esophageal speech is a common and widely used method of voice restoration. However, until now there have been no reliable data showing the aerodynamic characteristics of esophageal speech. In order to evaluate the vocal quality of normal laryngeal and esophageal speech, several aerodynamic parameters were measured in 13 adults with normal laryngeal voice and 2 excellent esophageal speakers using the Aerophone II voice function analyzer. The examined parameters were maximal flow rate, mean airflow rate, subglottic pressure, vocal efficiency, glottic resistance, maximal phonation time, and mean sound pressure level. There was no difference in vocal efficiency between the two groups, but marked differences were shown in the other parameters in esophageal speakers, especially mean resistance. The results indicate that esophageal speakers produce efficient voices under poor aerodynamic conditions compared with normal laryngeal speakers.


A Study of Extracting Acoustic Parameters for Individual Speakers (개별화자의 음성파라미터 추출에 관한 연구: 음성파라미터의 상관관계를 중심으로)

  • Ko, Do-Heung
    • Speech Sciences / v.10 no.2 / pp.129-143 / 2003
  • Fundamental frequency (Fo), jitter, shimmer, and noise-to-harmonic ratio (NHR) were measured using the Multi-Dimensional Voice Program (MDVP) to examine the interactions among these parameters. One hundred normal Korean adults (50 males and 50 females) ranging from their early 20s to their early 30s produced eight sustained vowels: /a/, /i/, /u/, /c/, /e/, /ɛ/, /i/, and /e/. The subjects were asked to read each vowel five times in isolation at intervals of five seconds. On average, male voices showed 130.7 Hz in Fo, 0.6696% in jitter, 1.8151% in shimmer, and 0.12 in NHR, while female voices showed 232.8 Hz in Fo, 0.9222% in jitter, 1.9199% in shimmer, and 0.1098 in NHR. As for the correlation coefficients, jitter vs. shimmer, shimmer vs. NHR, Fo vs. shimmer, and Fo vs. NHR were statistically significant for male speakers. For female subjects, jitter vs. shimmer and Fo vs. shimmer were statistically significant. However, it is concluded that the correlation coefficients in females are not meaningful in a practical sense even though they are all statistically significant.

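
Jitter (%) and shimmer (%) figures such as those reported in the abstract above are commonly defined as the mean absolute difference between consecutive cycles, relative to the mean value. The sketch below uses that common local definition. Whether it matches MDVP's exact computation is an assumption, and the function names are hypothetical.

```python
import numpy as np

def local_perturbation_percent(values):
    """Mean absolute difference of consecutive values, relative to their mean, in %."""
    v = np.asarray(values, dtype=float)
    return 100.0 * np.abs(np.diff(v)).mean() / v.mean()

def jitter_percent(periods_s):
    """Cycle-to-cycle variation of pitch period durations (seconds), in %."""
    return local_perturbation_percent(periods_s)

def shimmer_percent(peak_amplitudes):
    """Cycle-to-cycle variation of per-cycle peak amplitudes, in %."""
    return local_perturbation_percent(peak_amplitudes)
```

Both functions expect one value per glottal cycle, so they depend on a prior pitch-marking step that this sketch does not cover.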