• Title/Abstract/Keywords: Speech characteristics

Search results: 969 items (processing time: 0.023 s)

Vocal acoustic characteristics of speakers with depression

  • 백연숙;김세주;김은연;최예린
    • Phonetics and Speech Sciences / Volume 4, Issue 1 / pp.91-98 / 2012
  • The purpose of this paper is to compare the vocal characteristics of speakers with and without depression, and to propose an objective method, based on those characteristics, for diagnosing depression and measuring therapeutic effects. Voice samples from 11 female speakers aged 20 to 40, diagnosed with major depressive disorder by a psychiatrist, were compared with samples from 12 normal controls matched for sex, age, height, weight, education, smoking, and drinking. The voice samples were recorded with a portable digital recorder (TASCAM DR-07, Japan) and analyzed with the MDVP (Multi-Dimensional Voice Program) software module of the CSL (Computerized Speech Lab, Kay Elemetrics Co., model 4100). The results are as follows. First, the average speaking fundamental frequency and the loudness range of the depression group were statistically significantly lower than those of the control group. The pitch range of the control group was somewhat higher than that of the depression group, but the difference was not statistically significant. Overall speech rate showed no statistical difference between the two groups. Second, the average speaking fundamental frequency and the loudness range showed statistically significant negative correlations with the Beck Depression Inventory, i.e., more severe depression exhibited a lower average speaking fundamental frequency and a narrower loudness range. Other vocal parameters, such as pitch range and overall speech rate, showed no statistically meaningful correlation with the Beck Depression Inventory.
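The study's key measure, average speaking fundamental frequency, is extracted frame by frame from voiced speech. As a rough illustration (not MDVP's algorithm), a minimal autocorrelation-based F0 estimator might look like the sketch below; the frame size, search range, and synthetic test tone are all assumptions:

```python
import numpy as np

def estimate_f0(frame, sr, fmin=75.0, fmax=500.0):
    """Estimate the fundamental frequency of one voiced frame by autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # plausible pitch-period lags
    lag = lo + int(np.argmax(ac[lo:hi]))      # strongest periodicity in range
    return sr / lag

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 180.0 * t)          # synthetic "voice" at 180 Hz
f0 = estimate_f0(tone[:1024], sr)             # close to 180 Hz
```

Averaging such per-frame estimates over voiced frames gives the speaking fundamental frequency compared between groups above.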

The Prosodic Characteristics of Children with Cochlear Implants with Respect to Articulation Rate, Pause, and Duration

  • 오순영;성철재
    • Phonetics and Speech Sciences / Volume 4, Issue 4 / pp.117-127 / 2012
  • This research reports the prosodic characteristics (articulation rate, pause characteristics, and duration) of children with cochlear implants with reference to those of children with normal hearing. The subjects were 24 children aged 8 to 10, with equal numbers of boys and girls. The dialogue speech data comprised four sentence types. Results show that 1) there is a statistically meaningful difference in articulation rate between the two groups; 2) regarding pauses, none were observed in the exclamatory and declarative sentences of the children with normal hearing, and while imperative sentences showed no statistical difference in the number of pauses between the two groups, interrogative sentences did; 3) declarative, exclamatory, and interrogative sentences revealed a statistical difference between the two groups in the duration of the sentence-final two-syllable word, with no difference in imperative sentences; 4) for the RFP (duration ratio of the sentence-final syllable to the penultimate syllable), no statistically meaningful difference between the two groups exists in any sentence type; 5) lastly, the RWS (ratio of the sentence-final two-syllable word duration to the whole sentence duration) shows a statistical difference between the two groups in imperative sentences, but not in the other types.

Quality Improvement of Bandwidth-Extended Speech Using a Mixed Excitation Model

  • 최무열;김형순
    • MALSORI (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / Issue 52 / pp.133-144 / 2004
  • The quality of narrowband speech can be enhanced by bandwidth extension technology. This paper proposes a mixed excitation and an energy compensation method based on a Gaussian Mixture Model (GMM). First, we employ a mixed excitation model having both periodic and aperiodic characteristics in the frequency domain. We use a filter bank to extract periodicity features from the filtered signals and model them with a GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate more accurately for the energy difference between the narrowband speech and the reconstructed highband or lowband speech. Objective and subjective evaluations show that the quality of wideband speech reconstructed by the proposed method is superior to that of the conventional bandwidth extension method.

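The mixed excitation idea combines a periodic pulse train with aperiodic noise, weighted by a voicing degree (in the paper, estimated per band by the GMM; a single scalar weight is assumed here for brevity):

```python
import numpy as np

def mixed_excitation(n, sr, f0, voicing, rng=None):
    """Excitation = voicing-weighted pulse train + (1 - voicing)-weighted noise."""
    rng = rng or np.random.default_rng(0)
    period = int(round(sr / f0))
    pulses = np.zeros(n)
    pulses[::period] = 1.0                 # periodic (voiced) component
    noise = rng.standard_normal(n)         # aperiodic (unvoiced) component
    return voicing * pulses + (1.0 - voicing) * noise
```

With `voicing=1.0` the output is a pure pulse train, with `voicing=0.0` pure noise; intermediate values give the mixed character described above.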

Prosodic Properties in the Speech of Adults with Cerebral Palsy

  • 이숙향;고현주;김수진
    • MALSORI (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / Issue 64 / pp.39-51 / 2007
  • The purpose of this study is to investigate prosodic characteristics in the speech of adults with cerebral palsy through a comparison with the speech of normal speakers. Ten speakers with cerebral palsy (6 males, 4 females) and 6 normal speakers (3 males, 3 females) served as subjects. The results revealed that, compared to the normal speakers, the speakers with cerebral palsy showed a slower speech rate, a larger number of intonational phrases (IPs) and pauses, a larger number of accentual phrases (APs) per IP, a longer duration of pauses, and more gradual slopes of [L+H] in APs. However, the two groups showed similar tone patterns in their APs. The results also showed mild to moderate correlations between speech intelligibility and the prosodic properties that differed significantly between the two groups, suggesting that these could be important prosodic factors for predicting speech intelligibility in the speech of adults with cerebral palsy.


Speech Intelligibility Analysis on the Vibration Sound of the Window Glass of a Conference Room

  • 김윤호;김희동;김석현
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2006 Autumn Conference / pp.150-155 / 2006
  • Speech intelligibility is investigated for a coupled conference room and window glass system. Using an MLS (Maximum Length Sequence) signal as the sound source, the acceleration and velocity responses of the window glass are measured with an accelerometer and a laser Doppler vibrometer. The MTF (Modulation Transfer Function) is used to identify the speech transmission characteristics of the room and window system. The STI (Speech Transmission Index) is calculated from the MTF, and the speech intelligibility of the room and the window glass is estimated. The speech intelligibilities derived from the acceleration signal and the velocity signal are compared, and the possibility of wiretapping is investigated. Finally, the intelligibility of the conversation sound is examined by a subjective test.

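The MTF-to-STI conversion itself is a short computation: each band's modulation reduction factor m is mapped to an apparent signal-to-noise ratio, clipped to ±15 dB, normalized to a transmission index, and averaged. A sketch of that final step, assuming uniform band weights (the STI standard prescribes specific octave-band weightings):

```python
import numpy as np

def sti_from_mtf(m, weights=None):
    """Map per-band modulation transfer values m in (0, 1) to a transmission index."""
    m = np.asarray(m, dtype=float)
    snr = 10.0 * np.log10(m / (1.0 - m))   # apparent SNR per modulation band
    snr = np.clip(snr, -15.0, 15.0)        # limit to the +/-15 dB STI range
    ti = (snr + 15.0) / 30.0               # transmission index per band
    if weights is None:
        weights = np.ones_like(ti) / len(ti)
    return float(np.sum(weights * ti))
```

For example, `sti_from_mtf([0.5])` returns 0.5, since m = 0.5 corresponds to 0 dB apparent SNR.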

Performance Improvement of Speech/Music Discrimination Based on Cepstral Distance

  • 박슬한;최무열;김형순
    • MALSORI (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / Issue 56 / pp.195-206 / 2005
  • Discrimination between speech and music is important in many multimedia applications. In this paper, focusing on the spectral change characteristics of speech and music, we propose a new method of speech/music discrimination based on cepstral distance. Instead of using the cepstral distance between frames at a fixed interval, the minimum of the cepstral distances among neighboring frames is employed to increase the discriminability between fast-changing music and speech. In addition, to prevent the misclassification of speech segments that include short pauses as music, short pause segments are excluded from the cepstral distance computation. The experimental results show that the proposed method yields an error rate reduction of 68% in comparison with the conventional approach using cepstral distance.

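The core quantity is a Euclidean distance between low-order cepstra of nearby frames, taking the minimum over several neighbors rather than one fixed interval. A minimal sketch (real cepstrum via FFT; the 13 coefficients and 3-frame neighborhood are assumptions, not the paper's exact settings):

```python
import numpy as np

def real_cepstrum(frame, n_coef=13):
    """Low-order real cepstrum: inverse FFT of the log magnitude spectrum."""
    spec = np.abs(np.fft.rfft(frame)) + 1e-12   # floor avoids log(0)
    return np.fft.irfft(np.log(spec))[:n_coef]

def min_neighbor_cepstral_distance(frames, max_lag=3):
    """Per frame, the minimum cepstral distance to its next max_lag neighbors."""
    ceps = [real_cepstrum(f) for f in frames]
    dists = []
    for i in range(len(ceps) - max_lag):
        d = min(np.linalg.norm(ceps[i] - ceps[i + k]) for k in range(1, max_lag + 1))
        dists.append(d)
    return dists
```

A slowly varying (speech-like) signal keeps some neighbor distance small, while rapidly changing music tends to keep even the minimum distance large, which is the discriminability gain claimed above.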

Alternating Motion Rate Characteristics in Children with Childhood Apraxia of Speech

  • 박준범;하승희
    • Phonetics and Speech Sciences / Volume 6, Issue 3 / pp.33-40 / 2014
  • The purpose of the study was to examine the alternating motion rate (AMR) and its variability in children with childhood apraxia of speech (CAS) compared to typically developing children. Six children with CAS aged 9-12 years and 10 age-matched children participated in the study. The study measured tokens per second and the variability of the rates during the production of /p*a/, /t*a/, and /k*a/. For the variability measures, each participant was asked to repeat the speech tasks three times, and the average value of the rates and its standard deviation were obtained. The results revealed that the CAS group showed a slower rate than the control group only for /k*a/. The CAS group exhibited greater AMR variability than the control group in all the tasks. The results suggest that the variability of AMR may be a more distinctive speech feature of children with CAS than the rate itself.
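Both reported measures reduce to the mean and standard deviation of tokens-per-second over the three repetitions; a sketch with invented trial rates:

```python
import statistics

def amr_stats(trial_rates):
    """Mean alternating motion rate and its variability (SD) across repeated trials."""
    return statistics.mean(trial_rates), statistics.stdev(trial_rates)

mean_rate, variability = amr_stats([4.2, 4.8, 3.6])  # e.g., three trials of /k*a/
```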

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments

  • 박정식;오영환
    • Phonetics and Speech Sciences / Volume 2, Issue 1 / pp.77-85 / 2010
  • This paper proposes an efficient feature vector processing technique to guard a Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. These vectors are then used in a robust model construction based on feature vector classification. We modify conventional comb filtering by using the speech presence probability to minimize the drawbacks of incorrect pitch estimation under background noise conditions. The modified comb filtering can correctly enhance the harmonics, which are an important factor in SER. The feature vector classification technique categorizes feature vectors into discriminative and non-discriminative vectors based on a log-likelihood criterion. This method can successfully select the discriminative vectors while preserving the correct emotional characteristics, so robust emotion models can be constructed using only the discriminative vectors. In SER experiments using an emotional speech corpus contaminated by various noises, our approach exhibited performance superior to that of the baseline system.

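A feedforward comb filter reinforces components at multiples of f0 by adding a one-pitch-period-delayed copy of the signal; the paper's version additionally gates this by a speech presence probability, which is omitted in this simplified sketch:

```python
import numpy as np

def comb_enhance(x, sr, f0, gain=0.7):
    """y[n] = x[n] + gain * x[n - T], T one pitch period: harmonics of f0 add in phase."""
    T = int(round(sr / f0))
    y = x.astype(float).copy()
    y[T:] += gain * x[:-T]
    return y

sr, f0 = 16000, 200.0
t = np.arange(sr) / sr
harmonic = np.sin(2 * np.pi * f0 * t)      # a component at the pitch frequency
enhanced = comb_enhance(harmonic, sr, f0)  # grows toward amplitude 1 + gain
```

Components at the harmonics add constructively (amplitude 1 + gain), while noise between harmonics adds incoherently, which is the enhancement effect exploited above.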

The Prosodic Characteristics of Children with Cochlear Implants with Respect to Speech Rate and Intonation Slope

  • 오순영;성철재;최은아
    • Phonetics and Speech Sciences / Volume 3, Issue 3 / pp.157-165 / 2011
  • This study investigated speech rate and intonation slope (least-squares method; F0, quarter-tone) in the utterances of children with normal hearing and children with cochlear implants (CI). The subjects were divided into three groups of 12 each: children implanted before age 3;00, children implanted after age 3;00, and children with normal hearing. The materials consisted of four sentence types of dialogue in the informal (non-honorific) register. With the three groups as the independent variable and speech rate and intonation slope as the dependent variables, a one-way ANOVA showed that the normal-hearing children had faster speech rates and steeper intonation slopes than the CI groups. More specifically, there was a statistically significant speech rate difference between the normal-hearing and CI children in all sentence patterns except the imperative (p<.01). Additionally, the F0 and quarter-tone slopes observed in the sentence-final word showed a statistically significant difference between the normal-hearing and CI children in the imperative form (F0: p<.01; quarter-tone: p<.05).

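The least-squares intonation slope is a first-degree fit to the F0 contour of the target word. A sketch with an invented falling contour (the quarter-tone version, a fit on a log-frequency scale, is omitted here):

```python
import numpy as np

def intonation_slope(times_s, f0_hz):
    """Least-squares slope of an F0 contour, in Hz per second."""
    slope, _intercept = np.polyfit(times_s, f0_hz, 1)
    return slope

t = np.array([0.0, 0.1, 0.2, 0.3])
f0 = np.array([200.0, 190.0, 180.0, 170.0])  # falling contour over the final word
s = intonation_slope(t, f0)                  # about -100 Hz/s
```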

Formant Locus Overlapping Method to Enhance the Naturalness of Synthetic Speech

  • 안승권;성굉모
    • Journal of the Institute of Electronics Engineers of Korea, Part B / Volume 28B, Issue 10 / pp.755-760 / 1991
  • In this paper, we propose a new formant locus overlapping method that can effectively enhance the naturalness of synthetic speech produced by a demisyllable-based Korean text-to-speech system. First, Korean demisyllables are divided into several segments that have linear formant transition characteristics. Then a database composed of the start point and length of each formant segment is constructed. When speech is synthesized from this demisyllable database, each formant locus is concatenated using the proposed overlapping method, which closely simulates the human articulation mechanism. We implemented a Korean text-to-speech system using this method and showed that the formant loci of the synthetic speech are similar to those of natural speech. Finally, the spectrograms produced by the proposed method are more similar to those of natural speech than those of the conventional method.

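The two ingredients described above, piecewise-linear formant segments and overlapped concatenation at the joins, can be sketched as follows; the linear cross-fade is an assumption about how the overlapping is realized, not the paper's exact rule:

```python
def formant_track(start_hz, end_hz, n):
    """A linear formant transition over n frames (each segment is assumed linear)."""
    step = (end_hz - start_hz) / (n - 1)
    return [start_hz + i * step for i in range(n)]

def overlap_concat(locus_a, locus_b, n_ov):
    """Join two formant loci, cross-fading the last/first n_ov frames to avoid a jump."""
    out = list(locus_a[:-n_ov])
    for i in range(n_ov):
        w = (i + 1) / (n_ov + 1)           # fade weight toward locus_b
        out.append((1 - w) * locus_a[len(locus_a) - n_ov + i] + w * locus_b[i])
    out.extend(locus_b[n_ov:])
    return out

# F2 across a demisyllable boundary, cross-faded over 2 frames (invented values)
f2 = overlap_concat(formant_track(1500, 1800, 4), formant_track(1700, 1200, 4), 2)
```

The cross-faded join removes the formant discontinuity that a plain concatenation would leave at the segment boundary.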