• Title/Abstract/Keywords: speech features


연속구어 내 발성 종결-개시의 음향학적 특징 - 말더듬 화자와 비말더듬 화자 비교 - (Acoustic Features of Phonatory Offset-Onset in the Connected Speech between a Female Stutterer and Non-Stutterers)

  • 한지연;이옥분 / 음성과학 / Vol. 13, No. 2 / pp.19-33 / 2006
  • The purpose of this paper was to examine the acoustic characteristics of the phonatory offset-onset mechanism in the connected speech of female adults with stuttering and with normal nonfluency. The phonatory offset-onset mechanism refers to the laryngeal articulatory gestures required to mark word boundaries in the phonetic contexts of connected speech; based on the speech spectrogram, it comprises seven patterns. This study examined the acoustic features of connected speech produced by female adults with stuttering (n=1) and with normal nonfluency (n=3). Speech tokens in V_V, V_H, and V_S contexts were selected for analysis. Speech samples were recorded with Sound Forge, and spectrographic analysis was conducted with Praat. Results revealed that the female stutterer (block type) exhibited more laryngealization gestures in the V_V context; in the V_H and V_S contexts, laryngealization was more often realized as a complete glottal stop or glottal fry. The results are discussed from theoretical and clinical perspectives.
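A complete glottal stop between vowels, which the study identifies by inspecting spectrograms in Praat, shows up in the waveform as a stretch of near-zero short-time energy. The sketch below is only an illustrative stand-in for that spectrographic inspection, not the authors' procedure; the synthetic signal, frame length, hop, and threshold are all assumed values.

```python
# Crude stand-in for spectrogram inspection: flag analysis frames of near-zero
# short-time energy, which in a V_V context would suggest a complete glottal stop.
import math

def short_time_energy(signal, frame_len, hop):
    """Energy of each analysis frame."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]

def silent_frames(energies, rel_threshold=0.01):
    """Indices of frames whose energy falls below a fraction of the peak energy."""
    peak = max(energies)
    return [i for i, e in enumerate(energies) if e < rel_threshold * peak]

# Synthetic example: a vowel-like tone, 300 samples of silence, then tone again.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(300)]
signal = tone + [0.0] * 300 + tone
energies = short_time_energy(signal, frame_len=100, hop=100)
gap = silent_frames(energies)   # frames covering the silent stretch
```

With 100-sample frames the silent stretch occupies frames 3-5; a run of such frames between two voiced regions is the pattern a glottal stop would leave.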

억양의 근접복사 유형화를 이용한 감정음성의 음향분석 (An acoustical analysis of emotional speech using close-copy stylization of intonation curve)

  • 이서배 / 말소리와 음성과학 / Vol. 6, No. 3 / pp.131-138 / 2014
  • A close-copy stylization of the intonation curve was used for an acoustic analysis of emotional speech. For the analysis, 408 utterances of five emotions (happiness, anger, fear, neutral, and sadness) were processed to extract acoustic feature values. The results show that certain pitch-point features (pitch-point movement time and pitch-point distance within a sentence) and sentence-level features (pitch range of the final pitch point, pitch range of a sentence, and pitch slope of a sentence) are affected by emotion. Pitch-point movement time, pitch-point distance within a sentence, and pitch slope of a sentence show no significant difference between male and female participants. The emotions with high arousal (happiness and anger) are consistently distinguished from the emotion with low arousal (sadness) in terms of these features: emotions with higher arousal show a steeper pitch slope over the sentence, a steeper pitch slope at the end of the sentence, and a wider pitch range. The analysis suggests that measurements of these acoustic features can be used to cluster and identify emotions in speech.
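Once a contour has been reduced by close-copy stylization to a handful of pitch points, the sentence-level measures named above fall out of simple arithmetic over those points. A minimal sketch, with hypothetical pitch-point values rather than the study's data:

```python
def sentence_features(pitch_points):
    """Sentence-level intonation features from stylized (time_s, f0_hz) pitch points."""
    times = [t for t, _ in pitch_points]
    f0s = [f for _, f in pitch_points]
    distances = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return {
        "pitch_range": max(f0s) - min(f0s),                      # Hz
        "slope": (f0s[-1] - f0s[0]) / (times[-1] - times[0]),    # Hz/s over the sentence
        "mean_point_distance": sum(distances) / len(distances),  # s between pitch points
    }

# Hypothetical stylized contour: four pitch points for one utterance.
points = [(0.10, 220.0), (0.45, 280.0), (0.90, 250.0), (1.30, 180.0)]
feats = sentence_features(points)
```

For this contour the pitch range is 100 Hz and the sentence slope is negative (falling); a higher-arousal utterance would, per the results above, show a steeper slope and wider range.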

한국어 자음에서 변별 자질들의 지각적 위계 (The Perceptual Hierarchy of Distinctive Features in Korean Consonants)

  • 배문정 / 말소리와 음성과학 / Vol. 2, No. 4 / pp.109-118 / 2010
  • Using a speeded classification task (Garner, 1978), we investigated the perceptual interaction of distinctive features in Korean consonants. The main questions of this study were whether listeners can perceptually identify the component features that make up complex consonant sounds, whether these features are processed independently or dependently, and whether there is a systematic hierarchy in their dependency. Participants were asked to classify syllables based on their difference in distinctive features, and reaction times (RTs) were gathered. For example, in terms of aspiration, participants classified the spoken syllables /ta/ and /pa/ as one category and /tʰa/ and /pʰa/ as another; in terms of place of articulation, they classified /ta/ and /tʰa/ as one category and /pa/ and /pʰa/ as another. We assumed that the difference between their RTs represents their interdependency. We compared laryngeal features and place features (Experiment 1), resonance features and place features (Experiment 2), and manner features and laryngeal features (Experiment 3). The results showed that distinctive features were not perceived in a completely independent way; rather, they had an asymmetric and hierarchical interdependency, with the laryngeal features more independent than the place and manner features. We discuss these results in the context of the perceptual basis of phonology.

사상체질과 음성특징과의 상관관계 연구 (A Study on Correlation between Sasang Constitution and Speech Features)

  • 권철홍;김종열;김근호;한성만 / 혜화의학회지 / Vol. 19, No. 2 / pp.219-228 / 2011
  • Objective : Sasang constitutional medicine utilizes voice characteristics to diagnose a person's constitution. In this paper we propose methods to analyze Sasang constitution using speech information technology; that is, this study aims at establishing the relationship between the Sasang constitutions and their corresponding voice characteristics by investigating various speech variables. Materials & Methods : Voice recordings of 1,406 speakers whose constitutions had already been diagnosed by experts in the field were obtained. A total of 144 speech features obtained from five vowels and one sentence were used, including pitch-, intensity-, formant-, bandwidth-, MDVP-, and MFCC-related variables for each constitution. We analyzed the speech variables to find whether there are statistically significant differences among the three constitutions. Results : The main speech variables classifying the three constitutions are related to pitch and MFCCs for males, and to formants and MFCCs for females. The correct decision rate is 73.7% for male Soeumin, 63.3% for male Soyangin, 57.3% for male Taeumin, 74.0% for female Soeumin, 75.6% for female Soyangin, and 94.3% for female Taeumin, with an average of 73.0%. Conclusion : The experimental results show a statistically significant correlation between some speech variables and the constitutions.
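Testing whether one speech variable differs significantly across three groups of speakers is conventionally done with a one-way ANOVA. The abstract does not state which test the authors used, so the following is only a generic sketch of the F statistic, with made-up per-constitution values in place of the paper's data:

```python
def f_statistic(groups):
    """One-way ANOVA F statistic across k groups (sizes may differ)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical values of one speech variable per constitution (not real data).
soeumin, soyangin, taeumin = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]
f = f_statistic([soeumin, soyangin, taeumin])
```

The resulting F would then be compared against the F(k-1, n-k) critical value to decide significance for that variable.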

음성신호의 최적특징을 적응적으로 추출하는 방법에 관한 연구 (A Study on the Adaptive Method for Extracting Optimum Features of Speech Signal)

  • 장승관;차태호;최웅세;김창석 / 한국통신학회논문지 / Vol. 19, No. 2 / pp.373-380 / 1994
  • This paper proposes a method for extracting optimal features of a speech signal by adapting the signal to a fixed size. To extract the features, the FRLS (fast recursive least squares) algorithm, a fast linear prediction method, was applied: the speech signal was divided into frames of fixed size, and the optimal features of each frame were extracted using the proposed uniform autocorrelation function.
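The linear-prediction step above is solved in the paper with a fast RLS algorithm. As a simpler stand-in, the sketch below estimates linear-prediction coefficients for one frame with the standard autocorrelation method and the Levinson-Durbin recursion; it is not the FRLS algorithm or the uniform autocorrelation function the authors propose, and the test frame is synthetic.

```python
def autocorr(frame, max_lag):
    """Short-time autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """LPC coefficients a[0..order] (a[0] = 1) from autocorrelation values r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -sum(a[j] * r[m - j] for j in range(m)) / err   # reflection coefficient
        a_new = a[:]
        for j in range(1, m):
            a_new[j] = a[j] + k * a[m - j]
        a_new[m] = k
        a = a_new
        err *= 1.0 - k * k   # prediction error shrinks at each order
    return a, err

# Frame from the first-order recursion x[n] = 0.9 x[n-1] (impulse response),
# so the order-1 predictor should recover a coefficient close to -0.9.
frame = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorr(frame, 1), 1)
```

In practice the predictor order would be higher (8-14 for speech) and the frame would be windowed; the recursion itself is unchanged.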

Detecting Data which Represent Emotion Features from the Speech Signal

  • Park, Chang-Hyun;Sim, Kwee-Bo;Lee, Dong-Wook;Joo, Young-Hoon / 제어로봇시스템학회:학술대회논문집 / ICCAS 2001 / pp.138.1-138 / 2001
  • In conversation, we perceive a speaker's emotion as well as the ideas being expressed. Applications using speech recognition have recently appeared, but they recognize only the linguistic content of what is said. In the future, machines familiar to humans will be required for a more convenient life, so we need to obtain emotion features. In this paper, we collect a variety of reference data that represent emotion features in the speech signal. Since our final target is to recognize emotion from a stream of speech, we must first understand the features that represent emotion. Humans can express many emotions, and the subtle differences among them make this recognition problem difficult.

한국어 단음절에서 자음과 모음 자질의 비선형적 지각 (Nonlinear Interaction between Consonant and Vowel Features in Korean Syllable Perception)

  • 배문정 / 말소리와 음성과학 / Vol. 1, No. 4 / pp.29-38 / 2009
  • This study investigated the interaction between consonants and vowels in Korean syllable perception using a speeded classification task (Garner, 1978). Experiment 1 examined whether listeners analytically perceive the component phonemes in CV monosyllables when classification is based on a component phoneme (a consonant or a vowel), and observed a significant redundancy gain and a Garner interference effect. These results imply that perception of the component phonemes in a CV syllable is not linear. Experiment 2 examined the relation between consonants and vowels at a subphonemic level, comparing classification times based on glottal features (aspirated and lax), on place of articulation features (labial and coronal), and on vowel features (front and back). Across all feature classifications there were significant but asymmetric interference effects: glottal feature-based classification showed the least interference, vowel feature-based classification showed moderate interference, and place of articulation feature-based classification showed the most. These results show that in syllable perception glottal features are relatively independent of vowels, whereas place features are more dependent on vowels. To examine the three-way interaction among glottal, place of articulation, and vowel features, Experiment 3 used a modified Garner task. Its outcome indicated that glottal consonant features are independent of both place of articulation and vowel features, but place of articulation features are dependent on glottal and vowel features. These results were interpreted as showing that speech perception is not abstract and discrete but nonlinear, and that the perception of features corresponds to the hierarchical organization of articulatory features suggested in nonlinear phonology (Clements, 1991; Browman and Goldstein, 1989).

음성의 감성요소 추출을 통한 감성 인식 시스템 (The Emotion Recognition System through The Extraction of Emotional Components from Speech)

  • 박창현;심귀보 / 제어로봇시스템학회논문지 / Vol. 10, No. 9 / pp.763-770 / 2004
  • The key issues in emotion recognition from speech are feature extraction and pattern classification. Features should carry the information essential for classifying the emotions, and feature selection is needed to decompose the components of speech and analyze the relation between features and emotions. In particular, the pitch of speech carries much emotional information. Accordingly, this paper investigates the relation of emotion to features such as loudness and pitch, and classifies the emotions using statistics of the collected data. The most important emotional component of sound is tone; the inferential ability of the brain also takes part in emotion recognition. We empirically identify emotional components of speech, conduct emotion recognition experiments, and propose a recognition method that uses these emotional components together with transition probabilities.

청각 장애자를 위한 시각 음성 처리 시스템에 관한 연구 (A Study on the Visible Speech Processing System for the Hearing Impaired)

  • 김원기;김남현;유선국;정성현 / 대한의용생체공학회:학술대회논문집 / 대한의용생체공학회 1990년도 춘계학술대회 / pp.57-61 / 1990
  • The purpose of this study is to support speech training for the hearing impaired with a visible speech processing system. In brief, the system converts features of the speech signal into graphics on a monitor so that the hearing impaired can adjust their speech features toward normal ones. The features used in this system are formant and pitch, extracted with digital signal processing techniques such as linear predictive analysis and the AMDF (Average Magnitude Difference Function). Features that can be visualized easily are being studied for effective training of the hearing impaired's abnormal speech.
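The AMDF mentioned above estimates pitch by finding the lag at which a frame best matches a delayed copy of itself: the average magnitude difference dips toward zero at multiples of the pitch period. A minimal sketch; the search range and the shortest-lag tie-breaking heuristic are assumptions, not the paper's implementation.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference of the frame against itself at a given lag."""
    n = len(frame)
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag)) / (n - lag)

def estimate_pitch(frame, fs, f0_min=60.0, f0_max=400.0):
    """Pitch in Hz from the deepest AMDF valley within an assumed F0 search range."""
    lag_lo, lag_hi = int(fs / f0_max), int(fs / f0_min)
    vals = {lag: amdf(frame, lag) for lag in range(lag_lo, lag_hi + 1)}
    vmin, vmax = min(vals.values()), max(vals.values())
    # The AMDF also dips at 2x, 3x the period; prefer the shortest lag whose
    # value is close to the global minimum to avoid such octave errors.
    best = min(lag for lag, v in vals.items() if v <= vmin + 0.02 * (vmax - vmin))
    return fs / best

fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
pitch = estimate_pitch(frame, fs)   # a 200 Hz tone has its AMDF valley at lag 40
```

Real speech frames would be windowed and the valley interpolated, but the valley-picking idea is the same one the visible-speech system relies on.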

클래스 히스토그램 등화 기법에 의한 강인한 음성 인식 (Robust Speech Recognition by Utilizing Class Histogram Equalization)

  • 서영주;김회린;이윤근 / 대한음성학회지:말소리 / No. 60 / pp.145-164 / 2006
  • This paper proposes class histogram equalization (CHEQ) to compensate noisy acoustic features for robust speech recognition. CHEQ aims to compensate for the acoustic mismatch between training and test speech recognition environments as well as to overcome the limitations of conventional histogram equalization (HEQ). In contrast to HEQ, CHEQ adopts multiple class-specific distribution functions for the training and test environments and equalizes the features by using their class-specific training and test distributions. According to the class-information extraction method, CHEQ takes two forms: hard-CHEQ, based on vector quantization, and soft-CHEQ, using a Gaussian mixture model. Experiments on the Aurora 2 database confirmed the effectiveness of CHEQ, producing a relative word error reduction of 61.17% over the baseline mel-cepstral features and of 19.62% over conventional HEQ.