• Title/Abstract/Keyword: speech error

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.150-159 / 2005
  • This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's. In the proposed method, a speaker's voice is represented by LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and conditional probability is used to model the relationship between the two speakers' LPC cepstra. The parameters of each probabilistic model are obtained by Maximum Likelihood (ML) estimation. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of the average values. The proposed method outperforms the previous VQ-based method in objective measures, including average cepstrum distance reduction ratio and likelihood increasing ratio. In the subjective test, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained, owing to spectral contours that evolve smoothly over time.
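
The prosody step described in the abstract (scaling by the ratio of average values) can be illustrated with a minimal sketch. The function name `transform_prosody` and the sample pitch periods are hypothetical and are not taken from the paper; the same scaling could be applied to speaking rate.

```python
from statistics import mean


def transform_prosody(source_values, source_training, target_training):
    """Scale source prosody values by the ratio of target to source averages.

    Hypothetical helper: the paper only states that the ratio of the
    average values is used, not how it is implemented.
    """
    ratio = mean(target_training) / mean(source_training)
    return [value * ratio for value in source_values]


# Hypothetical pitch periods in milliseconds (illustrative values only).
source_train = [8.0, 8.5, 7.5]   # average 8.0 ms
target_train = [5.0, 5.5, 4.5]   # average 5.0 ms
converted = transform_prosody([8.0, 9.0], source_train, target_train)
```

Only the average is matched here; the shape of the source speaker's pitch contour is preserved.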

A Study on Adaptive Model Updating and a Priori Threshold Decision for Speaker Verification System (화자 확인 시스템을 위한 적응적 모델 갱신과 사전 문턱치 결정에 관한 연구)

  • 진세훈;이재희;강철호
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.20-26 / 2000
  • In a speaker verification system, updating HMM (hidden Markov model) parameters from a small amount of data and deciding the a priori threshold are crucial for dealing with long-term variability in people's voices. In this paper we present a speaker model updating technique that adapts to session-to-session intra-speaker variability, together with an a priori threshold determination technique. The proposed technique decreases the verification errors caused by session-to-session intra-speaker variability by adapting the speaker model parameters to new speech data through Baum-Welch re-estimation. The proposed a priori threshold is determined by a hybrid score measurement that combines the world-model-based technique and the cohort-model-based technique. The results show that the proposed technique leads to better performance and that the difference in performance between the a posteriori threshold decision approach and the proposed a priori threshold decision approach is small.


Clinical Characteristics of Formal Thought Disorder in Schizophrenia (조현병에서 형식적 사고장애의 임상적 특성)

  • Yang, Chaeyoung;Kim, Han-sung;Kim, Eunkyung;Kim, Il Bin;Park, Seon-Cheol;Choi, Joonho
    • Korean Journal of Biological Psychiatry / v.28 no.2 / pp.70-77 / 2021
  • Objectives: Our study aimed to present the distinctive correlates of formal thought disorder in patients with schizophrenia, using the Clinical Language Disorder Rating Scale (CLANG). Methods: We compared clinical characteristics between schizophrenia patients with (n = 84) and without (n = 82) formal thought disorder. Psychometric scales including the CLANG, the Brief Psychiatric Rating Scale (BPRS), the Young Mania Rating Scale (YMRS), the Calgary Depression Scale for Schizophrenia (CDSS), and the Word Fluency Test (WFT) were used. Results: After adjusting for the effects of age, sex, and total scores on the BPRS, YMRS, and WFT, the subjects with disorganized speech presented significantly higher scores on the abnormal syntax (p = 0.009), lack of semantic association (p = 0.005), discourse failure (p < 0.0001), pragmatics disorder (p = 0.001), dysarthria (p < 0.0001), and paraphasic error (p = 0.005) items than those without formal thought disorder. With these item scores defined as covariates, a binary logistic regression model predicted that discourse failure (adjusted odds ratio [aOR] = 5.88, p < 0.0001) and pragmatics disorder (aOR = 2.17, p = 0.04) were distinctive correlates of formal thought disorder in patients with schizophrenia. Conclusions: This study administered the Clinician-Rated Dimensions of Psychosis Symptom Severity (CRDPSS) and CLANG scales to 166 hospitalized schizophrenia patients to explore which sub-items of the CLANG scale are independently related to formal thought disorder. Discourse failure and pragmatics disorder might be used as distinctive indexes of formal thought disorder in patients with schizophrenia.

Multi-Emotion Recognition Model with Text and Speech Ensemble (텍스트와 음성의 앙상블을 통한 다중 감정인식 모델)

  • Yi, Moung Ho;Lim, Myoung Jin;Shin, Ju Hyun
    • Smart Media Journal / v.11 no.8 / pp.65-72 / 2022
  • Due to COVID-19, counseling has shifted from face-to-face to non-face-to-face methods, and the importance of non-face-to-face counseling is increasing. Its advantage is that consultation is available online anytime and anywhere, and it is safe from COVID-19. However, it is difficult to understand the client's mind because non-verbal expressions are hard to convey. It is therefore important to recognize emotions by accurately analyzing text and voice during non-face-to-face counseling. In this paper, text data is vectorized using FastText after separating consonants, and voice data is vectorized by extracting features using Log Mel Spectrogram and MFCC, respectively. We propose a multi-emotion recognition model that recognizes five emotions from the vectorized data using an LSTM model. Multi-emotion recognition is evaluated using RMSE. In the experiment, the RMSE of the proposed model was 0.2174, the lowest error compared with the models using text data and voice data separately.
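
For readers unfamiliar with the metric, the RMSE used to evaluate the model can be sketched as follows. The function name `rmse` and the five-emotion score vectors are illustrative assumptions; the abstract does not specify how scores are encoded.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length score sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores for five emotions (not data from the paper).
predicted_scores = [0.5, 0.0, 0.25, 0.0, 0.25]
reference_scores = [1.0, 0.0, 0.0, 0.0, 0.0]
error = rmse(predicted_scores, reference_scores)
```

A lower value means the predicted emotion scores lie closer to the reference scores.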

Voice Activity Detection Based on SVM Classifier Using Likelihood Ratio Feature Vector (우도비 특징 벡터를 이용한 SVM 기반의 음성 검출기)

  • Jo, Q-Haing;Kang, Sang-Ki;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.26 no.8 / pp.397-402 / 2007
  • In this paper, we apply a support vector machine (SVM) that incorporates an optimized nonlinear decision rule over different sets of feature vectors to improve the performance of statistical model-based voice activity detection (VAD). The conventional method performs VAD by setting up statistical models for the speech-absence and speech-presence hypotheses and comparing the geometric mean of the likelihood ratios (LRs) of the individual frequency bands extracted from the input signal with a given threshold. We propose a novel SVM-based VAD technique that treats the LRs computed in each frequency bin as the elements of a feature vector in order to minimize the classification error probability, instead of using the conventional geometric-mean decision rule. In experiments, the SVM-based VAD using the proposed feature performed better than previously reported VADs in various noise environments.
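
The conventional decision rule mentioned in the abstract, comparing the geometric mean of per-band likelihood ratios with a threshold, can be sketched as below. The function name, the LR values, and the thresholds are hypothetical; the proposed method would instead pass the per-bin LRs to an SVM as a feature vector, which is not shown here.

```python
import math


def geometric_mean_lr_decision(likelihood_ratios, threshold):
    """Return True (speech present) if the geometric mean of the LRs exceeds the threshold.

    The comparison is done in the log domain, where the log of the
    geometric mean equals the arithmetic mean of the log LRs.
    """
    log_geometric_mean = sum(math.log(lr) for lr in likelihood_ratios) / len(likelihood_ratios)
    return log_geometric_mean > math.log(threshold)


# Hypothetical per-band likelihood ratios for one frame (geometric mean is 1.0).
frame_lrs = [4.0, 1.0, 0.25]
speech_with_low_threshold = geometric_mean_lr_decision(frame_lrs, threshold=0.5)
speech_with_high_threshold = geometric_mean_lr_decision(frame_lrs, threshold=2.0)
```

Working in the log domain avoids numerical underflow when many small LRs are multiplied together.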

Frequency Domain Double-Talk Detector Based on Gaussian Mixture Model (주파수 영역에서의 Gaussian Mixture Model 기반의 동시통화 검출 연구)

  • Lee, Kyu-Ho;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.4 / pp.401-407 / 2009
  • In this paper, we propose a novel method for cross-correlation-based double-talk detection (DTD) that employs the Gaussian Mixture Model (GMM) in the frequency domain. The proposed algorithm transforms the cross-correlation coefficient used in the time domain into 16 channels in the frequency domain using the discrete Fourier transform (DFT). From these channels, seven feature vectors are selected for the GMM, and three different regions (far-end, double-talk, and near-end speech) are identified by likelihood comparison based on those feature vectors. The presented DTD algorithm efficiently detects the double-talk regions without the voice activity detector used in conventional cross-correlation-based double-talk detection. The performance of the proposed algorithm is evaluated under various conditions and yields better results than the conventional schemes. In particular, it is robust against detection errors resulting from background noise or echo path changes, which is one of the key issues in practical DTD.

Arithmetic Fluctuation Effect affected by Induced Emotional Valence (유발된 정서가에 따른 계산 요동의 효과)

  • Kim, Choong-Myung
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.2 / pp.185-191 / 2018
  • This study examined the type and extent of interference between an induced emotion and a subsequent arithmetic operation. The experiment was carried out to determine the influence of the induced emotions (anger, joy, and sorrow) and stimulus types (picture and sentence) on the cognitive processing load that may block the interactions among the constituents of working memory. The subjects were 32 undergraduates of similar age and education who were specifically instructed to attend to the induced emotion by imitating the facial expression and to make a correct decision during the remainder calculation task. The stimulus types did not show any difference, but there was a significant difference among the induced emotion types: response time was slower for the positive emotion (joy condition) than for the other emotions (anger and sorrow). More specifically, error rates and delayed correct response rates for each emotion type were analysed to determine which phase the slower response was associated with. Delayed responses in the joy condition were associated with the error rate difference for sentence-inducing stimuli and with the delayed correct response rate for picture-inducing stimuli. These findings not only suggest that induced positive emotion increases response time compared with negative emotions, but also imply that picture-inducing stimuli readily produce arithmetic fluctuation, whereas sentence-inducing stimuli result in arithmetic failure.

Phonetic analysis of Korean elementary students who had overseas study at early ages (조기 유학 후 귀국한 초등학생의 발음 이상에 대한 음성학적 연구)

  • Ryu, Mee-Heun;Lee, Chang-Woo
    • Clinical and Experimental Pediatrics / v.53 no.4 / pp.579-584 / 2010
  • Purpose: The number of repatriated Korean students who studied overseas at an early age is increasing. They have received foreign education and can speak international languages, but they have many difficulties with the articulation and intonation of the Korean language. This study aims to measure closure and aspiration duration, length of consonants, length of subsequent vowels, and the ratio of consonants to subsequent vowels in vowel-consonant-vowel (VCV) syllables. Methods: This study compares repatriated and native students in terms of acoustic and phonetic characteristics, the articulation error rate for Korean plosives, the closure and aspiration duration, and the ratio of aspiration duration to closure duration. Results: The articulation error rates for Korean plosives were 19% and 2% for repatriated and native students, respectively. The closure duration was significantly longer in repatriated students than in native students. The aspiration duration was also significantly longer in repatriated students than in native students. No difference was found in the ratio of aspiration duration to closure duration between the native and repatriated students. Conclusion: This study can serve as a reference for estimating the phonetic difficulties of Korean elementary students who studied overseas at an early age.

Diadochokinetic Skills in Typically developing Children Aged 4-6 Years : Pilot Study (학령전기 정상발달 아동의 자모음 교대운동특성 : 예비연구)

  • Jeong, Han-Jin;Lee, Ok-Bun;Sehr, Kyeung-Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.7 / pp.3149-3155 / 2011
  • The purpose of this study was to examine the characteristics of DDK performance with CV (e.g., 'pa') and VV (e.g., 'ai') syllables in typically developing children aged 4 to 6 years. Twelve TD children performed DDK with CV structures (/pʰə/, /tʰə/, /kʰə/, /pʰətʰə/, /tʰəkʰə?/, /pʰətʰəkʰə/) and with VV structures (/ai/, /ɔi/, /ɑɔi/). The number of syllables spoken per second was counted, and all spoken DDK were measured with PC-quirer. The results showed that all spoken DDK became faster as the children's age increased. This trend appeared in both CV and VV syllable repetition. In addition, there were no differences in DDK rate between CV and VV syllables. The frequency of articulatory errors during DDK performance was very high at the age of 3, and there was no pattern in the frequency of articulatory errors according to developmental age.

An Effect for Sequential Information Processing by the Anxiety Level and Temporary Affect Induction (불안수준 및 일시적 유발정서가 서열정보 어휘처리에 미치는 효과)

  • Kim, Choong-Myung
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.4 / pp.224-231 / 2019
  • The current study was conducted to unravel the influence of affect induction, as a background emotion, on a cognitive task that requires judging the degree of sequence, in groups with and without anxiety symptoms. Four types of affect induction and two sequential task types were used as within-subject variables, and two groups of college students classified by the Beck Anxiety Inventory (BAI) served as a between-subject variable, to determine reaction times for sequential judgments on lexical relevance information. DmDx5 was used to present a series of stimuli and elicit responses from the subjects. Repeated-measures ANOVA revealed that reaction times and error rates were significantly larger for anxious participants than for the normal group, regardless of affect and task types. Within-subject effects showed that a specific affect type (sorrow condition) and the number-related task type produced more rapid responses than the other affect types and the magnitude-related task type, respectively. In sum, these findings confirmed that reaction times and error rates vary as a function of the accompanying affect type as well as anxiety level and task type, suggesting that the underlying background affect plays a major role in processing affect-cognitive association tasks.