• Title/Summary/Keyword: cepstral analysis


A Study on the Consonant Classification Using Fuzzy Inference (퍼지추론을 이용한 한국어 자음분류에 관한 연구)

  • 박경식
    • Proceedings of the Acoustical Society of Korea Conference / 1992.06a / pp.71-75 / 1992
  • This paper proposes an algorithm for classifying Korean consonant phonemes such as plosives, fricatives, and affricates into lax sounds, glottalized sounds, and aspirated sounds. This three-way distinction is a characteristic feature of the Korean language that does not exist in languages such as English. The work addresses the classification of 14 Korean consonants (k, t, p, s, c, k', t', p', s', c', kh, ph, ch) as a preliminary stage for Korean phone recognition. LPC cepstral analysis is used to obtain the feature sets for classification. The experiments consist of two stages. First, consonant segments are detected from the original speech signal using short-time speech signal analysis and the Mahalanobis distance; then the consonants are classified by fuzzy inference. In computer simulations, the classification rate on the speech data reached 93.75%.
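The abstract names LPC cepstral analysis as the feature source but gives no details. As a minimal sketch, the standard recursion that converts linear-prediction coefficients into cepstral coefficients can be written as follows. The function name `lpc_to_cepstrum` and the sign convention H(z) = 1 / (1 - sum_k a_k z^-k) are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a[1..p] into cepstral coefficients c[1..n_ceps].

    Assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k); the gain term c[0] is omitted.
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        # Direct contribution a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive contribution: sum_k (k/n) * c_k * a_{n-k}, with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

For a first-order model with a single coefficient a, the recursion reproduces the closed form c_n = a^n / n, which is a convenient sanity check.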


Weighted filter bank analysis and model adaptation for improving the recognition performance of partially corrupted speech (부분 손상된 음성의 인식성능 향상을 위한 가중 필터뱅크 분석 및 모델 적응)

  • Cho Hoon-Young;Oh Yung-Hwan
    • MALSORI / no.44 / pp.157-169 / 2002
  • We propose a weighted filter bank analysis and model adaptation (WFBA-MA) scheme to improve the utilization of uncorrupted or less severely corrupted frequency regions for robust speech recognition. A weighted mel frequency cepstral coefficient is obtained by weighting log filter bank energies with reliability coefficients, and the hidden Markov models are also modified to reflect the local reliabilities. Experimental results on the TIDIGITS database corrupted by band-limited noises and car noise indicate that the proposed WFBA-MA scheme utilizes the uncorrupted speech information well, significantly improving recognition performance in comparison to multi-band speech recognition systems.
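The weighting step described above can be sketched as follows: each log filter bank energy is scaled by a reliability coefficient before the DCT that produces the cepstral coefficients. This is only an illustration under stated assumptions. The names `weighted_mfcc` and `dct_ii_matrix`, the clipping of weights to [0, 1], and the unnormalized DCT-II are choices made here; the abstract does not say how the reliability coefficients are estimated or how the HMMs are adapted.

```python
import numpy as np

def dct_ii_matrix(n_ceps, n_bands):
    """Unnormalized DCT-II basis, shape (n_ceps, n_bands)."""
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_bands)

def weighted_mfcc(log_fbank, reliability, n_ceps=13):
    """Cepstral coefficients from reliability-weighted log filter bank energies.

    log_fbank:   shape (n_bands,) or (n_frames, n_bands)
    reliability: shape (n_bands,), values near 1 for clean bands, near 0 for corrupted ones
    """
    log_fbank = np.asarray(log_fbank, dtype=float)
    w = np.clip(np.asarray(reliability, dtype=float), 0.0, 1.0)
    basis = dct_ii_matrix(n_ceps, log_fbank.shape[-1])
    return (w * log_fbank) @ basis.T
```

With a weight of zero, a band no longer influences any cepstral coefficient, which is the intended effect for a frequency region judged to be fully corrupted.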


Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter (훈련데이터 기반의 temporal filter를 적용한 4연숫자 전화음성 인식)

  • Jung, Sung-Yun;Bae, Keun-Sung
    • MALSORI / no.53 / pp.93-102 / 2005
  • The performance of a speech recognition system generally degrades in a telephone environment because of distortions caused by background noise and varying channel characteristics. In this paper, data-driven temporal filters are investigated to improve performance on a specific recognition task, telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral-domain feature vectors using principal component analysis. According to the experimental results, the proposed temporal filtering method performs slightly better than the previous ones.
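One common way to derive a temporal filter from training data with principal component analysis is sketched below: windows are cut from the time trajectory of one cepstral coefficient, and the leading eigenvector of their covariance is used as FIR coefficients. The abstract does not describe the three methods it compares, so the window length, the per-coefficient treatment, and the function names here are assumptions.

```python
import numpy as np

def pca_temporal_filter(trajectory, length=5):
    """FIR coefficients taken as the first principal component of
    length-`length` windows of one cepstral coefficient's time trajectory."""
    x = np.asarray(trajectory, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, length)
    windows = windows - windows.mean(axis=0)
    cov = windows.T @ windows / (len(windows) - 1)
    # eigh returns eigenvalues in ascending order, so the last column is the leading component.
    _, eigvecs = np.linalg.eigh(cov)
    h = eigvecs[:, -1]
    return h / np.linalg.norm(h)

def apply_temporal_filter(trajectory, h):
    """Correlate the trajectory with the filter, keeping the original length."""
    return np.convolve(np.asarray(trajectory, dtype=float), h[::-1], mode="same")
```

The sign of an eigenvector is arbitrary, so in practice the filter may need to be flipped to have positive gain at low modulation frequencies.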


Implementation of Hidden Markov Model based Speech Recognition System for Teaching Autonomous Mobile Robot (자율이동로봇의 명령 교시를 위한 HMM 기반 음성인식시스템의 구현)

  • 조현수;박민규;이민철
    • 제어로봇시스템학회:학술대회논문집 / 2000.10a / pp.281-281 / 2000
  • This paper presents an implementation of a speech recognition system for teaching an autonomous mobile robot. Using human speech as the teaching method provides a more convenient user interface for the mobile robot. To make teaching the robot easier, this study investigates an autonomous mobile robot equipped with a speech recognition function. In the speech recognition system, a recognition algorithm based on the hidden Markov model (HMM) is presented to recognize Korean words. A filter-bank analysis model is used as the spectral analysis method for feature extraction. A recognized word is converted into a command for controlling robot navigation.


Enhancement of Mobile Authentication System Performance based on Multimodal Biometrics (다중 생체인식 기반의 모바일 인증 시스템 성능 개선)

  • Jeong, Kanghun;Kim, Sanghoon;Moon, Hyeonjoon
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.342-345 / 2013
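  • This paper proposes a personal authentication system based on multimodal biometrics in a mobile environment. Face recognition and speaker recognition are chosen as the biometric modalities, and the system follows the recognition scenario below. For face recognition, the face region is preprocessed through face detection based on the modified census transform (MCT) and eye detection based on the k-means cluster analysis algorithm, and a face verification system based on principal component analysis (PCA) is implemented. For speaker recognition, speech endpoints are detected and Mel frequency cepstral coefficient (MFCC) features are extracted, and a speaker verification system based on dynamic time warping (DTW) is implemented. The two biometric modalities are then fused using the method proposed in the paper to improve the recognition rate.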
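The DTW-based speaker verification step compares two MFCC sequences of different lengths. A minimal sketch of the classic DTW distance is given below; the Euclidean frame distance, the three-way step pattern, and the name `dtw_distance` are assumptions, since the abstract does not specify the variant used.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two feature sequences.

    x, y: arrays of shape (n_frames, n_dims), for example MFCC frames.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every frame of x and every frame of y.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # frame of x repeated against y
                acc[i, j - 1],      # frame of y repeated against x
                acc[i - 1, j - 1],  # both sequences advance
            )
    return acc[n, m]
```

A verification decision could then compare this distance between an enrolled template and a test utterance against a threshold chosen on development data.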

Acoustic Analysis of Voice Change According to Extent of Thyroidectomy (갑상선 수술범위에 따른 음성의 음향적 분석)

  • Kang, Young Ae;Koo, Bon Seok
    • Phonetics and Speech Sciences / v.7 no.4 / pp.77-83 / 2015
  • Voice complications can occur after thyroidectomy even without laryngeal nerve injury. The purpose of this study was to investigate voice changes according to the extent of thyroidectomy using acoustic analysis. Thirty-five female patients with papillary thyroid carcinoma underwent voice evaluation before thyroidectomy and at 1 month and 3 months after it. The acoustic analysis parameters were speaking fundamental frequency (SFF), min $F_0$, max $F_0$, dynamic range of $F_0$, jitter, shimmer, noise-to-harmonic ratio (NHR), and cepstral peak prominence (CPP). Repeated-measures analysis of variance was applied. Time-related voice changes showed significant differences in all parameters except NHR. At 1 month after surgery, voice quality was worse and pitch had decreased, but both voice quality and pitch were improving at the 3-month follow-up. Voice changes according to the extent of surgery appeared in SFF, max $F_0$, and dynamic range of $F_0$. A time-by-surgery effect existed only in min $F_0$. The results showed that the severity of voice complications depended on the extent of thyroidectomy, which had a negative impact on $F_0$-related parameters. The deterioration of voice quality at 1 month after thyroidectomy may be affected by the loss of thyroid hormone in the blood. The lowering of $F_0$-related parameters may be caused by laryngeal fixation due to adhesion at the surgical site.

Effects of Laryngeal Massage on Muscle Tension Dysphonia: A Systematic Review and Meta-Analysis (근긴장성 발성장애의 후두마사지 효과: 체계적 고찰 및 메타분석)

  • Kim, Jaeock
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.2 / pp.64-74 / 2021
  • Background and Objectives: This study investigated the effects of laryngeal massage on voice quality and articulation in muscle tension dysphonia (MTD). Materials and Method: A systematic review was conducted of articles published between January 2000 and December 2020 in Cochrane, PubMed, ScienceDirect, SpringerLink, ERIC, and Naver Academic. Of the 2094 articles identified, 10 peer-reviewed articles were included in a meta-analysis. Mean effect sizes of the variables related to voice quality (jitter, shimmer, harmonic-to-noise ratio or noise-to-harmonic ratio, high-F0, low-I, cepstral peak prominence) and articulation (F1, F2, F1 slope, F2 slope) were calculated using Hedges' g. Results: Meta-analysis of the selected articles showed that laryngeal massage had medium to large effects on all variables of voice quality and articulation except high-F0 and F1 slope in the MTD patients. Conclusion: This study provides comprehensive clinical evidence that applying laryngeal massage to MTD patients is highly desirable.
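For readers unfamiliar with the effect size used here, Hedges' g is Cohen's d multiplied by a small-sample correction factor. The sketch below uses the common two-group form with a pooled standard deviation and the approximation J = 1 - 3 / (4(n1 + n2) - 9). Whether the review used this exact form, for example for pre/post comparisons, is not stated in the abstract, so treat it as an assumption.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Bias-corrected standardized mean difference between two groups."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    cohens_d = (mean1 - mean2) / pooled_sd
    # Small-sample correction factor (approximate form).
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction
```

With two groups of 10, equal standard deviations of 2, and means of 10 and 8, Cohen's d is 1 and the correction shrinks it to 1 - 3/71.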

Channel Compensation for Cepstrum-Based Detection of Laryngeal Diseases (켑스트럼 기반의 후두암 감별을 위한 채널보상)

  • Kim Young Kuk;Kim Su Mi;Kim Hyung Soon;Wang Soo-Geun;Jo Cheol-Woo;Yang Byung-Gon
    • MALSORI / no.50 / pp.111-122 / 2004
  • Automatic detection of laryngeal diseases by voice is attractive because of its non-intrusive nature. A cepstrum-based approach to detecting laryngeal cancer performs reliably even when the periodicity of the voice signal is severely lost, but it is not robust to channel mismatch caused by different microphone characteristics. In this paper, to deal with mismatched training and test microphone conditions, we investigate channel compensation techniques such as Cepstral Mean Subtraction (CMS) and Pole Filtered CMS (PFCMS). According to our experiments, PFCMS yields better performance than CMS. By using PFCMS, we obtained 12% and 40% error reduction over baseline and CMS, respectively.
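The simpler of the two compensation techniques, CMS, can be sketched in a few lines: a stationary channel multiplies the spectrum, so it adds a constant vector to every cepstral frame, and subtracting the per-utterance mean removes it. The function name is an assumption, and PFCMS, which the paper reports as better, is not shown here.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance mean from each cepstral dimension.

    cepstra: array of shape (n_frames, n_coeffs) for one utterance.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Because the channel term is constant across frames, the output is identical for the clean cepstra and for the same cepstra with any fixed offset added.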


Spectral Characteristics and Nasalance Scores of Hypernasality in Patient with Cleft Palate

  • Soh, Byung-Soo;Shin, Hyo-Keun;Kim, Hyun-Gi
    • Speech Sciences / v.12 no.1 / pp.27-35 / 2005
  • Different instrumentation has been used in the diagnosis of individuals with cleft palate to measure speech problems objectively. The cepstrum method was used to study the vocal tract transfer function. Both the vocal tract transfer function and the source spectrum should be considered in the evaluation of nasal resonance. The aim of this study was to collect quantitative data on the acoustic instrumentation used for evaluating hypernasality. Normal subjects (9 male, 21 female; 37 male children, 20 female children) and individuals with VPI (13 male, 8 female; 16 male children, 9 female) participated in this study. The vowel /i/ was selected to gauge the severity of hypernasality. Spectral and cepstral analyses using CSL were used to identify the acoustic characteristics. Cepstrum analysis showed significant differences in quefrency and amplitude. The quefrency of the normal groups was shorter than that of the VPI groups, while the amplitude of the normal groups was lower than that of the VPI groups. This may be significant for the evaluation of nasal resonance.
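The quefrency and amplitude compared in this study come from the peak of the cepstrum. A rough sketch of how such a peak can be located is shown below, using the real cepstrum of one windowed frame. The Hanning window, the 60 to 400 Hz search range, and the function names are assumptions; the CSL software used in the study may compute these values differently.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum of a windowed frame."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    return np.fft.irfft(log_mag, n=len(frame))

def cepstral_peak(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (quefrency in seconds, amplitude) of the largest cepstral peak
    within the quefrency range corresponding to f_min..f_max."""
    c = real_cepstrum(frame)
    lo = int(sample_rate / f_max)
    hi = int(sample_rate / f_min)
    idx = lo + int(np.argmax(c[lo:hi]))
    return idx / sample_rate, c[idx]
```

For a periodic test signal, the returned quefrency is close to the pitch period, so an impulse train with a period of 80 samples at 8 kHz gives a peak near 0.01 s.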


A Study on Feature Extraction using Wavelet Transform for Speech Recognition (웨이블렛 변환을 이용한 음성특징 추출에 관한 연구)

  • Joung Eui-jun;Chang Sung-wook;Yang Sung-il;Kwon Y.
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.33-36 / 2001
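  • This paper proposes a method for extracting a new feature vector using the wavelet transform, as a replacement for MFCC (Mel-Frequency Cepstral Coefficients), the feature vector conventionally used in speech recognition. The new feature vector is constructed using MRA (Multi-Resolution Analysis). The purpose of extracting the new feature vector with the wavelet transform is to exploit its better resolution along the time and frequency axes. The experimental results confirm that recognition using the new wavelet-based feature vector achieves a better recognition rate than the conventional method.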
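To make the idea of an MRA-based feature vector concrete, the sketch below decomposes a frame with a Haar wavelet and takes the log energy of each subband as a feature. The abstract does not name the wavelet, the number of levels, or the way features are formed from the coefficients, so all of these, along with the function name, are assumptions chosen for illustration.

```python
import numpy as np

def haar_mra_features(frame, levels=4):
    """Log energies of Haar detail subbands plus the final approximation band."""
    approx = np.asarray(frame, dtype=float)
    feats = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]  # drop the last sample so pairs line up
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2)   # high-pass (detail) coefficients
        approx = (even + odd) / np.sqrt(2)   # low-pass (approximation) coefficients
        feats.append(np.log(np.sum(detail ** 2) + 1e-12))
    feats.append(np.log(np.sum(approx ** 2) + 1e-12))
    return np.array(feats)
```

Each level halves the bandwidth of the approximation band, so the features cover octave-spaced frequency bands, and the orthonormal Haar transform preserves the total frame energy across them.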
