• Title/Summary/Keyword: cepstral analysis

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.167-173 / 2013
  • This work examines classification of diphthongs as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, respectively, in 4-way classification. Adding the acoustic features to the widely used Mel-frequency cepstral coefficients also improves classification.
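
A balanced error rate is commonly taken to be the mean of the per-class error rates, so that a frequent class does not dominate the score. The abstract does not give the paper's exact formula, so the sketch below assumes that common definition; the function name and the toy labels are hypothetical.

```python
from collections import defaultdict

def balanced_error_rate(y_true, y_pred):
    """Mean of per-class error rates (assumed definition, not taken from the paper)."""
    total = defaultdict(int)  # number of samples per true class
    wrong = defaultdict(int)  # number of misclassified samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t != p:
            wrong[t] += 1
    return sum(wrong[c] / total[c] for c in total) / len(total)

# Hypothetical toy labels: 4 diphthongs and 2 non-diphthongs.
y_true = ["diph", "diph", "diph", "diph", "other", "other"]
y_pred = ["diph", "diph", "diph", "other", "other", "diph"]
ber = balanced_error_rate(y_true, y_pred)
print(f"{ber:.3f}")  # (1/4 + 1/2) / 2 = 0.375
```

With imbalanced classes, the plain error rate here would be 2/6, while the balanced rate weights both classes equally.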

Speech Query Recognition for Tamil Language Using Wavelet and Wavelet Packets

  • Iswarya, P.;Radha, V.
    • Journal of Information Processing Systems / v.13 no.5 / pp.1135-1148 / 2017
  • Speech recognition is one of the fascinating fields of computer science. The accuracy of a speech recognition system may degrade when noise is present in the speech signal. Noise removal is therefore an essential step in an Automatic Speech Recognition (ASR) system, and this paper proposes a new noise removal technique called combined thresholding. Feature extraction is the process of converting the acoustic signal into the most valuable set of parameters. This paper also improves Mel Frequency Cepstral Coefficient (MFCC) features by introducing the Discrete Wavelet Packet Transform (DWPT) in place of the Discrete Fourier Transform (DFT) block to provide efficient signal analysis. Because the resulting feature vector varies in size, a Self-Organizing Map (SOM) is used to choose the correct feature vector length. Since a single classifier does not provide enough accuracy, this research proposes an Ensemble Support Vector Machine (ESVM) classifier that takes the fixed-length feature vector from the SOM as input, termed ESVM_SOM. The experimental results show that the proposed methods provide better results than the existing methods.

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.24-30 / 2001
  • This article is concerned with automatic segmentation of Korean speech signals. All transition cases between phonetic units are classified into three types, and a different strategy is applied to each type. Type 1 is the discrimination of silence, voiced speech, and unvoiced speech. Histogram analysis of indicators consisting of wavelet coefficients and the SVF (Spectral Variation Function) of the wavelet coefficients is used for type 1 segmentation. Type 2 is the discrimination of adjacent vowels. Vowel transitions can be characterized by the spectrogram. Given the phonetic transcription and the transition-pattern spectrogram, a speech signal containing consecutive vowels is automatically segmented by template matching. Type 3 is the discrimination of vowels and voiced consonants. The smoothed short-time RMS energy of the wavelet low-pass component and the SVF of the cepstral coefficients are adopted for type 3 segmentation. The experiment is performed on an utterance set of 342 words gathered from 6 speakers. The results show the validity of the method.

Multi-constrained optimization combining ARMAX with differential search for damage assessment

  • K, Lakshmi;A, Rama Mohan Rao
    • Structural Engineering and Mechanics / v.72 no.6 / pp.689-712 / 2019
  • Time-series models such as AR-ARX and ARMAX provide a robust way to capture the dynamic properties of structures, and their residuals can be used effectively as features for damage detection. Although several research papers discuss the implementation of AR-ARX and ARMAX models for damage diagnosis, these models have so far been exploited mainly for detecting the time instant and the spatial location of damage. The inverse problem associated with damage quantification, i.e., estimating the extent of damage using time-series models, has not been reported in the literature. This paper presents an approach to detect the extent of damage that combines the ARMAX model with a multi-constrained optimization formulation of the inverse problem, solved using a newly developed hybrid adaptive differential search with dynamic interaction. The proposed variant of the differential search technique employs multiple small populations that search independently and exchange information with a dynamic neighborhood. Adaptive features and local search ability are built into the algorithm to improve its convergence characteristics and overall performance. The multi-constrained optimization formulation of the inverse problem associated with damage quantification using time-series models, attempted here for the first time, can considerably improve the robustness of the search process. Numerical simulation studies on three numerical examples demonstrate the effectiveness of the proposed technique in robustly identifying the extent of damage. Issues related to modeling errors and measurement noise are also addressed.

Effective Mood Classification Method based on Music Segments (부분 정보에 기반한 효과적인 음악 무드 분류 방법)

  • Park, Gun-Han;Park, Sang-Yong;Kang, Seok-Joong
    • Journal of Korea Multimedia Society / v.10 no.3 / pp.391-400 / 2007
  • Recent advances in multimedia computing, storage, and search technology have made large volumes of music content prevalent. There is also an increasing need for research on efficient categorization and search techniques for music content management. In this paper, a new classification method using the local information of music content and a music tone feature is proposed. While conventional classification algorithms are based on the entire information of the music content, the algorithm proposed in this paper focuses only on specific local information, which can drastically reduce computing time without losing classification accuracy. To improve classification accuracy, it uses a new classification feature based on music tone. The proposed method has been implemented as part of MuSE (Music Search/Classification Engine), which was installed on various systems including commercial PDAs and PCs.

Effects of Injection Laryngoplasty with Hyaluronic Acid in Patients with Vocal Fold Paralysis

  • Kim, Geun-Hyo;Lee, Jae-Seok;Lee, Chang-Yoon;Lee, Yeon-Woo;Bae, In-Ho;Park, Hee-June;Lee, Byung-Joo;Kwon, Soon-Bok
    • Osong Public Health and Research Perspectives / v.9 no.6 / pp.354-361 / 2018
  • Objectives: The purpose of this study was to explore the effects of injection laryngoplasty (IL) with hyaluronic acid in patients with vocal fold paralysis (VFP). Methods: A total of 50 patients with VFP participated in this study. Pre- and post-IL assessments were performed, which included analysis of sustained vowel /a/ phonation and of the patient reading 1 Korean sentence from the "Walk" passage, comprising 25 syllables in 10 words. To investigate the effect of IL on vocal fold function, acoustic analysis (acoustic voice quality index, cepstral peak prominence, maximum phonation time, speaking fundamental frequency) was conducted, and auditory-perceptual (grade and overall severity), visual judgment (gap), and self-questionnaire (voice handicap index-10) assessments were performed. Results: The patients with VFP showed statistically significant differences between pre- and post-IL assessments for the acoustic, auditory-perceptual, visual judgment, and self-questionnaire assessments. Conclusion: The patients with VFP showed a positive change in vocal fold function between pre- and post-IL measurements. The findings showed that IL with hyaluronic acid is an effective method to improve vocal fold function in patients with VFP.

The Effect of Voice Therapy for the Treatment of Functional Aphonia: A Preliminary Study (기능적 실성증에 대한 음성치료의 효과 분석: 기초 연구)

  • Kim, No Eul;Kim, Jun Seok;Oh, Jae Hwan;Kim, Dong Young;Woo, Joo Hyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.32 no.2 / pp.75-80 / 2021
  • Background and Objectives: Functional aphonia refers to a condition in which patients present with a whispering voice or produce almost exclusively very high-pitched, tense voices. Voice therapy is the most effective treatment, but there is a lack of consensus on how to apply it. The purpose of this study was to examine the vocal characteristics of functional aphonia and the effect of voice therapy applied accordingly. Materials and Method: From October 2019 to December 2020, 11 patients with functional aphonia were treated with voice therapy that proceeded in three stages: vocal hygiene, trial therapy, and behavioral therapy. Of these, 7 patients who completed voice evaluation before and after voice therapy were enrolled in this study. By retrospective chart review, clinical information such as sex, age, symptoms, duration, social and medical history, the process of voice therapy, and subjective and objective findings was analyzed. Voice parameters before and after voice therapy were compared. Results: On the GRBAS scale, grade, roughness, and asthenia, and on the Consensus Auditory-Perceptual Evaluation of Voice, overall severity, roughness, pitch, and loudness were significantly improved after voice therapy. On the Voice Handicap Index, the total score and all sub-category scores were significantly decreased. In objective voice analysis, jitter, cepstral peak prominence, and maximum phonation time were significantly improved. Conclusion: Voice therapy was effective for the treatment of functional aphonia, restoring the patients' vocalization and improving voice quality, pitch, and loudness.

Acoustic Analysis and Auditory-Perceptual Assessment for Diagnosis of Functional Dysphonia (기능성 음성장애의 진단을 위한 음향학적, 청지각적 평가)

  • Kim, Geun-Hyo;Lee, Yeon-Yoo;Bae, In-Ho;Lee, Jae-Seok;Lee, Chang-Yoon;Park, Hee-June;Lee, Byung-Joo;Kwon, Soon-Bok
    • Journal of Clinical Otolaryngology Head and Neck Surgery / v.29 no.2 / pp.212-222 / 2018
  • Background and Objectives: The purpose of this study was to compare the measured values of acoustic and auditory-perceptual assessments between normal and functional dysphonia (FD) groups. Materials and Methods: A total of 102 subjects with FD and 59 subjects with normal voices participated in this study. The mid-vowel portion of the sustained vowel /a/ and two sentences of 'Sanchaek' were edited, concatenated, and analyzed with a Praat script. Auditory-perceptual (AP) rating was then completed by three listeners. Results: The FD group showed higher acoustic voice quality index version 2.02 and version 3.01 (AVQIv2 and AVQIv3), slope, Hammarberg index (HAM), grade (G), and overall severity (OS) values than the normal group. Additionally, smoothed cepstral peak prominence in Praat (PraatCPPS), tilt, the ratio of low-to-high spectral band energies (L/H ratio), and long-term average spectrum (LTAS) were lower in the FD group than in the normal voice group. The correlations among the measured values ranged from -0.250 to 0.960. In ROC curve analysis, the cutoff values of AVQIv2, AVQIv3, PraatCPPS, slope, tilt, L/H ratio, HAM, and LTAS were 3.270, 2.013, 13.838, -22.286, -9.754, 369.043, 27.912, and 34.523, respectively, and the AUC was over 0.890 for AVQIv2, AVQIv3, and PraatCPPS, over 0.731 for HAM, tilt, and slope, and over 0.605 for LTAS and L/H ratio. Conclusions: AVQI and CPPS showed the highest predictive power for distinguishing between the normal and FD groups. Acoustic analyses and AP rating, as noninvasive examinations, can reinforce the screening capability for FD and help to establish an efficient diagnosis and treatment plan for FD.

A Signal Processing Technique for Predictive Fault Detection based on Vibration Data (진동 데이터 기반 설비고장예지를 위한 신호처리기법)

  • Song, Ye Won;Lee, Hong Seong;Park, Hoonseok;Kim, Young Jin;Jung, Jae-Yoon
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.111-121 / 2018
  • Many problems in rotating machinery such as aircraft engines, wind turbines, and motors are caused by bearing defects. Bearing abnormalities can be detected by analyzing signal data such as vibration or noise, but proper pre-processing with a few signal processing techniques is required to analyze their frequencies. In this paper, we introduce a condition monitoring method for diagnosing the failure of rotating machines by analyzing the vibration signal of the bearing. From the collected signal data, the normal states are trained, and then normal or abnormal state data are classified based on the trained normal state. For preprocessing, a Hamming window is applied to reduce the leakage generated in this process, and cepstrum analysis is performed to obtain the original signal of the signal data, called the formant. From the vibration data of the IMS bearing dataset, we extracted 6 statistical indicators using the cepstral coefficients and showed that a Mahalanobis distance classifier can monitor the bearing status and detect failure in advance.
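
The preprocessing chain described above (Hamming window, cepstrum, Mahalanobis distance to a trained normal state) can be sketched roughly as follows. This is not the authors' implementation: it uses a naive DFT from the standard library for self-containment, assumes a diagonal covariance to keep the distance computation short, and all function names, the synthetic frame, and the "normal state" statistics are hypothetical. The paper's six statistical indicators are not specified in the abstract and are not reproduced here.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft(x, inverse=False):
    """Naive O(n^2) DFT (or inverse DFT); adequate for a short demo frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of the Hamming-windowed frame."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(windowed)]  # offset avoids log(0)
    return [c.real for c in dft(log_mag, inverse=True)]

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under an assumed diagonal covariance (a simplification)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))

# Hypothetical 64-sample frame: two sinusoids standing in for vibration data.
frame = [math.sin(2 * math.pi * 5 * t / 64) + 0.5 * math.sin(2 * math.pi * 12 * t / 64)
         for t in range(64)]
ceps = real_cepstrum(frame)

# Hypothetical "normal state" statistics for two features, and one test point.
normal_mean, normal_var = [0.0, 1.0], [1.0, 4.0]
dist = mahalanobis_diag([3.0, 5.0], normal_mean, normal_var)
print(round(dist, 3))
```

In practice an FFT routine and a full covariance matrix estimated from normal-state data would replace the naive DFT and the diagonal assumption; a sample would then be flagged as abnormal when its distance exceeds a chosen threshold.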

Study for Correlation between Objective and Subjective Voice Parameters in Patients with Dysphonia (발성장애 환자에서 주관적 음성검사와 객관적 음성검사의 연관성 연구)

  • Park, Jung Woo;Kim, Boram;Oh, Jae Hwan;Kang, Tae Kyu;Kim, Dong Young;Woo, Joo Hyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.30 no.2 / pp.118-123 / 2019
  • Background and Objectives: Voice evaluation is classified into subjective tests such as auditory perception and self-measurement, and objective tests such as acoustic and aerodynamic analysis. When evaluating dysphonia, subjective and objective test results do not always match. The purpose of this study was to analyze the relationship between subjective and objective evaluation in patients with dysphonia and to identify meaningful parameters by disease. Materials and Method: A total of 322 patients who visited the voice clinic from May 2017 to May 2018 were included in this study. Laryngeal lesions were identified using stroboscopy. Pearson correlation tests were performed to analyze the correlation between subjective tests, including the GRBAS scale and voice handicap index, and objective tests, including jitter, shimmer, noise to harmonic ratio (NHR), cepstral peak prominence (CPP), maximal phonation time (MPT), mean flow rate, and subglottic pressure. Results: In vocal nodule and sulcus vocalis, grade and breathiness on the GRBAS scale showed good correlation with CPP, and roughness showed good correlation with jitter or shimmer. In unilateral vocal cord paralysis (UVCP), grade and breathiness showed a very good correlation with CPP, and also good correlation with jitter, shimmer, NHR, and MPT. Asthenia also showed good correlation with CPP and MPT. Vocal polyp showed limited association compared with the other diseases. Conclusion: In patients with dysphonia, grade and breathiness showed good correlation with CPP, jitter, and shimmer; especially in UVCP, CPP and MPT reflect the state of voice change well.
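
The Pearson correlation used in this study to relate subjective and objective measures can be illustrated with a minimal sketch. The function name and the five data points are hypothetical and are not taken from the paper; they only mimic the expected direction of the relation (a worse perceptual grade with a lower CPP).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical values: perceptual grade (0 to 3) and CPP in dB for five patients.
grade = [0, 1, 1, 2, 3]
cpp = [12.0, 10.5, 9.8, 7.9, 5.1]
r = pearson_r(grade, cpp)
print(f"r = {r:.3f}")  # strongly negative for this toy data
```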