• Title/Summary/Keyword: Speech spectrum


Frequency Band Selection Excited Linear Prediction Wideband Speech/Audio Coding Using SBR (SBR을 이용한 주파수 밴드선택 여기 선형예측 광대역 음성/오디오 부호화)

  • Jang, Sunghoon;Lee, Insung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.6
    • /
    • pp.556-562
    • /
    • 2013
  • This paper aims to improve the performance of a band-selection speech/audio coder that reconstructs the unsent band spectrum with comfort noise. To improve performance, we use the Spectral Band Replication (SBR) technique instead of comfort-noise substitution. To synthesize the SBR signal, the SBR algorithm references the selected bands, and the spectrum synthesized by SBR is injected into the non-selected bands. Each sub-band spectrum is energy-weighted to match the real audio signal. We propose an enhanced band-selection coder that utilizes the SBR signal synthesized from the selected bands instead of comfort noise.
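The band-replication idea described in this abstract can be sketched as follows. This is a minimal illustration of the general SBR principle, not the paper's coder: a transmitted low band is copied into a missing band, then rescaled so the patch matches a per-band target energy (the `target_energy` parameter stands in for the energy weighting the abstract mentions).

```python
import numpy as np

def replicate_band(spectrum, src, dst, target_energy):
    """Illustrative SBR-style patch: copy the magnitude spectrum of a
    transmitted source band into a missing destination band, then rescale
    the patch so its energy matches a transmitted target energy."""
    out = spectrum.copy()
    # Tile/truncate the source band to fill the destination band.
    patch = np.resize(spectrum[src[0]:src[1]], dst[1] - dst[0])
    energy = np.sum(patch ** 2)
    gain = np.sqrt(target_energy / energy) if energy > 0 else 0.0
    out[dst[0]:dst[1]] = gain * patch
    return out

x = np.cos(2 * np.pi * 0.05 * np.arange(256))   # toy low-band signal
spec = np.abs(np.fft.rfft(x))
spec[64:129] = 0.0                              # band not transmitted
patched = replicate_band(spec, (0, 64), (64, 129), target_energy=10.0)
```

The gain step is what distinguishes this from naive spectral copying: the patched band keeps the low band's fine structure but the transmitted energy envelope.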

A Study of Acoustic Measurement in Connected Speech with Dysphonia (음성장애 연속구어의 음향학적 분석)

  • Lee, Myoung-Soon
    • Phonetics and Speech Sciences
    • /
    • v.3 no.4
    • /
    • pp.109-115
    • /
    • 2011
  • The purposes of this study were to identify acoustic parameters of connected speech and to contribute to the acoustic analysis of dysphonic voice using patients' natural connected speech as well as sustained vowel phonation. Acoustic parameters of the sentences included the LTAS (long-term average spectrum) mean and spectral slope over frequency ranges of 0-4 kHz, 0-6 kHz, 0-8 kHz, and 0-12.5 kHz, as well as HNR. Acoustic parameters of the vowel 'a' included jitter, RAP, shimmer, NHR, and HNR. Based on the 'G' of GRBAS for the severity of dysphonia, two experienced raters judged and classified subjects into four groups: controls and mild, moderate, and severe dysphonia groups. The connected speech consisted of two sentences extracted from the 'stroll' passage. Parameters of the vowel and the LTAS mean of the sentences were measured with CSL. The spectral slope of the sentences and the HNR of the vowel and sentences were measured with Praat. Data were statistically analyzed with Spearman correlation and the Kruskal-Wallis test using SPSS 12.0. The results are as follows. First, jitter, RAP, shimmer, and NHR differed significantly between the groups. Second, for several frequency ranges, the LTAS mean and spectral slope of the sentences differed significantly between the groups. Third, the HNR of the sentences differed significantly between the groups. Fourth, there was a correlation between the HNR and NHR of the vowel and the HNR of the sentences. Accordingly, this study concluded that LTAS, spectral slope, and HNR are predictive parameters of connected speech for dysphonic voice.
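The two sentence-level measures used in this study, LTAS mean and spectral slope, can be computed in a few lines. This is a generic sketch (frame size, hop, and the least-squares slope fit are assumptions, not the study's CSL/Praat settings):

```python
import numpy as np

def ltas_db(x, sr, frame=1024, hop=512):
    """Long-term average spectrum in dB: average the power spectra of
    Hann-windowed frames over the whole recording."""
    win = np.hanning(frame)
    frames = [x[i:i + frame] * win for i in range(0, len(x) - frame, hop)]
    power = np.mean([np.abs(np.fft.rfft(f)) ** 2 for f in frames], axis=0)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    return freqs, 10.0 * np.log10(power + 1e-12)

def spectral_slope(freqs, ltas, lo=0.0, hi=4000.0):
    """Least-squares slope (dB per Hz) of the LTAS over [lo, hi] Hz,
    matching the 0-4 kHz style of range used in the abstract."""
    m = (freqs >= lo) & (freqs <= hi)
    return np.polyfit(freqs[m], ltas[m], 1)[0]

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)     # stand-in for a speech recording
f, L = ltas_db(x, sr)
slope = spectral_slope(f, L, 0.0, 4000.0)
```

A steeper (more negative) slope reflects weaker high-frequency energy, which is why the slope over several upper limits (4, 6, 8, 12.5 kHz) is informative for dysphonia.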


Speech Enhancement in Noisy Speech Using Neural Network (신경회로망을 사용한 잡음이 중첩된 음성 강조)

  • Choi, Jae-Seung
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.5 s.305
    • /
    • pp.165-172
    • /
    • 2005
  • For speech recognition in a noisy environment, it is necessary to construct a system that reduces the noise and enhances the speech. For this, it is effective to imitate the human auditory system, which has an excellent spectrum-analysis mechanism for speech enhancement. Accordingly, this paper proposes an adaptive method using the auditory mechanism called lateral inhibition. The method first estimates the noise intensity with a neural network, then, for each input frame, adaptively adjusts both the coefficients of the lateral inhibition and the adjusting coefficient of the amplitude component according to the noise intensity. The proposed method is confirmed to be effective for speech degraded by white noise, colored noise, and road noise, based on spectral distortion measurements.
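Lateral inhibition applied to a spectrum amounts to a centre-surround filter across frequency: each bin is boosted and its neighbours suppressed, sharpening peaks. The sketch below is illustrative only; the kernel shape and the `strength` parameter (which stands in for the per-frame, network-estimated noise intensity in the paper's adaptive scheme) are assumptions.

```python
import numpy as np

def lateral_inhibition(spectrum, strength):
    """Sharpen spectral peaks with a centre-surround kernel. The kernel
    sums to 1, so flat (noise-like) regions pass through unchanged while
    peaks are emphasised relative to their neighbours."""
    kernel = np.array([-strength, -strength, 1 + 4 * strength,
                       -strength, -strength])
    enhanced = np.convolve(spectrum, kernel, mode="same")
    return np.maximum(enhanced, 0.0)    # magnitudes stay non-negative

spec = np.abs(np.fft.rfft(np.sin(0.3 * np.arange(128))))
out = lateral_inhibition(spec, strength=0.2)
```

Making the kernel sum to 1 is a deliberate choice here: increasing `strength` then changes peak-to-valley contrast without changing overall level, which is the kind of knob a noise-adaptive controller can drive.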

Vocal Tract Normalization Using The Power Spectrum Warping (파워 스펙트럼 warping을 이용한 성도 정규화)

  • Yu, Il-Su;Kim, Dong-Ju;No, Yong-Wan;Hong, Gwang-Seok
    • Proceedings of the KIEE Conference
    • /
    • 2003.11b
    • /
    • pp.215-218
    • /
    • 2003
  • Vocal tract normalization is known to be a successful method for improving the accuracy of speech recognition. A frequency warping procedure based on low complexity and maximum likelihood has generally been applied for vocal tract normalization. In this paper, we propose a new power spectrum warping procedure that improves vocal tract normalization performance over the frequency warping procedure. The method can be implemented simply by modifying the power spectrum of the filter bank in Mel-frequency cepstral coefficient (MFCC) analysis. Experiments compared the proposed method with the well-known frequency warping method. The results show that power spectrum warping yields about 50% better recognition performance than frequency warping.
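A minimal way to warp a power spectrum inside an MFCC front end is to resample it at scaled bin positions. This sketch assumes a simple linear warp with factor `alpha` (the paper's exact warping function and its maximum-likelihood selection of `alpha` are not specified in the abstract):

```python
import numpy as np

def warp_power_spectrum(power, alpha):
    """VTLN-style sketch: evaluate the power spectrum at warped bin
    positions k/alpha via linear interpolation. alpha > 1 stretches the
    spectrum toward higher bins (shorter vocal tract), alpha < 1
    compresses it."""
    n = len(power)
    bins = np.arange(n)
    warped_bins = np.clip(bins / alpha, 0, n - 1)
    return np.interp(warped_bins, bins, power)

power = np.abs(np.fft.rfft(np.sin(0.2 * np.arange(512)))) ** 2
stretched = warp_power_spectrum(power, alpha=1.1)
```

In a full pipeline this replaces (or feeds) the mel filter bank stage, so the rest of the MFCC computation is untouched, which is the low-complexity appeal the abstract points to.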


Report on Seven Cases on Patients with Autism Spectrum Disorder Treated by Kwakhyangjungkisanhapyukmijihwangtang-gamibang (곽향정기산합육미지황탕가미방(藿香正氣散合六味地黃湯加味方)을 처방한 자폐스펙트럼장애 환아 7례)

  • Lee, Ji Na;Kim, Deog Gon;Lee, Jin Yong
    • The Journal of Pediatrics of Korean Medicine
    • /
    • v.29 no.1
    • /
    • pp.50-59
    • /
    • 2015
  • Objectives: The purpose of this study is to report seven cases of autism spectrum disorder treated with oriental medicine. Methods: Seven patients diagnosed with autism spectrum disorder were treated with a herbal medicine (Kwakhyangjungkisanhapyukmijihwangtang-gamibang), and the effect was measured. Results: After the treatment, cognitive skills, speech, motor function, communication skills, and the patients' general condition improved. Conclusions: This study suggests that the oriental medical treatment for autism spectrum disorder was effective, but further studies are needed.

A New Method for Segmenting Speech Signal by Frame Averaging Algorithm

  • Byambajav D.;Kang Chul-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.4E
    • /
    • pp.128-131
    • /
    • 2005
  • A new algorithm for speech signal segmentation is proposed. The algorithm finds successive similar frames belonging to a segment and represents them by an average spectrum. The speech signal is a slowly time-varying signal in the sense that, when examined over a sufficiently short period of time (between 10 and 100 ms), its characteristics are fairly stationary, and this approach is based on finding these fairly stationary periods. Advantages of the algorithm are accurate segment boundary decisions and simple computation. Automatic segmentation using frame averaging coincides with manually verified segmentation of the CMU ARCTIC corpus for as much as 82.20% of boundaries within a 16 ms range, and more than 90% of segment boundaries coincide within a 32 ms range. The method can also be combined with many types of automatic segmentation (HMM-based, acoustic-cue- or feature-based, etc.).
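The frame-averaging idea can be sketched as a greedy pass over frame spectra: extend the current segment while each new frame stays close to the segment's running average spectrum, otherwise open a new segment. The distance measure and threshold below are assumptions for illustration, not the paper's.

```python
import numpy as np

def segment_by_frame_averaging(frames_spec, threshold):
    """Greedy frame-averaging segmentation sketch. frames_spec is a
    (frames x bins) array of per-frame spectra; returns the list of
    segment start indices."""
    boundaries = [0]
    avg = frames_spec[0].astype(float)  # running average spectrum
    count = 1
    for i in range(1, len(frames_spec)):
        f = frames_spec[i]
        if np.linalg.norm(f - avg) > threshold:
            boundaries.append(i)        # frame dissimilar: new segment
            avg, count = f.astype(float), 1
        else:                           # absorb frame into the average
            count += 1
            avg += (f - avg) / count
    return boundaries

# Two spectrally distinct regions -> one detected boundary at frame 5.
frames = np.vstack([np.tile([1.0, 0.0], (5, 1)),
                    np.tile([0.0, 1.0], (5, 1))])
print(segment_by_frame_averaging(frames, threshold=0.5))   # [0, 5]
```

Comparing against the running average rather than the previous frame is what gives the stated robustness: a single noisy frame cannot drag the segment model away from the stationary region it summarises.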

Noise Spectrum Estimation Using Line Spectral Frequencies for Robust Speech Recognition

  • Jang, Gil-Jin;Park, Jeong-Sik;Kim, Sang-Hun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.31 no.3
    • /
    • pp.179-187
    • /
    • 2012
  • This paper presents a novel method for estimating a reliable noise spectral magnitude for acoustic background noise suppression when only a single-microphone recording is available. The proposed method derives noise estimates from spectral magnitudes measured at line spectral frequencies (LSFs), based on the observation that adjacent LSFs lie near the peak frequencies of LPC spectra while isolated LSFs lie close to their relatively flat valleys. The parameters used in the proposed method are the LPC coefficients, their corresponding LSFs, and the gain of the LPC residual signal, so it is well suited to LPC-based speech coders.
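The key observation, that isolated LSFs sit near spectral valleys, can be turned into a toy noise-floor estimator. This sketch takes LSFs and an LPC spectrum as given; the isolation threshold `iso_gap` and the simple min-gap test are assumptions, not the paper's estimator.

```python
import numpy as np

def noise_floor_from_lsfs(lsfs, spectrum_db, sr, iso_gap=200.0):
    """Sample the spectrum (dB) at 'isolated' LSFs, i.e. those farther
    than iso_gap Hz from both neighbours, as noise-floor estimates.
    Adjacent LSF pairs (near formant peaks) are skipped."""
    n = len(spectrum_db)
    freqs = np.linspace(0.0, sr / 2.0, n)
    padded = np.concatenate(([0.0], np.sort(lsfs), [sr / 2.0]))
    estimates = []
    for i, f in enumerate(np.sort(lsfs)):
        gap = min(f - padded[i], padded[i + 2] - f)
        if gap > iso_gap:               # isolated LSF -> near a valley
            estimates.append(np.interp(f, freqs, spectrum_db))
    return estimates

sr = 8000
spectrum_db = np.linspace(0.0, -40.0, 101)   # toy LPC spectrum in dB
floor = noise_floor_from_lsfs([500.0, 520.0, 3000.0], spectrum_db, sr)
```

In the toy call, 500 and 520 Hz form a close pair (a peak region) and are skipped, while the isolated LSF at 3000 Hz contributes a valley sample.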

Speech/Music Discrimination Using Mel-Cepstrum Modulation Energy (멜 켑스트럼 모듈레이션 에너지를 이용한 음성/음악 판별)

  • Kim, Bong-Wan;Choi, Dea-Lim;Lee, Yong-Ju
    • MALSORI
    • /
    • no.64
    • /
    • pp.89-103
    • /
    • 2007
  • In this paper, we introduce mel-cepstrum modulation energy (MCME) as a feature for discriminating speech from music. MCME is a mel-cepstrum-domain extension of modulation energy (ME): MCME is extracted from the time trajectories of the Mel-frequency cepstral coefficients, while ME is based on the spectrum. As cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we perform experiments with modulation frequencies from 4 Hz to 20 Hz. To show the effectiveness of the proposed feature, we compare its discrimination accuracy with results obtained from ME and the cepstral flux.
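Given a precomputed MFCC matrix, the MCME idea reduces to measuring how much each cepstral trajectory oscillates at a chosen modulation frequency. The sketch below (a plausible reading of the abstract, not the paper's exact definition) takes the FFT of each coefficient's trajectory over time and sums the power at the bin nearest the target modulation frequency:

```python
import numpy as np

def mel_cepstrum_modulation_energy(mfcc, frame_rate, mod_freq=4.0):
    """MCME sketch: mfcc is a (frames x coeffs) matrix from any front
    end. Remove each coefficient's mean, FFT its time trajectory, and
    sum the power at the modulation bin nearest mod_freq (Hz)."""
    traj = mfcc - mfcc.mean(axis=0)     # remove DC per coefficient
    spec = np.abs(np.fft.rfft(traj, axis=0)) ** 2
    mod_freqs = np.fft.rfftfreq(mfcc.shape[0], 1.0 / frame_rate)
    k = np.argmin(np.abs(mod_freqs - mod_freq))
    return spec[k].sum()

# Speech-like 4 Hz amplitude modulation at a 100 frames/s frame rate.
t = np.arange(200) / 100.0
mfcc = np.outer(np.sin(2 * np.pi * 4.0 * t), np.ones(13))
e_speech = mel_cepstrum_modulation_energy(mfcc, 100.0, 4.0)
e_flat = mel_cepstrum_modulation_energy(np.ones((200, 13)), 100.0, 4.0)
```

The 4 Hz default reflects the syllable rate of speech, which is exactly why a modulation-energy feature separates speech (strong 4 Hz energy) from music (flatter modulation spectrum).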


Spectral Characteristics and Nasalance Scores of Hypernasality in Patient with Cleft Palate

  • Soh, Byung-Soo;Shin, Hyo-Keun;Kim, Hyun-Gi
    • Speech Sciences
    • /
    • v.12 no.1
    • /
    • pp.27-35
    • /
    • 2005
  • Differential instrumentation for the diagnosis of individuals with cleft palate has been used to objectively measure speech problems. The cepstrum method was used to study the vocal tract transfer function; both the vocal tract transfer function and the source spectrum should be considered in the evaluation of nasal resonance. The aim of this study was to collect quantitative data with the acoustic instrumentation used for evaluating hypernasality. Normal subjects (9 male, 21 female; 37 male children, 20 female children) and individuals with VPI (13 male, 8 female; 16 male children, 9 female children) participated in this study. The vowel /i/ was selected to gauge the severity of hypernasality. Spectral and cepstral analyses using CSL were used to identify the acoustic characteristics. The cepstrum analysis shows significant differences in quefrency and amplitude: the quefrency of the normal groups was shorter than that of the VPI groups, while the amplitude of the normal groups was lower than that of the VPI groups. This may be significant for the evaluation of nasal resonance.
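The quefrency and amplitude measures the study compares come from the real cepstrum of a voiced frame. The sketch below computes a standard real cepstrum and finds the dominant rahmonic in the pitch range; the windowing and search range are generic assumptions, not the study's CSL settings.

```python
import numpy as np

def cepstral_peak(frame, sr, fmin=60.0, fmax=400.0):
    """Real cepstrum of one voiced frame: inverse FFT of the log
    magnitude spectrum. Returns the peak quefrency (s) and its
    amplitude within the pitch range [1/fmax, 1/fmin]."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    cep = np.fft.irfft(np.log(spectrum + 1e-12))
    q = np.arange(len(cep)) / sr
    m = (q >= 1.0 / fmax) & (q <= 1.0 / fmin)
    idx = np.argmax(np.abs(cep) * m)    # strongest rahmonic in range
    return q[idx], cep[idx]

sr = 16000
frame = np.zeros(1024)
frame[::80] = 1.0                        # 200 Hz pulse train (5 ms period)
quef, amp = cepstral_peak(frame, sr)
```

For the 200 Hz pulse train the peak quefrency lands near 5 ms, i.e. one pitch period; in the study it is this quefrency and the peak amplitude that separated the normal and VPI groups.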


A Study on a Method of U/V Decision by Using The LSP Parameter in The Speech Signal (LSP 파라미터를 이용한 음성신호의 성분분리에 관한 연구)

  • 이희원;나덕수;정찬중;배명진
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.1107-1110
    • /
    • 1999
  • In speech signal processing, an accurate voiced/unvoiced decision is important for robust word recognition and analysis and for high coding efficiency. In this paper, we propose a method for the voiced/unvoiced decision using the LSP parameters, which represent the spectral characteristics of the speech signal. Voiced sounds have more LSP parameters in the low-frequency region; in contrast, unvoiced sounds have more LSP parameters in the high-frequency region. That is, the LSP parameter distribution of voiced sounds differs from that of unvoiced sounds. Also, voiced sounds have the minimum interval between consecutive LSP parameters in the low-frequency region, while unvoiced sounds have it in the high-frequency region. We make the voiced/unvoiced decision using these characteristics. We applied the proposed method to continuous speech and achieved good performance.
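The two cues in this abstract, where the LSFs cluster and where the minimum adjacent-LSF gap falls, can be combined into a toy classifier. The 50% band split and the two-vote rule below are illustrative assumptions, not the paper's thresholds.

```python
import numpy as np

def classify_uv(lsfs, nyquist, low_frac=0.5):
    """Toy voiced/unvoiced decision from per-frame LSFs (Hz).
    Cue 1: more LSFs below the band split suggests voiced.
    Cue 2: the smallest adjacent-LSF gap (near the strongest spectral
    peak) lying in the low band also suggests voiced."""
    lsfs = np.sort(np.asarray(lsfs, dtype=float))
    split = low_frac * nyquist
    low_count = int(np.sum(lsfs < split))
    gaps = np.diff(lsfs)
    i = int(np.argmin(gaps))
    min_gap_centre = 0.5 * (lsfs[i] + lsfs[i + 1])
    votes = int(low_count > len(lsfs) - low_count)
    votes += int(min_gap_centre < split)
    return "voiced" if votes == 2 else "unvoiced"

voiced_lsfs = [250, 280, 600, 900, 1300, 2500, 3100]       # clustered low
unvoiced_lsfs = [800, 2000, 2900, 3300, 3350, 3700, 3900]  # clustered high
print(classify_uv(voiced_lsfs, nyquist=4000))    # voiced
print(classify_uv(unvoiced_lsfs, nyquist=4000))  # unvoiced
```

Close LSF pairs mark sharp LPC spectral peaks, so the location of the tightest pair tracks where the dominant resonance sits, which is the physical basis of the second cue.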
