• Title/Summary/Keyword: Speech spectrum

Speech enhancement system using the multi-band coherence function and spectral subtraction method (다중 주파수 밴드 간섭함수와 스펙트럼 차감법을 이용한 음성 향상 시스템)

  • Oh, Inkyu;Lee, Insung
    • The Journal of the Acoustical Society of Korea / v.38 no.4 / pp.406-413 / 2019
  • This paper proposes a speech enhancement method that combines a gain function with the spectral subtraction method in a closely spaced two-microphone array. A speech enhancement method that uses a gain function estimated from the SNR (Signal-to-Noise Ratio) based on the multi-frequency-band coherence function suffers performance degradation when the input noises of the two channels are highly correlated. A new speech enhancement method is therefore proposed in which a weighted gain function is formed by combining that gain function with the gain function from spectral subtraction. The proposed method was evaluated by comparing PESQ (Perceptual Evaluation of Speech Quality) values, an objective quality measure provided by the ITU-T (International Telecommunication Union Telecommunication Standardization Sector). In the PESQ tests, the PESQ value improved by up to 0.217 in various background noise environments.
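
The idea of weighting a coherence-based gain against a spectral subtraction gain can be sketched for a single frequency bin as follows. This is only an illustration: the abstract does not give the weighting rule, so the convex combination, the `weight` parameter, and the `floor` value are assumptions, and the coherence-based gain is passed in as a ready-made number.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Power spectral subtraction written as a gain: sqrt(max(1 - N/Y, floor)).

    The floor (an assumed value) keeps the gain from reaching zero or going negative.
    """
    if noisy_power <= 0.0:
        return math.sqrt(floor)
    return math.sqrt(max(1.0 - noise_power / noisy_power, floor))

def combined_gain(coherence_gain, subtraction_gain, weight=0.5):
    """Hypothetical weighted gain: a convex combination of the two gains."""
    return weight * coherence_gain + (1.0 - weight) * subtraction_gain

# One frequency bin: noisy power 4.0, estimated noise power 1.0.
g_ss = spectral_subtraction_gain(noisy_power=4.0, noise_power=1.0)
g = combined_gain(coherence_gain=0.9, subtraction_gain=g_ss, weight=0.5)
```

The enhanced spectrum for the bin would then be the noisy spectrum multiplied by `g`.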

Speech Enhancement Based on Minima Controlled Recursive Averaging Technique Incorporating Conditional MAP (조건 사후 최대 확률 기반 최소값 제어 재귀평균기법을 이용한 음성향상)

  • Kum, Jong-Mo;Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.27 no.5 / pp.256-261 / 2008
  • In this paper, we propose a novel approach, based on the conditional maximum a posteriori criterion, to improve the performance of minima controlled recursive averaging (MCRA). A crucial component of a practical speech enhancement system is the estimation of the noise power spectrum. One state-of-the-art approach is the MCRA technique. The noise estimate in the MCRA technique is obtained by averaging past spectral power values using a smoothing parameter that is adjusted by the signal presence probability in frequency subbands. We improve the MCRA using a speech presence probability that is the a posteriori probability conditioned on both the current observation and the speech presence or absence of the previous frame. Using the ITU-T P.862 perceptual evaluation of speech quality (PESQ) and a subjective evaluation of speech quality as performance criteria, we show that the proposed algorithm yields better results than the conventional MCRA-based scheme.
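
The recursive averaging step that the abstract describes can be sketched as below. This shows only the conventional MCRA-style noise update for one frame; the conditional MAP refinement of the presence probability is not reproduced, and the base smoothing constant `alpha_d=0.95` and the function name are assumptions chosen for illustration.

```python
def mcra_noise_update(noise_psd, noisy_psd, presence_prob, alpha_d=0.95):
    """One frame of probability-controlled recursive averaging.

    For each frequency bin the smoothing parameter is raised toward 1 as the
    speech presence probability p grows, so the noise estimate is frozen when
    speech is likely present and tracks the input when it is likely absent.
    """
    updated = []
    for n, y, p in zip(noise_psd, noisy_psd, presence_prob):
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * n + (1.0 - alpha) * y)
    return updated

# Two bins with the same input power; speech is present only in the first.
noise = mcra_noise_update([1.0, 1.0], [3.0, 3.0], [1.0, 0.0])
```

In the first bin the estimate stays at its previous value, while in the second it moves a small step toward the observed power.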

Estimation of Speaker Recognition Parameter using Lyapunov Dimension (Lyapunov 차원을 이용한 화자식별 파라미터 추정)

  • Yoo, Byong-Wook;Kim, Chang-Seok
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.42-48 / 1997
  • This paper appraises the usefulness of the correlation dimension and the Lyapunov dimension for speaker recognition and speech recognition. In this method, speech is regarded as chaos, in which a random-looking signal appears in a deterministic system. We derived accurate correlation and Lyapunov dimensions by searching for the important orbits from the AR model power spectrum when reconstructing the strange attractor using Takens' embedding theorem. We examined the usefulness for speech recognition and speaker recognition of the correlation dimension and the Lyapunov dimension, which characterize the reconstructed attractor. The results show that these dimensions are more useful for speaker recognition than for speech recognition, and that speaker recognition using the Lyapunov dimension achieves a higher recognition rate than speaker recognition using the correlation dimension.

Energy-Dependent Preemphasis for Speech Signal Preprocessing (음성신호 전처리를 위한 에너지 의존 프리엠퍼시스)

  • Kim, Dong-Jun;Park, Sang-Hui
    • The Journal of the Acoustical Society of Korea / v.16 no.3 / pp.18-25 / 1997
  • This study describes a modified preemphasis formula, which we call energy-dependent preemphasis (EDP). It uses the normalized short-term energy of the speech signal, under the assumption that the source characteristics of the glottal pulses and the radiation characteristics of the lips are approximately proportional to the energy of the speech signal. Using this method, speech analyses such as AR spectrum estimation and formant detection are performed on the nonstationary starting parts of 5 Korean single vowels. The results are compared with those of two conventional preemphasis methods. We found that the proposed preemphasis gave enhanced spectral shapes and more accurate formant frequencies, and avoided the overlapping of two adjacent formants.

CELP speech coder by the structure of multi-codebook (다중 코드북 구조를 이용한 CELP형 음성부호화기)

  • 박규정;한승조
    • Journal of the Korea Institute of Information and Communication Engineering / v.5 no.1 / pp.23-33 / 2001
  • In this paper we propose a multi-codebook structure that can synthesize high-quality speech without increasing the CELP coder's computation. We also design a 4.8 kbps CELP speech coder with the proposed codebook structure. The proposed multi-codebook structure is made up of a basic codebook and a second codebook that is formed to strengthen spectrum and pitch. The multi-codebook structure can represent gains accurately, since it represents excitation signals as the summation of two kinds of codebooks and uses a different codebook gain for each. It can therefore provide better speech quality than other conventional structures. In computer simulation of the 4.8 kbps CELP coder designed with the proposed codebook structure, its segSNR was 0.81 dB higher than that of the DoD CELP coder at the same transmission rate.

A DSP Implementation of Subband Sound Localization System

  • Park, Kyusik
    • The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.52-60 / 2001
  • This paper describes a real-time implementation of a subband sound localization system on a floating-point DSP, the TI TMS320C31. The system determines the two-dimensional location of an active speaker in a closed room environment with real noise present. The system consists of a two-microphone array connected to a TI DSP hosted by a PC. The implemented sound localization algorithm is Subband CPSP, an improved version of the traditional CPSP (Cross-Power Spectrum Phase) method. The algorithm first splits the input speech signal into an arbitrary number of subbands using subband filter banks and calculates the CPSP in each subband. It then averages the CPSP results over the subbands and computes a source location estimate. The proposed algorithm has an advantage over CPSP in that it minimizes the overall estimation error in source location by confining noise that dominates a specific band to that subband. As a result, it makes it possible to set up a robust real-time sound localization system. For real-time simulation, the input speech is captured using two microphones and digitized by the DSP at a sampling rate of 8192 Hz, 16 bits/sample. The source location is then estimated once per second to satisfy real-time computational constraints. The performance of the proposed system is confirmed by several real-time simulations of speech at distances of 1 m, 2 m, and 3 m with various speech source locations, and it shows over 5% accuracy improvement in source location estimation.
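
The traditional full-band CPSP step that the subband method builds on can be sketched as a delay estimate between two microphone signals. This is a minimal sketch and not the paper's implementation: it uses a naive DFT from the standard library, treats the delay as circular, and omits the subband filter bank and averaging entirely. The function names and the `eps` guard are assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def cpsp_delay(x1, x2, eps=1e-12):
    """Estimate the delay of x2 relative to x1, in samples.

    The cross-power spectrum is normalized to unit magnitude so that only its
    phase remains; the inverse transform then peaks at the relative delay.
    """
    n = len(x1)
    spec1, spec2 = dft(x1), dft(x2)
    phase_only = []
    for a, b in zip(spec1, spec2):
        cross = b * a.conjugate()
        phase_only.append(cross / (abs(cross) + eps))
    corr = [c.real for c in dft(phase_only, inverse=True)]
    lag = max(range(n), key=lambda t: corr[t])
    return lag - n if lag > n // 2 else lag  # map large lags to negative delays

x1 = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
x2 = [0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0]  # x1 delayed by two samples
delay = cpsp_delay(x1, x2)
```

With a known microphone spacing and sampling rate, a delay in samples converts to a direction estimate; that geometry step is left out here.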

Oral and Nasal Spectral Outputs in Korean Oral Vowels (정상 모음에 대한 구강 및 비강 spectral output 분석)

  • Hong, Ki-Hwan;Choi, Seung-Chul;Kim, Byum-Kyu;Yang, Yoon-Soo;Shim, Hyun-Ah
    • Speech Sciences / v.10 no.2 / pp.145-157 / 2003
  • Vowels are classified by the shape of the vocal tract. These shapes form constriction points along the tract, which influence vocal tract resonances such as F1, F2, and F3. The formant frequency is influenced by the aperture and placement of the tongue, and the intensity is influenced by subglottal air pressure. The objective of this study is to compare and characterize the oral and nasal spectral outputs in terms of the formant frequencies and intensity of Korean oral vowels. The subjects were 20 normal persons (10 male and 10 female) without laryngeal pathology. The speech samples were the Korean oral vowels /a/, /e/, /i/, /o/, and /u/. The spectrum of each vowel was analysed with Nasal View and the Real Analysis program of Dr. Speech. The results showed that nasal intensity decreased markedly from F1 to F2, whereas oral intensity and overall intensity decreased only slightly from F1 to F2. Most nasal formant frequency values were similar to, or slightly smaller than, the oral and overall formant frequency values.

A Cepstral Analysis of Breathy Voice with Vocal Fold Paralysis (성대마비로 인한 기식 음성에 대한 Cepstral 분석)

  • Kang, Young-Ae;Seong, Cheol-Jae
    • Phonetics and Speech Sciences / v.4 no.2 / pp.89-94 / 2012
  • The aim of this study is to investigate the usefulness of the parameters CPP (cepstral peak prominence) and LTAS (long-term average spectrum) band energy for the analysis of breathy voice with vocal fold paralysis. Thirty-four female subjects who had vocal fold paralysis after thyroidectomy participated in this study. According to the perceptual judgements of three speech pathologists and one phonetic scholar, the subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 13). A maximum sustained phonation task was recorded for acoustic analysis. CPP-related (i.e. mean F0, mean CPP, and mean CPPs) and LTAS-related (i.e. minimum, maximum, and mean) parameters were used. An independent samples t-test was conducted. Regarding CPP, there are significant differences in mean CPP and mean CPPs between the groups. The values of mean CPP and CPPs in the non-breathy voice group are higher than those in the breathy voice group. CPP can be regarded as a useful parameter for breathy voice analysis in the clinic. As for LTAS, the energy from 0 to 2 kHz is significantly different between the groups. The minimum value of the non-breathy group is lower than that of the breathy group, whereas the maximum value of the non-breathy group is higher. The frequency band below 2 kHz seems to be related to breathy voice.

A Comparative Study of Vowels Produced by Normal Subjects and Patients with Malignant Vocal Folds by Correlation Coefficient and Difference Sum of Narrow-band Spectra (악성종양환자와 정상인이 발성한 모음의 좁은대역 스펙트럼값의 상관계수와 절대차이합 비교)

  • Yang, Byung-Gon;Wang, Soo-Geun;Jo, Cheol-Woo;Kim, Hyung-Soon;Kim, Eun-Ji;Kwon, Soon-Bok
    • Speech Sciences / v.10 no.4 / pp.189-200 / 2003
  • The objective of this study was to examine two new parameters by which we could screen people with malignant vocal folds. The new parameters were the difference sums and Pearson correlation coefficients between adjacent pairs of intensity level matrices of narrow-band spectra. Audio files from the Korean Disordered Speech Database were analyzed with Praat, a speech analysis program, to obtain matrices of 400 intensity levels at 16 time points of each sustained vowel's spectra. We limited our study to 12 normal subjects and 20 patients with malignant vocal folds who recorded at least three Korean vowels in a soundproof booth at Busan National University Hospital. The results indicated that the average coefficients of the patients were much lower than those of the normal subjects, while the average difference sums of the patients were much higher than those of the normal subjects. We also found that the degree of malignancy of the vocal folds was related to the coefficients and sums. However, some subjects at the initial stages of cancerous vocal folds yielded coefficients and difference sums almost comparable to those of the normal speakers. Further studies on larger databases are desirable to set criteria or threshold levels for screening people with vocal fold diseases.
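
The two parameters named in the abstract can be sketched for a sequence of spectra, each given as a list of intensity levels at one time point. This is an illustrative sketch only: the toy data, the function names, and the choice to score consecutive time points are assumptions, and the absolute difference follows the Korean title rather than any formula stated in the abstract. Each spectrum is assumed to be non-constant so that the correlation is defined.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def difference_sum(a, b):
    """Sum of absolute differences between corresponding intensity levels."""
    return sum(abs(x - y) for x, y in zip(a, b))

def adjacent_pair_scores(spectra):
    """Score every pair of consecutive spectra as (correlation, difference sum)."""
    return [(pearson(s, t), difference_sum(s, t))
            for s, t in zip(spectra, spectra[1:])]

# Toy data: three time points with three intensity levels (dB) each.
spectra = [[60.0, 50.0, 40.0], [61.0, 51.0, 41.0], [40.0, 50.0, 60.0]]
scores = adjacent_pair_scores(spectra)
```

A stable vowel gives correlations near 1 and small difference sums between consecutive spectra, which matches the direction of the group differences reported in the abstract.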

Two-Channel Noise Reduction Using Beamforming and DOA-Based Masking (빔포밍 및 DOA 기반의 마스킹을 이용한 2채널 잡음제거)

  • Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.1 / pp.32-40 / 2013
  • In this paper, we propose a multi-channel speech enhancement algorithm using beamforming and direction-of-arrival (DOA)-based masking. The proposed algorithm first enhances noisy speech with the linearly constrained minimum variance (LCMV) algorithm, and then applies a mel-scale Wiener filter designed using DOA-based masking to remove the remaining noise. To improve performance, we optimize the learning rate of the adaptive filters in LCMV and the DOA threshold used to detect the target speech spectrum. As performance indices, the perceptual evaluation of speech quality (PESQ) score and the output SNR are measured. Experimental results show that the proposed algorithm outperforms the conventional LCMV beamformer by 0.09 in PESQ score and 5.75 dB in output SNR.