• Title/Summary/Keyword: Frame SNR

A Study on the Context-dependent Speaker Recognition Adopting the Method of Weighting the Frame-based Likelihood Using SNR (SNR을 이용한 프레임별 유사도 가중방법을 적용한 문맥종속 화자인식에 관한 연구)

  • Choi, Hong-Sub
    • MALSORI
    • /
    • no.61
    • /
    • pp.113-123
    • /
    • 2007
  • Environmental differences between training and testing are generally considered the critical factor behind performance degradation in speaker recognition systems. In particular, such systems try to train the speaker model on speech that is as clean as possible, but this condition does not hold in the real testing phase because of environmental and channel noise. To cope with this problem, this paper proposes a new method that weights the frame-based likelihood according to the frame SNR, exploiting the strong correlation between speech SNR and the speaker discrimination rate. To verify its usefulness, the proposed method is applied to a context-dependent speaker identification system. Experimental results on the cellular phone speech DB designed by ETRI for Korean speaker recognition show that the proposed method is effective and increases identification accuracy by up to 11%.
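
The idea of weighting frame likelihoods by frame SNR can be sketched as follows. This is only an illustration under stated assumptions, not the paper's actual weighting rule: the linear SNR-to-weight mapping, the `floor_db`/`ceil_db` limits, the known `noise_power`, and all function names are hypothetical.

```python
import math


def frame_snr_db(frame, noise_power):
    """Estimate one frame's SNR in dB from its mean power and a known noise power."""
    power = sum(x * x for x in frame) / len(frame)
    # Subtract the noise power to approximate the speech power; keep it positive.
    signal_power = max(power - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / noise_power)


def snr_weights(snrs_db, floor_db=0.0, ceil_db=30.0):
    """Map frame SNRs linearly onto [0, 1] (assumed rule), then normalise to sum to 1."""
    raw = [min(max((s - floor_db) / (ceil_db - floor_db), 0.0), 1.0) for s in snrs_db]
    total = sum(raw)
    if total == 0.0:
        # Every frame is at or below the floor: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def weighted_score(frame_loglikes, weights):
    """Combine per-frame log-likelihoods using the per-frame weights."""
    return sum(w * ll for w, ll in zip(weights, frame_loglikes))


def identify(loglikes_by_speaker, snrs_db):
    """Return the best-scoring speaker and all weighted scores."""
    weights = snr_weights(snrs_db)
    scores = {spk: weighted_score(lls, weights) for spk, lls in loglikes_by_speaker.items()}
    return max(scores, key=scores.get), scores
```

With frame SNRs of 25, 2, and 20 dB, a speaker model that only wins on the noisy middle frame no longer dominates the decision, because that frame receives a small weight.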

Voice Activity Detection Based on SNR and Non-Intrusive Speech Intelligibility Estimation

  • An, Soo Jeong;Choi, Seung Ho
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.4
    • /
    • pp.26-30
    • /
    • 2019
  • This paper proposes a new voice activity detection (VAD) method based on SNR and non-intrusive speech intelligibility estimation. Conventional SNR-based VAD methods obtain the voice activity probability by estimating the frame-wise SNR at each spectral component. However, these methods perform poorly in various noisy environments. We devise a hybrid VAD method that uses non-intrusive speech intelligibility estimation as well as SNR estimation, where the speech intelligibility score is estimated by a deep neural network. To train the parameters of the deep neural network, we use the MFCC vector as input and the intrusive speech intelligibility score STOI (Short-Time Objective Intelligibility) as the target output. We develop a speech presence measure that classifies each noisy frame as voice or non-voice by calculating the weighted average of the estimated STOI value and the conventional SNR-based VAD value at each frame. Experimental results show that the proposed method outperforms the conventional VAD method in various noisy environments, especially when the SNR is very low.
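
The speech presence measure described above, a weighted average of the estimated STOI value and the SNR-based VAD value, might look like the minimal sketch below. The weight `alpha`, the decision `threshold`, and the function names are assumptions for illustration; the abstract does not give the values the authors used, and the STOI estimates would come from the trained network rather than being supplied by hand.

```python
def speech_presence(stoi_est, snr_vad_prob, alpha=0.5):
    """Weighted average of an estimated STOI score and an SNR-based VAD probability."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * stoi_est + (1.0 - alpha) * snr_vad_prob


def classify_frames(stoi_scores, snr_vad_probs, alpha=0.5, threshold=0.5):
    """Label each frame True (voice) or False (non-voice) by thresholding the measure."""
    if len(stoi_scores) != len(snr_vad_probs):
        raise ValueError("both inputs need one value per frame")
    return [
        speech_presence(s, p, alpha) >= threshold
        for s, p in zip(stoi_scores, snr_vad_probs)
    ]
```

Raising `alpha` shifts trust toward the intelligibility estimate, which is the component the abstract credits for the gains at very low SNR.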

Two-step a priori SNR Estimation in the Log-mel Domain Considering Phase Information (위상 정보를 고려한 로그멜 영역에서의 2단계 선험 SNR 추정)

  • Lee, Yun-Kyung;Kwon, Oh-Wook
    • Phonetics and Speech Sciences
    • /
    • v.3 no.1
    • /
    • pp.87-94
    • /
    • 2011
  • The decision-directed (DD) approach is widely used to determine the a priori SNR from noisy speech signals. In conventional speech enhancement systems that use the DD approach, the a priori SNR is estimated from the magnitude components only and consequently follows the a posteriori SNR with a one-frame delay. We propose a phase-dependent two-step a priori SNR estimator based on the minimum mean square error (MMSE) criterion in the log-mel spectral domain, which considers both magnitude and phase information and overcomes the performance degradation caused by the one-frame delay. Experimental results show that the proposed estimator improves the output SNR of enhanced speech signals by 2.3 dB compared with the conventional DD approach-based system.
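
For reference, the conventional DD baseline mentioned above can be sketched for a single frequency bin as below. This is the standard magnitude-only recursion, not the paper's proposed log-mel two-step estimator. The Wiener gain used to form the previous clean-speech estimate, the first-frame initialisation, and the parameters `alpha` and `xi_min` are assumptions chosen for illustration.

```python
def decision_directed(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one frequency bin across frames.

    noisy_power: |Y(l)|^2 for each frame l
    noise_power: noise power estimate for this bin (assumed constant)
    """
    xi_track = []
    prev_clean_power = 0.0
    for l, y2 in enumerate(noisy_power):
        gamma = y2 / noise_power          # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)   # instantaneous estimate of the a priori SNR
        if l == 0:
            xi = ml_term                  # no previous frame yet (assumed initialisation)
        else:
            # Blend the previous frame's clean-speech estimate with the current observation.
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)            # Wiener gain (assumed suppression rule)
        prev_clean_power = gain * gain * y2
        xi_track.append(xi)
    return xi_track
```

Feeding in a sudden onset such as `[1.0, 1.0, 11.0, 11.0]` with unit noise power shows the lag: the a posteriori SNR jumps to 11 at the third frame, while the DD estimate rises only gradually because it leans on the previous frame.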

Frame Reliability Weighting for Robust Speech Recognition (프레임 신뢰도 가중에 의한 강인한 음성인식)

  • 조훈영;김락용;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.323-329
    • /
    • 2002
  • This paper proposes a frame reliability weighting method to compensate for time-selective noise, which occurs at random positions in a speech signal and contaminates only certain parts of it. Speech frames have different degrees of reliability, and the reliability is proportional to the SNR (signal-to-noise ratio). While it is feasible to estimate the frame SNR from the noise information in non-speech intervals under stationary noise, it is difficult to obtain the noise spectrum for time-selective noise. Therefore, we use statistical models of clean speech to estimate the frame reliability. The proposed MFR (model-based frame reliability) approximates frame SNR values using filterbank energy vectors obtained by inverse transformation of the input MFCC (mel-frequency cepstral coefficient) vectors and of the mean vectors of a reference model. Experiments on various burst noises showed that the proposed method represents the frame reliability effectively. Recognition performance improved when the MFR values were used as weighting factors at the likelihood calculation step.

Analysis of the Relationships according to the Frame (f/s) Change of Cine Imaging in Coronary Angiographic System: With Focus on FOV Enlargement and Live Zoom (심장 혈관 조영장치에서의 프레임 레이트(f/s) 변화에 따른 상관 관계 분석 : FOV 확대와 Live Zoom을 중점으로)

  • Kim, Won Hyo;Song, Jong-Nam;Han, Jae-Bok
    • Journal of the Korean Society of Radiology
    • /
    • v.12 no.7
    • /
    • pp.845-852
    • /
    • 2018
  • This study aimed to investigate differences in X-ray exposure by comparing and analyzing the absorbed dose as the number of frames changes in coronary angiography, and as the zoom mode changes between FOV enlargement and Live Zoom. In addition, appropriate frame selection criteria for examinations were sought, including the effect of frame change on image quality, by measuring the noise strength expressed as the standard deviation (SD), the signal-to-noise ratio (SNR), and the contrast-to-noise ratio (CNR). The study was conducted with an anthropomorphic phantom on an angiography system. A linear relationship between frame rate and radiation dose was evident. In contrast, the image quality indices (SD, SNR, and CNR) were almost constant irrespective of the number of frames. The difference between zoom modes was not statistically significant for DAP, air kerma, and SD (p > 0.05). However, SNR and CNR differed significantly between FOV enlargement and Live Zoom. In conclusion, since image quality did not degrade significantly as the frame rate decreased from 30 to 15 to 7.5 f/s, while the radiation dose decreased in almost exactly linear proportion to the frame rate, the number of frames per second should be kept as low as reasonably achievable. As for the zoom mode, Live Zoom showed a statistically significant improvement in the image quality indices SNR and CNR, which justifies active use of Live Zoom, a mode that enables real-time image enlargement without additional radiation dose.
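
The image quality indices named above are commonly computed from pixel values in a signal region of interest (ROI) and a background ROI. The sketch below uses one widely used set of definitions, SNR as mean signal over background SD and CNR as absolute mean difference over background SD. The abstract does not state the exact formulas the authors applied, so these definitions, the function names, and the idealised proportional dose model are assumptions.

```python
import statistics


def roi_snr(signal_roi, background_roi):
    """SNR as mean signal intensity divided by the background standard deviation."""
    return statistics.fmean(signal_roi) / statistics.stdev(background_roi)


def roi_cnr(signal_roi, background_roi):
    """CNR as the absolute mean difference divided by the background standard deviation."""
    contrast = abs(statistics.fmean(signal_roi) - statistics.fmean(background_roi))
    return contrast / statistics.stdev(background_roi)


def relative_dose(frame_rate, reference_rate=30.0):
    """Idealised dose relative to the reference frame rate, assuming strict proportionality."""
    return frame_rate / reference_rate
```

Under the proportional model, dropping from 30 f/s to 7.5 f/s gives `relative_dose(7.5)` equal to 0.25, which matches the abstract's point that dose falls with frame rate while SD, SNR, and CNR stay roughly constant.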

Analysis of the Relationships Between ESD and DAP, and Image SNR·CNR According to the Frame Change of Cine Imaging in CAG : With Focus on 10 f/s and 15 f/s (심장혈관 조영술에서 씨네(cine)촬영의 프레임변화에 따른 ESD와 DAP 및 영상의 SNR·CNR 관계 분석: 10f/s과 15f/s을 중심으로)

  • Jung, Myo-Young;Seo, Young-Hyun;Song, Jong-Nam;Han, Jae-Bok
    • Journal of the Korean Society of Radiology
    • /
    • v.12 no.5
    • /
    • pp.669-675
    • /
    • 2018
  • This study aimed to investigate differences in X-ray exposure by comparing and analyzing the entrance surface dose and absorbed dose as the frame rate changes in coronary angiography (CAG). In addition, appropriate frame selection criteria for examinations were sought, including the effect of frame change on image quality, by measuring and analyzing the SNR and CNR of the images with ImageJ. The study was conducted on 30 patients (19 males and 11 females) who underwent CAG at this hospital from June 2017 to October 2017. The patients' ages ranged from 49 to 82 years (mean of $65{\pm}9$ years), body weight from 45 to 91 kg (mean of $67{\pm}8.9$ kg), height from 150 to 179 cm (mean of $165.1{\pm}8.9$ cm), and BMI from 19.5 to 30.5 (mean of $24.5{\pm}2.9$). For the entrance surface dose and absorbed dose, the air kerma value and DAP were obtained and analyzed retrospectively. SNR and CNR were derived by applying values measured in ImageJ to the corresponding formulas. The correlations between the entrance surface dose and absorbed dose, and between SNR and CNR, were analyzed with the SPSS statistical program. The relationship between the entrance surface dose and absorbed dose was not statistically significant at either 10 f/s or 15 f/s (p>0.05). The SNR ($3.374{\pm}2.1297$) and CNR ($0.234{\pm}0.2249$) at 10 f/s were $1.43{\pm}0.4861$ and $0.132{\pm}0.0555$ lower, respectively, than the SNR ($4.929{\pm}2.8532$) and CNR ($0.391{\pm}0.3025$) at 15 f/s, but these differences were not statistically significant (p>0.05). In the correlation analysis, statistically significant results were obtained among BMI, air kerma, and DAP; between air kerma and DAP; and between SNR and CNR (p<0.001, p<0.001). In conclusion, there was no significant difference in the entrance surface dose or absorbed dose when the frame rate was changed from 10 f/s to 15 f/s during coronary angiography. SNR and CNR were higher at 15 f/s than at 10 f/s, but the differences were not statistically significant. Therefore, this study suggests that the concerns of patients and practitioners about image quality degradation and X-ray exposure when imaging at 10 f/s and 15 f/s may be reduced.

Performance Improvement of Turbo Code in low SNR and short frame sizes (낮은 SNR과 짧은 프레임에서 터보코드 성능 개선)

  • 정상연;이용식;심우성;허도근
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.61-64
    • /
    • 1999
  • The turbo code suited to IMT-2000 is known to perform better as the frame size increases. However, large frames are not suitable for real-time video services because of the decoding complexity and long delay they incur. This paper therefore proposes a decoding decision algorithm for short frames in which the soft output is weighted according to the iteration number in the turbo decoder. The performance of the proposed algorithm is analyzed in an AWGN channel for short frame lengths of 100, 256, and 640. The results show that the proposed decoding decision algorithm improves the BER compared with the existing MAP decoding algorithm.

A Robust TDMA Frame Structure and Initial Synchronization in Satellite Communication (위성통신을 위한 강인한 TDMA Frame 구조 및 초기동기 기법)

  • Ko, Dong-Kuk;Yoon, Won-Sik
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.8
    • /
    • pp.1631-1641
    • /
    • 2012
  • TDMA systems are used in satellite communication; in particular, DVB-S2 has been standardized and is now operated in satellite broadcasting systems. In this paper, we propose a TDMA frame structure for special-purpose use that offers good reliability in a poor RF environment, at the cost of reduced frequency efficiency. The TDMA frame duration is 12 seconds, which is long compared with general TDMA systems whose frames last several ms. The frame structure is designed to account for time and frequency shifts within a single frame duration. Simulation results show that the proposed frame structure and synchronization method achieve robust synchronization even when the terminal operates at low SNR and with frequency offsets of 25 kHz.

Speech Processing System Using a Noise Reduction Neural Network Based on FFT Spectrums

  • Choi, Jae-Seung
    • Journal of information and communication convergence engineering
    • /
    • v.10 no.2
    • /
    • pp.162-167
    • /
    • 2012
  • This paper proposes a speech processing system based on a model of the human auditory system and a noise reduction neural network with fast Fourier transform (FFT) amplitude and phase spectrums for noise reduction under background noise environments. The proposed system reduces noise signals by using the proposed neural network based on FFT amplitude spectrums and phase spectrums, then implements auditory processing frame by frame after detecting voiced and transitional sections for each frame. The results of the proposed system are compared with the results of a conventional spectral subtraction method and minimum mean-square error log-spectral amplitude estimator at different noise levels. The effectiveness of the proposed system is experimentally confirmed based on measuring the signal-to-noise ratio (SNR). In this experiment, the maximal improvement in the output SNR values with the proposed method is approximately 11.5 dB better for car noise, and 11.0 dB better for street noise, when compared with a conventional spectral subtraction method.
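
Output SNR figures like those quoted above are typically obtained by comparing an enhanced signal against the clean reference. The sketch below uses the usual time-domain definition, 10·log10 of clean-signal energy over residual-error energy. The abstract does not specify how the authors computed SNR, so this definition and the function names are assumptions.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB of an estimate, treating (estimate - clean) as the residual noise."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((e - s) ** 2 for s, e in zip(clean, estimate))
    if noise_energy == 0.0:
        return math.inf  # the estimate matches the clean signal exactly
    return 10.0 * math.log10(signal_energy / noise_energy)


def snr_improvement_db(clean, noisy, enhanced):
    """Output SNR minus input SNR, i.e. the gain attributable to the enhancement step."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)
```

Comparing two enhancement methods on the same noisy input then reduces to comparing their `snr_db(clean, enhanced)` values, which is how a statement such as "11.5 dB better than spectral subtraction" would be read under this definition.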

A Study on Variation and Determination of Gaussian function Using SNR Criteria Function for Robust Speech Recognition (잡음에 강한 음성 인식에서 SNR 기준 함수를 사용한 가우시안 함수 변형 및 결정에 관한 연구)

  • 전선도;강철호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.112-117
    • /
    • 1999
  • Spectral subtraction for noise-robust speech recognition often causes loss of the speech signal. In this study, we propose a method in which the Gaussian functions of a semi-continuous HMM (hidden Markov model) are varied and selected on the basis of an SNR criterion function, where SNR denotes the per-frame signal-to-noise ratio between the estimated noise and the subtracted signal. To demonstrate the effectiveness of the method, we show through signal waveforms that the estimation error is related to the magnitude of the estimated noise; for this reason, the Gaussian function is varied and selected according to the SNR. In computer simulations of recognition under the noise of a car driving at over 80 km/h, the proposed SNR-based Gaussian decision method achieves a higher recognition rate than both the spectrally subtracted and non-subtracted cases.
