• Title/Summary/Keyword: 음성 검출기 (speech detector)

Fundamental Frequency Estimation of Voiced Speech Signals Based on the Inflection Point Detection (변곡점 검출에 기반한 음성의 기본 주파수 추정)

  • Byeonggwan Iem
    • Journal of IKEEE / v.27 no.4 / pp.472-476 / 2023
  • The fundamental frequency and the pitch period are major characteristics of speech signals. They are used in many speech applications such as speech coding, speech recognition, and speaker identification. In this paper, a subset of the inflection points is used to estimate the pitch period, which is the inverse of the fundamental frequency. The inflection points are defined as points where local maxima, local minima, or slope changes occur. The speech signal is first low-pass filtered to remove unnecessary inflection points caused by high-frequency components. Only the inflection points at local maxima are used to obtain the pitch period. While existing pitch estimation methods process speech signals block by block, the proposed method detects the inflection points sample by sample and produces pitch period/fundamental frequency estimates over time. Computer simulation shows the usefulness of the proposed method as a fundamental frequency estimator.
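
The idea in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the moving-average low-pass filter, the relative height threshold `rel_height`, and the synthetic test signal are all assumptions made for illustration.

```python
import math

def moving_average(x, width):
    """Crude low-pass filter: centered moving average (shorter window at the edges)."""
    half = width // 2
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def track_f0(x, fs, lp_width=9, rel_height=0.6):
    """Return (time_s, f0_hz) pairs, one per pair of consecutive dominant local maxima."""
    y = moving_average(x, lp_width)
    floor = rel_height * max(y)      # assumed threshold; uses the whole signal for simplicity
    estimates, last_peak = [], None
    for n in range(1, len(y) - 1):   # scan sample by sample
        is_local_max = y[n - 1] < y[n] >= y[n + 1]
        if is_local_max and y[n] > floor:
            if last_peak is not None:
                # pitch period = distance between maxima; f0 is its inverse
                estimates.append((n / fs, fs / (n - last_peak)))
            last_peak = n
    return estimates

# Synthetic "voiced" signal: 100 Hz fundamental plus a weaker second harmonic.
fs = 8000
x = [math.sin(2 * math.pi * 100 * n / fs) + 0.3 * math.sin(2 * math.pi * 200 * n / fs)
     for n in range(800)]
f0_track = track_f0(x, fs)
```

Each entry of `f0_track` is produced as soon as a new dominant maximum is found, which is what distinguishes this style of estimator from block-based methods.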

An Embedded Timing Loss Detector for Robust Data Transmission (데이터 전송을 위한 타이밍 손실 검출기)

  • 이용환
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.9 / pp.1404-1411 / 1993
  • Unlike voice communication, data transmission can be seriously affected by transient channel impairments. In some cases, timing synchronization between the transmitter and the receiver cannot be recovered in the presence of such impairments without a forced reinitialization process. It is therefore highly desirable for data communication equipment to have an efficient timing loss detector for robust recovery. In this paper, one such detector is proposed for data transceivers having a secondary channel embedded in the main channel. A known sequence multiplexed with the secondary channel data is repeatedly sent through the embedded secondary channel. For continuous watchdog-like operation, detection is performed sequentially based on a modified up/down counter scheme. The performance of the proposed detector is analytically evaluated in closed form.
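
A plain up/down counter detector can be sketched as follows. The abstract does not describe the paper's modification, so the step sizes, the threshold, and the known sequence here are hypothetical, and the code shows only the generic scheme.

```python
def detect_timing_loss(received, known, up=1, down=1, threshold=8):
    """Return the index at which timing loss is declared, or None if never declared.

    The counter goes up when a received symbol disagrees with the known
    sequence and down (not below zero) when it agrees.
    """
    counter = 0
    for i, symbol in enumerate(received):
        expected = known[i % len(known)]
        if symbol != expected:
            counter += up
        else:
            counter = max(0, counter - down)
        if counter >= threshold:
            return i
    return None

known = [1, 0, 1, 1, 0, 0, 1, 0]                      # hypothetical known sequence
clean = known * 10
one_error = clean[:]
one_error[5] ^= 1                                     # a single transient bit error
lost = known * 3 + [1 - s for s in known * 3]         # every symbol wrong after index 23

print(detect_timing_loss(clean, known))               # None
print(detect_timing_loss(one_error, known))           # None
print(detect_timing_loss(lost, known))                # 31
```

Isolated errors are absorbed because the counter decays back to zero, while a sustained run of mismatches drives it to the threshold.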

A study on Gabor Filter Bank-based Feature Extraction Algorithm for Analysis of Acoustic data of Emergency Rescue (응급구조 음향데이터 분석을 위한 Gabor 필터뱅크 기반의 특징추출 알고리즘에 대한 연구)

  • Hwang, Inyoung;Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1345-1347 / 2015
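  • In this paper, to estimate a caller's surroundings from the ambient acoustic signal delivered to the call-taker when an emergency is reported, we introduce a Gabor filter bank-based feature extraction technique, which models the spectral and temporal-variation characteristics of sound well, together with a deep neural network with strong classification performance. The proposed feature extraction first separates speech from non-speech using a non-speech segment detector, extracts 23rd-order Mel-filter bank coefficients from the non-speech segments, and then applies Gabor filters to these coefficients to obtain feature vectors for estimating the surroundings. A deep neural network trained on these features then estimates the caller's location. The proposed technique was evaluated in a variety of scenarios and showed excellent classification performance.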
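
The Gabor filtering step can be sketched as below. Only the 23 Mel bands come from the abstract. The kernel parametrization, its size, and the synthetic input are assumptions, and the non-speech detector and the deep neural network are omitted.

```python
import math

def gabor_kernel(half_t, half_f, omega_t, omega_f, sigma_t, sigma_f):
    """Real 2D Gabor kernel (Gaussian envelope times cosine), indexed [band][frame]."""
    kernel = []
    for f in range(-half_f, half_f + 1):
        row = []
        for t in range(-half_t, half_t + 1):
            envelope = math.exp(-0.5 * ((t / sigma_t) ** 2 + (f / sigma_f) ** 2))
            row.append(envelope * math.cos(omega_t * t + omega_f * f))
        kernel.append(row)
    return kernel

def filter2d(spec, kernel):
    """Zero-padded 2D correlation; the output has the same shape as spec."""
    bands, frames = len(spec), len(spec[0])
    half_f, half_t = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * frames for _ in range(bands)]
    for b in range(bands):
        for n in range(frames):
            acc = 0.0
            for i, row in enumerate(kernel):
                for j, g in enumerate(row):
                    bb, nn = b + i - half_f, n + j - half_t
                    if 0 <= bb < bands and 0 <= nn < frames:
                        acc += g * spec[bb][nn]
            out[b][n] = acc
    return out

# Hypothetical 23-band x 20-frame log-Mel matrix with a temporal modulation.
mel = [[math.cos(0.5 * n) for n in range(20)] for _ in range(23)]
kernel = gabor_kernel(half_t=3, half_f=2, omega_t=0.5, omega_f=0.0,
                      sigma_t=2.0, sigma_f=1.5)
features = filter2d(mel, kernel)
```

A filter bank would repeat this with several `(omega_t, omega_f)` pairs and stack the outputs into one feature vector per frame.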

Performance Improvement of Double Talk Detection before Convergence of the Echo Canceller by Using Linear Predictive Coding Filter Gain of the Primary Input Signal (주입력신호의 LPC 필터 이득을 이용한 반향제거기의 수렴전 동시통화검출 성능 개선)

  • Yoo, Jae-Ha
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.628-633 / 2014
  • This paper proposes a method for improving the performance of a conventional double talk detection method that can operate before the echo canceller has converged. The proposed method estimates the coefficients of the linear predictive coding (LPC) filter from the primary input signal. The time-varying threshold for double talk detection is determined from the LPC filter gain of the primary input signal. The proposed method reduces both the false detection rate, meaning single talk wrongly detected as double talk, and the double talk detection delay. Computer simulation was performed using long-duration real speech signals. The results show that the proposed method improves on the conventional method by lowering the false detection rate and shortening the detection delay.
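
As a rough illustration of the LPC step, the sketch below estimates LPC coefficients with the standard autocorrelation method (Levinson-Durbin recursion) and takes the gain as the square root of the residual energy. Whether the paper defines the gain this way, and how it maps the gain to the threshold, is not stated in the abstract, so both are left as assumptions.

```python
import math

def autocorr(frame, max_lag):
    """Autocorrelation R[0..max_lag] of one frame of the primary input signal."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) for A(z) = 1 + a[1] z^-1 + ... and the residual energy err."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # residual energy shrinks at each order
    return a, err

# Theoretical autocorrelation of a first-order AR process with pole 0.9.
r = [1.0, 0.9, 0.81]
a, residual = levinson_durbin(r, order=2)   # a is about [1, -0.9, 0], residual about 0.19
lpc_gain = math.sqrt(residual)              # assumed definition of the LPC filter gain
```

In practice `r` would come from `autocorr(frame, order)` on each frame, which is what makes a threshold derived from it time-varying.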

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

  • 이기승;김정수
    • The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.135-144 / 2003
  • A method for the automatic segmentation of speech signals is described. The method is dedicated to the construction of a large database for a Text-To-Speech (TTS) synthesis system. The main issue of the work is the refinement of initial estimates of phone boundaries, which are provided by an alignment based on a Hidden Markov Model (HMM). A multi-layer perceptron (MLP) was used as a phone boundary detector. To increase segmentation performance, a technique that trains a separate MLP for each type of phonetic transition is proposed. The optimum partitioning of the entire phonetic transition space is constructed so as to minimize the overall deviation from hand-labelled positions. With single-speaker stimuli, the experimental results showed that more than 95% of all phone boundaries deviate from the reference position by less than 20 ms, and that the refinement of the boundaries reduces the root mean square error by about 25%.
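
The two evaluation figures quoted above (share of boundaries within 20 ms, and root mean square error) can be computed as in this small sketch. The boundary times are made-up values for illustration, not data from the paper.

```python
import math

def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Return (fraction of boundaries within tolerance, RMS deviation in ms)."""
    deviations = [p - r for p, r in zip(predicted_ms, reference_ms)]
    within = sum(abs(d) < tolerance_ms for d in deviations) / len(deviations)
    rmse = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return within, rmse

reference = [100.0, 250.0, 400.0, 620.0]    # hypothetical hand-labelled boundaries (ms)
predicted = [104.0, 247.0, 425.0, 620.0]    # hypothetical automatic boundaries (ms)
within, rmse = boundary_metrics(predicted, reference)
print(within)   # 0.75
```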

Frequency Domain Double-Talk Detector Based on Gaussian Mixture Model (주파수 영역에서의 Gaussian Mixture Model 기반의 동시통화 검출 연구)

  • Lee, Kyu-Ho;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.4 / pp.401-407 / 2009
  • In this paper, we propose a novel method for cross-correlation based double-talk detection (DTD) that employs a Gaussian Mixture Model (GMM) in the frequency domain. The proposed algorithm transforms the cross-correlation coefficient used in the time domain into 16 channels in the frequency domain using the discrete Fourier transform (DFT). Seven feature vectors for the GMM are then selected from these channels, and three regions (far-end speech, double-talk, and near-end speech) are identified by likelihood comparison based on those feature vectors. The presented DTD algorithm detects double-talk regions efficiently without the Voice Activity Detector used in conventional cross-correlation based double-talk detection. The performance of the proposed algorithm is evaluated under various conditions and yields better results than the conventional schemes. In particular, it is robust against detection errors caused by background noise or echo path changes, which are among the key issues in practical DTD.

A Study on Glottal Spectrum Analysis According to the Distance between the Microphone and the lips (Microphone 거리에 따른 Glottal Spectrum 성분 분석에 관한 연구)

  • Park Hyunyoung;Jang Kyunga;Bae Myungjin
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.65-68 / 2002
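  • Speech recognizers are currently trending toward multi-channel speech input. When a speech recognizer is used in this way, input schemes that detect speech automatically are affected by the fact that the Glottal Spectrum component changes with the distance between the speaker and the microphone. This Glottal Spectrum component can be expressed as a = R1/R0 (the slope of the LPC envelope). In this paper, we compare and analyze the Glottal Spectrum component as a function of the distance between the speaker and the microphone.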
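
The ratio a = R1/R0 can be computed per frame as sketched below, assuming R0 and R1 are the lag-0 and lag-1 autocorrelation values of the frame. The function name and the two test frames are illustrative only.

```python
def r1_over_r0(frame):
    """Return a = R1/R0: lag-1 autocorrelation normalized by the frame energy."""
    r0 = sum(s * s for s in frame)
    r1 = sum(frame[i] * frame[i + 1] for i in range(len(frame) - 1))
    return r1 / r0 if r0 > 0 else 0.0

smooth = [1.0, 1.0, 1.0, 1.0]                # low-frequency dominated frame
alternating = [1.0, -1.0] * 4                # high-frequency dominated frame
a_smooth = r1_over_r0(smooth)                # 3/4 = 0.75
a_alternating = r1_over_r0(alternating)      # -7/8 = -0.875
```

Values near 1 indicate a spectrum tilted toward low frequencies, and negative values indicate a tilt toward high frequencies.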

Optimization of Detection Method Using a Moving Average Estimator for Speech Enhancement (음성강화를 위한 이동 평균 예측량 기반의 검출방법 최적화)

  • Lee, Soo-Jeong;Shin, Kye-Hyeon;Kim, Soon-Hyob
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.3 / pp.97-104 / 2007
  • The adaptive echo canceller (AEC) has become an important component in speech communication systems, including mobile phones and speech recognition. In these applications, the acoustic echo path has a long impulse response. We propose a moving-average least mean square (MVLMS) algorithm with a detection method for acoustic echo cancellation. The results of tests using colored input models clearly show that the MVLMS detection algorithm has convergence performance superior to that of the least mean square (LMS) detection algorithm alone. Although the computational complexity of the new MVLMS algorithm is only slightly greater than that of the standard LMS detection algorithm, the new algorithm confers a significant improvement in stability.
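
For orientation, the baseline that MVLMS is compared against can be sketched as a standard LMS echo canceller. This is the textbook LMS update, not the proposed MVLMS algorithm, whose details the abstract does not give. The echo path, step size, and white-noise input are assumptions (the paper tests colored inputs).

```python
import random

def lms_echo_canceller(far_end, mic, num_taps=4, mu=0.05):
    """Adapt an FIR estimate of the echo path; return (weights, error signal)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                   # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))          # echo estimate
        e = d - y                                           # echo-cancelled output
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]    # LMS weight update
        errors.append(e)
    return w, errors

random.seed(0)
echo_path = [0.6, -0.3, 0.1, 0.05]           # hypothetical short echo path
far_end = [random.uniform(-1.0, 1.0) for _ in range(5000)]
mic = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far_end))]
w, errors = lms_echo_canceller(far_end, mic)
```

With no near-end speech or noise in this toy setup, `w` converges to `echo_path` and the residual error decays toward zero.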

Scoring Methods for Improvement of Speech Recognizer Detecting Mispronunciation of Foreign Language (외국어 발화오류 검출 음성인식기의 성능 개선을 위한 스코어링 기법)

  • Kang Hyo-Won;Kwon Chul-Hong
    • MALSORI / no.49 / pp.95-105 / 2004
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. For this purpose, we develop a speech recognizer that automatically classifies pronunciation errors when Koreans speak a foreign language. To develop methods for the automatic assessment of pronunciation quality, we propose a language model based score as the machine score in the speech recognizer. Experimental results show that the language model based score correlates more highly with human scores than the conventional log-likelihood based score does.
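
Comparing a machine score with human scores by correlation can be sketched with the Pearson coefficient, as below. The abstract does not say which correlation measure was used, and the score values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

human = [1.0, 2.0, 3.0, 4.0, 5.0]              # hypothetical human ratings
machine_a = [0.9, 2.2, 2.8, 4.1, 5.0]          # hypothetical machine score A
machine_b = [3.0, 1.0, 4.0, 2.0, 5.0]          # hypothetical machine score B
corr_a = pearson(human, machine_a)
corr_b = pearson(human, machine_b)
```

The machine score with the higher coefficient tracks the human ratings more closely.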

Pronunciation Network Construction of Speech Recognizer for Mispronunciation Detection of Foreign Language (한국인의 외국어 발화오류 검출을 위한 음성인식기의 발음 네트워크 구성)

  • Lee Sang-Pil;Kwon Chul-Hong
    • MALSORI / no.49 / pp.123-134 / 2004
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. In this paper, we propose an HMM-based speech recognizer that automatically classifies pronunciation errors when Koreans speak Japanese. We also propose two pronunciation networks for the automatic detection of mispronunciation. We evaluate the performance of the networks by computing the correlation between human ratings and the machine scores obtained from the speech recognizer.
