• Title/Summary/Keyword: speech detector (음성 검출기)

Real-Time Implementation of Speaker Dependent Speech Recognition Hardware Module Using the TMS320C32 DSP : VR32 (TMS320C32 DSP를 이용한 실시간 화자종속 음성인식 하드웨어 모듈(VR32) 구현)

  • Chung, Ik-Joo;Chung, Hoon
    • The Journal of the Acoustical Society of Korea / v.17 no.4 / pp.14-22 / 1998
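  • In this study, we developed a real-time speaker-dependent speech recognition hardware module (VR32) using the TMS320C32, a low-cost floating-point digital signal processor (DSP) from Texas Instruments. The hardware module consists of a 40 MHz TMS320C32 DSP, a TLC32044 14-bit codec (or an 8-bit μ-law PCM codec), memory such as EPROM and SRAM, and logic circuitry for the host interface. We also developed a PC interface board and software for evaluating the module on a PC. The speech recognition algorithm performs endpoint detection based on energy and zero-crossing rate (ZCR) and 10th-order weighted LPC cepstrum analysis in real time, then determines the most similar word through dynamic time warping (DTW), and applies a verification step before the final recognition decision. Endpoint detection uses an adaptive threshold, which makes it robust to noise, and the DTW algorithm was optimized in C and assembly to greatly improve computation speed. The current recognition rate exceeds 95% for a 30-word vocabulary suitable for typical speed-dial use in an ordinary office environment, and the module also works well in noisy conditions such as background music or car noise.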

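The DTW matching step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' optimized C/assembly routine: the function names `dtw_distance` and `recognize`, the Euclidean local distance, and the unconstrained step pattern are all assumptions made for illustration, and the verification stage is omitted.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean local distance and the basic three-way step
    pattern (insertion, deletion, match); the paper's exact constraints
    are not given in the abstract.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

In a speaker-dependent recognizer of this kind, `templates` would hold one cepstral feature sequence per enrolled word, and `utterance` would be the features of the endpoint-detected input.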

A Study on the Realization of Echo Canceller in CDMA Mobile Communication Networks (CDMA 이동통신 망에서의 반향제거기 구현에 관한 연구)

  • 유태훈;박광철;이윤희;김기두
    • Journal of the Institute of Electronics Engineers of Korea TE / v.37 no.5 / pp.36-47 / 2000
  • CDMA digital cellular systems provide better voice quality than analog systems; however, inherent delays due to speech coding and transmission processing cause echoes to be returned by the BSC and the PSTN interface. In this paper, we show the performance improvement of a proposed echo canceller through a real-time implementation in which the Block Update NLMS algorithm is applied on the TMS320C54X DSP. By applying the proposed method to a practical mobile phone, we verify that various types of echoes (LE, ESE, AE) can be removed more precisely. We also cope with echo path changes resulting from changes in delay length by using VAD to find the echo path delay.

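As a rough illustration of the adaptive filtering behind the echo canceller in the abstract above, the sketch below implements plain sample-by-sample NLMS. It is not the paper's Block Update variant, which would accumulate or schedule the coefficient updates per block. The function name `nlms_echo_cancel` and the parameters `num_taps`, `mu`, and `eps` are illustrative assumptions.

```python
def nlms_echo_cancel(far_end, mic, num_taps=8, mu=0.5, eps=1e-6):
    """Cancel the echo of `far_end` contained in `mic` using NLMS.

    Returns (residual, weights): the echo-cancelled signal and the final
    estimate of the echo-path impulse response.
    Assumption: per-sample weight update, not the block update of the paper.
    """
    w = [0.0] * num_taps   # estimated echo-path coefficients
    x = [0.0] * num_taps   # most recent far-end samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x = [ref] + x[:-1]                            # shift in the new sample
        y = sum(wi * xi for wi, xi in zip(w, x))      # echo estimate
        e = d - y                                     # error = cancelled output
        norm = sum(xi * xi for xi in x) + eps         # input power (regularized)
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, x)]  # normalized LMS update
        residual.append(e)
    return residual, w
```

Normalizing the step by the input power keeps convergence roughly independent of the far-end signal level, which is the usual reason NLMS is preferred over plain LMS for speech.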

Detection of Glottal Closure Instant using the property of G-peak (G-peak의 특성을 이용한 성문폐쇄시점 검출)

  • Keum, Hong;Kim, Dae-Sik;Bae, Myung-Jin;Kim, Young-Il
    • The Journal of the Acoustical Society of Korea / v.13 no.1E / pp.82-88 / 1994
  • Accurate detection of the glottal closure instant (GCI) is important in speech signal processing. Several methods for detecting the GCI of voiced speech have been proposed, but they have difficulty detecting the GCI across a wide range of speakers and/or vowel signals. In this paper, we propose a new method for GCI detection using the G-peak. The speech waveforms are passed through a low-pass filter (LPF) of variable bandwidth, and the GCIs of voiced speech are then detected from the G-peak of the filtered signals. We compared the detected GCIs with eye-checked GCIs for clean speech and at SNRs of 20 dB and 0 dB, counting a detection when it fell within 1 ms of the eye-checked GCI. The detection rate was 97.9% for clean speech, 96.5% at 20 dB SNR, and 94.8% at 0 dB SNR.

Development of energy expenditure measurement device based on voice and body activity (음성과 활동량을 이용한 에너지 소모량 측정기기 개발)

  • Im, Jae Joong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.6 / pp.303-309 / 2012
  • Energy expenditure was estimated from voice signals and body activity. Voice signals and body activity were obtained using a PVDF contact vibration sensor and a 3-axis accelerometer, respectively. Voice-induced vibration, activity signals, and actual energy consumption were acquired using a data acquisition system and a gas analyzer. Using power values from the voice signals and weight as independent variables gave the highest R-square, 0.918. For activity outputs, using signal vector magnitude, body mass index, height, and age as independent variables provided the highest correlation with actual energy expenditure. Estimating energy expenditure from both voice and activity provides more accurate results than estimating it from activity alone.

A New EGG System Design and Speech Analysis for Quantitative Analysis of Human Glottal Vibration Patterns (성문진동 패턴의 정량적인 해석을 위한 새로운 시스템 설계와 음성분석)

  • 김종찬;이재천;김덕원;오명환;윤대희;차일환
    • Journal of Biomedical Engineering Research / v.20 no.4 / pp.427-433 / 1999
  • The purpose of this study is to develop an improved pitch extraction method that can be used in a variety of speech applications such as high-quality compression and vocoding, and recognition and synthesis of speech. To do so, we develop a new electroglottograph (EGG) measurement system based on four modulation-demodulation type spot electrodes for detecting the EGG signals. The glottal closure instant (GCI) is then determined from the EGG signals in real time, and the pitch contour is obtained from the GCI information. The new pitch contour algorithm (PCA) operates more reliably than the conventional speech-only algorithm. In addition, we study speech source models and glottal vibratory patterns for Koreans by measuring and analyzing the diverse vibration patterns of the vocal folds from the EGG signals.

Machine scoring method for speech recognizer detection mispronunciation of foreign language (외국어 발화오류 검출 음성인식기를 위한 스코어링 기법)

  • Kang, Hyo-Won;Bae, Min-Young;Lee, Jae-Kang;Kwon, Chul-Hong
    • Proceedings of the KSPS conference / 2004.05a / pp.239-242 / 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we propose a speech recognition system that automatically classifies pronunciation errors when Koreans speak a foreign language. In this paper, we also propose machine scoring methods for automatic assessment of pronunciation quality by the speech recognizer. Scores obtained from an expert human listener are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score correlates more highly with human scores than the log-likelihood score does.

Machine Scoring Methods Highly-correlated with Human Ratings in Speech Recognizer Detecting Mispronunciation of Foreign Language (한국인의 외국어 발화오류검출 음성인식기에서 청취판단과 상관관계가 높은 기계 스코어링 기법)

  • Bae, Min-Young;Kwon, Chul-Hong
    • Speech Sciences / v.11 no.2 / pp.217-226 / 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we develop a speech recognition system that automatically classifies pronunciation errors when Koreans speak a foreign language. In this paper, we propose a machine scoring method for automatic assessment of pronunciation quality by the speech recognizer. Scores obtained from an expert human listener are used as the reference to evaluate the different machine scores and to provide targets when training some of the algorithms. We use a log-likelihood score and a normalized log-likelihood score as machine scoring methods. Experimental results show that the normalized log-likelihood score correlates more highly with human scores than the log-likelihood score does.

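The abstract above does not say how the log-likelihood score is normalized. A common choice, assumed here, is to divide the utterance log-likelihood by the number of frames so that long and short utterances become comparable. The sketch below pairs that with a Pearson correlation, one way to compare machine scores against human ratings; both function names are hypothetical.

```python
import math

def normalized_log_likelihood(log_likelihood, num_frames):
    """Duration-normalized score (assumed normalization: per-frame average)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return log_likelihood / num_frames

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Each machine score would be computed per utterance and `pearson` applied once to the raw scores and once to the normalized scores against the same human ratings.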

MCE Training Algorithm for a Speech Recognizer Detecting Mispronunciation of a Foreign Language (외국어 발음오류 검출 음성인식기를 위한 MCE 학습 알고리즘)

  • Bae, Min-Young;Chung, Yong-Joo;Kwon, Chul-Hong
    • Speech Sciences / v.11 no.4 / pp.43-52 / 2004
  • Model parameters in HMM-based speech recognition systems are normally estimated using Maximum Likelihood Estimation (MLE). The MLE method is based mainly on the principle of statistical data fitting in terms of increasing the HMM likelihood. The optimality of this training criterion is conditioned on the availability of an infinite amount of training data and the correct choice of model. In practice, however, neither of these conditions is satisfied. In this paper, we propose a Minimum Classification Error (MCE) training algorithm to improve the performance of a speech recognizer detecting mispronunciation of a foreign language. During conventional MLE training, the model parameters are adjusted to increase the likelihood of the word strings corresponding to the training utterances without taking into account the probability of other possible word strings. In contrast, the MCE training scheme takes competing word hypotheses into account and tries to reduce the probability of incorrect hypotheses. Discriminative training with MCE yields better recognition results than the MLE method.

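The abstract above gives no formulas, so the sketch below uses the commonly cited MCE formulation rather than the authors' exact one. A misclassification measure compares the correct hypothesis's score with a soft maximum over the competitors, and a sigmoid turns it into a smooth, differentiable error count. The function name `mce_loss` and the smoothing parameters `eta` and `gamma` are illustrative assumptions.

```python
import math

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed classification error for one training utterance.

    scores:  dict mapping each hypothesis label to its discriminant score
             (for an HMM recognizer, typically a log-likelihood).
    correct: label of the reference hypothesis; at least one competitor
             must be present in `scores`.
    Returns a value in (0, 1): near 0 when the correct hypothesis wins
    clearly, near 1 when a competitor wins clearly.
    """
    g_correct = scores[correct]
    competitors = [s for label, s in scores.items() if label != correct]
    # Soft maximum over competitors: (1/eta) * log(mean(exp(eta * g_j))),
    # computed relative to the largest score for numerical stability.
    top = max(competitors)
    mean_exp = sum(math.exp(eta * (s - top)) for s in competitors) / len(competitors)
    soft_max = top + math.log(mean_exp) / eta
    d = soft_max - g_correct          # > 0 means the utterance is misrecognized
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

Unlike the likelihood objective of MLE, this loss depends on the competing hypotheses, so lowering it pushes the correct model's score up and the competitors' scores down at the same time.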

Non-Keyword Model for the Improvement of Vocabulary Independent Keyword Spotting System (가변어휘 핵심어 검출 성능 향상을 위한 비핵심어 모델)

  • Kim, Min-Je;Lee, Jung-Chul
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.319-324 / 2006
  • We propose two new methods for non-keyword modeling to improve the performance of a speaker- and vocabulary-independent keyword spotting system. The first method is decision tree clustering of monophones at the state level instead of monophone clustering based on the K-means algorithm. The second method is multi-state multiple-mixture modeling at the syllable level rather than a single-state multiple-mixture model for the non-keyword. To evaluate our methods, we used the ETRI speech DB for training and for the keyword spotting test (closed test). We also conducted an open test to spot 100 keywords in 400 sentences uttered by 4 speakers in an office environment. The experimental results showed that the decision tree-based state clustering method improved keyword spotting by 28%/29% (closed/open test) over the monophone clustering method based on the K-means algorithm, and that multi-state non-keyword modeling at the syllable level improved it by 22%/2% (closed/open test) over the single-state non-keyword model. These results show that the two proposed methods improve keyword spotting performance.

Phoneme Recognition based on Two-Layered Stereo Vision Neural Network (2층 구조의 입체 시각형 신경망 기반 음소인식)

  • Kim, Sung-Ill;Kim, Nag-Cheol
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.523-529 / 2002
  • The present study describes neural networks for stereoscopic vision, which are applied to identifying human speech. In speech recognition based on stereoscopic vision neural networks (SVNN), similarities are first obtained by comparing input vocal signals with standard models. They are then passed to a dynamic process in which both competitive and cooperative processes are conducted among neighboring similarities. Through this dynamic process, a single winner neuron is finally detected. In a comparative study, the two-layered SVNN achieved 7.7% higher recognition accuracy than the hidden Markov model (HMM). The evaluation results indicate that the SVNN outperformed the existing HMM recognizer.
