• Title/Summary/Keyword: phonetic system

Machine scoring method for speech recognizer detection mispronunciation of foreign language (외국어 발화오류 검출 음성인식기를 위한 스코어링 기법)

  • Kang, Hyo-Won;Bae, Min-Young;Lee, Jae-Kang;Kwon, Chul-Hong
    • Proceedings of the KSPS conference / 2004.05a / pp.239-242 / 2004
  • An automatic pronunciation correction system provides users with correction guidelines for each pronunciation error. For this purpose, we propose a speech recognition system that automatically classifies the pronunciation errors Koreans make when speaking a foreign language. We also propose machine scoring methods that let the speech recognizer assess pronunciation quality automatically. Scores from an expert human listener serve as the reference for evaluating the machine scores and as targets when training some of the algorithms. The machine scores are a log-likelihood score and a normalized log-likelihood score. Experimental results show that the normalized log-likelihood score correlates more highly with the human scores than the log-likelihood score does.
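
As a minimal sketch of the two machine scores named in the abstract: the abstract does not say how the log-likelihood is normalized, so the example below assumes normalization by the number of frames, which is a common choice. The function names and the per-frame log-likelihood inputs are hypothetical.

```python
from typing import Sequence


def log_likelihood_score(frame_logliks: Sequence[float]) -> float:
    """Raw score: sum of per-frame acoustic log-likelihoods for a segment."""
    return float(sum(frame_logliks))


def normalized_log_likelihood_score(frame_logliks: Sequence[float]) -> float:
    """Assumed normalization: raw score divided by the number of frames."""
    if not frame_logliks:
        raise ValueError("segment must contain at least one frame")
    return log_likelihood_score(frame_logliks) / len(frame_logliks)


# Two toy segments with the same per-frame likelihood but different durations.
short_segment = [-2.0] * 10
long_segment = [-2.0] * 40

raw_short = log_likelihood_score(short_segment)
raw_long = log_likelihood_score(long_segment)
norm_short = normalized_log_likelihood_score(short_segment)
norm_long = normalized_log_likelihood_score(long_segment)
```

Under this assumption, the raw scores differ only because the segments differ in length, while the normalized scores are equal. That duration dependence is one plausible reason a normalized score could track human ratings more closely.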

Application of Shape Analysis Techniques for Improved CASA-Based Speech Separation (CASA 기반 음성분리 성능 향상을 위한 형태 분석 기술의 응용)

  • Lee, Yun-Kyung;Kwon, Oh-Wook
    • MALSORI / no.65 / pp.153-168 / 2008
  • We propose a new method that applies shape analysis techniques to a computational auditory scene analysis (CASA)-based speech separation system. The conventional CASA-based speech separation system extracts speech signals from a mixture of speech and noise signals. In the proposed method, we fill in the missing speech signals by applying shape analysis techniques such as labelling and the distance function. In the speech separation experiment, the proposed method improves the signal-to-noise ratio by 6.6 dB. When the proposed method is used as a front-end for speech recognizers, it improves recognition accuracy by 22% for the speech-shaped stationary noise condition and by 7.2% for the two-talker noise condition at the target-to-masker ratio than or equal to -3 dB.
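
To make the "improves the signal-to-noise ratio by 6.6 dB" figure concrete, here is a small sketch of how an SNR gain in dB can be computed when the clean signal is known. The abstract does not give the authors' exact SNR definition, so the definition below (clean-signal power over error power) is an assumption, and the function names and toy signals are hypothetical.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """SNR of `estimate` measured against the clean `reference`, in dB."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / error_power)


def snr_improvement_db(reference, mixture, separated) -> float:
    """Gain in dB: SNR after separation minus SNR of the unprocessed mixture."""
    return snr_db(reference, separated) - snr_db(reference, mixture)


# Toy data: separation halves the error amplitude relative to the mixture.
clean = [1.0, -1.0, 1.0, -1.0]
mixture = [1.5, -0.5, 1.5, -0.5]
separated = [1.25, -0.75, 1.25, -0.75]
gain = snr_improvement_db(clean, mixture, separated)
```

In this toy case the error amplitude is halved, so the gain is 20·log10(2), roughly 6 dB, which is the same order as the improvement reported above.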

On the Simple Speaker Verification System Using Tolerance Interval Analysis Without Background Speaker Models (Tolerance Interval Analysis를 이용한 배경화자 없는 간단한 화자인증시스템에 관한 연구)

  • Choi, Hong-Sub
    • MALSORI / no.56 / pp.147-158 / 2005
  • In this paper, we focus on developing a simplified speaker verification algorithm without background speaker models, intended for portable speaker verification systems built into portable terminals such as mobile phones and PMPs. According to tolerance interval analysis, the population of a speaker's model can be represented by a suitable number of independently selected samples of that speaker model. We can therefore construct a representative speaker model and a threshold for a specified confidence level and coverage. With 40 samples, the proposed algorithm yields a false rejection rate of 3.0% and a false acceptance rate of 4.3%, which compare favorably with the conventional method's 5.4% and 5.5%, respectively. Future work will address adaptation methods to cope with speech variation caused by aging and operating environments.
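
The abstract does not say which form of tolerance interval the authors use. As a rough illustration of how sample count, coverage, and confidence level relate, the sketch below assumes the simplest case, a one-sided nonparametric tolerance bound based on the extreme of n independent samples, for which confidence = 1 - coverage^n. The function names are hypothetical, and the paper's actual procedure may differ.

```python
import math


def coverage_confidence(n_samples: int, coverage: float) -> float:
    """Confidence that the extreme of n i.i.d. samples bounds at least
    a fraction `coverage` of the population (one-sided, nonparametric)."""
    if n_samples < 1 or not 0.0 < coverage < 1.0:
        raise ValueError("need n_samples >= 1 and 0 < coverage < 1")
    return 1.0 - coverage ** n_samples


def min_samples(coverage: float, confidence: float) -> int:
    """Smallest n whose one-sided bound reaches the requested confidence."""
    if not 0.0 < coverage < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("coverage and confidence must lie in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


# Under this assumption, 40 samples with 95% coverage give about 87% confidence.
confidence_with_40 = coverage_confidence(40, 0.95)
samples_for_95_95 = min_samples(0.95, 0.95)
```

Under this assumed bound, the 40 samples mentioned in the abstract correspond to about 87% confidence at 95% coverage, and reaching 95% confidence at the same coverage would take 59 samples.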

A Voice-Activated Dialing System with Distributed Speech Recognition in WiFi Environments (무선랜 환경에서의 분산 음성 인식을 이용한 음성 다이얼링 시스템)

  • Park Sung-Joon;Koo Myoung-Wan
    • MALSORI / no.56 / pp.135-145 / 2005
  • In this paper, a WiFi phone system with distributed speech recognition is implemented. The WiFi phone with voice-activated dialing and its functions are explained. Features of the input speech are extracted and sent to the interactive voice response (IVR) server using the real-time transport protocol (RTP). Feature extraction is based on the European Telecommunications Standards Institute (ETSI) standard front-end, but is modified to reduce the processing time. The front-end processing time on a WiFi phone is compared with that on a PC.

Classification of Asthma Disease Using Thoracic Data (흉부음 데이터를 이용한 천식 질환 판별)

  • Moon In-Seob;Choi Hyoung-Ki;Lee Chul-Hee;Park Ki-Young;Kim Chong-Kyo
    • MALSORI / no.49 / pp.135-144 / 2004
  • In this paper, we study the classification of thoracic sounds as normal or abnormal (asthma) by analyzing sounds captured with a thoracic sound detection system. The system stores thoracic sounds and analyzes the data. The waveform of a thoracic sound resembles noise and is generated systematically by inhalation and exhalation. To identify asthma sounds among thoracic sounds, we therefore discriminate between normal and abnormal cases using the level crossing rate (LCR) and the spectrogram energy rate.
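
A minimal sketch of a level crossing rate, assuming LCR means the fraction of consecutive sample pairs that straddle a fixed amplitude level. The abstract gives neither the paper's exact definition nor its level or decision threshold, so the function name, the level, and the toy signals are hypothetical.

```python
from typing import Sequence


def level_crossing_rate(samples: Sequence[float], level: float) -> float:
    """Fraction of consecutive sample pairs that cross `level`.

    A sample counts as "above" when it is greater than or equal to the level.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    above = [s >= level for s in samples]
    crossings = sum(a != b for a, b in zip(above, above[1:]))
    return crossings / (len(samples) - 1)


# Toy signals: one oscillates around the level, the other crosses it once.
oscillating = [0.0, 1.0, 0.0, 1.0, 0.0]
slow = [0.0, 0.0, 1.0, 1.0, 1.0]
lcr_oscillating = level_crossing_rate(oscillating, level=0.5)
lcr_slow = level_crossing_rate(slow, level=0.5)
```

With a level of 0.5, the oscillating signal crosses on every step and the slow one crosses once in four steps. A classifier in this spirit would compare such rates, per breathing segment, against a threshold chosen from labeled data.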

Design and Implementation of Education Multimedia Content Mastication system based on AVATAR (아바타 기반 교육용 멀티미디어 컨텐츠 저작시스템의 설계 및 구현)

  • 이혜정;정석태
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.5 / pp.1042-1049 / 2004
  • In this paper, we design and implement an editor for authoring avatar-based educational multimedia content using the LipSynchro software development kit (SDK). The system automatically generates avatar movements by interworking with a motion generation engine and a phonetic tuning engine. Combined with an educational multimedia content authoring tool, it can thus produce better educational content.

Performance Improvement of Speech Recognition Based on SPLICE in Noisy Environments (SPLICE 방법에 기반한 잡음 환경에서의 음성 인식 성능 향상)

  • Kim, Jong-Hyeon;Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI / no.53 / pp.103-118 / 2005
  • The performance of a speech recognition system is degraded by mismatch between training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE) was introduced to overcome environmental mismatch using stereo data. In this paper, we propose several methods to improve the conventional SPLICE and evaluate them on the Aurora2 task. We generalize SPLICE to compensate for the covariance matrix as well as the mean vector in the feature space, which yields an error rate reduction of 48.93%. We also employ a weighted sum of correction vectors using the posterior probabilities of all Gaussians, which achieves an error rate reduction of 48.62%. Combining the two methods reduces the error rate by 49.61% relative to the Aurora2 baseline system.
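
The posterior-weighted correction mentioned in the abstract can be sketched as follows. This assumes the bias-only form of SPLICE (clean estimate = noisy feature + correction vector), a diagonal-covariance Gaussian mixture over noisy features, and a baseline that applies only the correction vector of the most likely Gaussian. The covariance compensation from the paper is not modeled, and all names and toy parameters are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def posteriors(y: Vector, weights: Vector, means: Sequence[Vector],
               variances: Sequence[Vector]) -> List[float]:
    """p(k | y) for a diagonal-covariance Gaussian mixture over noisy features."""
    log_p = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for yi, mi, vi in zip(y, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (yi - mi) ** 2 / vi)
        log_p.append(ll)
    peak = max(log_p)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - peak) for lp in log_p]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def splice_hard(y: Vector, post: Vector, corrections: Sequence[Vector]) -> List[float]:
    """Assumed baseline: add the correction vector of the most likely Gaussian."""
    best = max(range(len(post)), key=lambda k: post[k])
    return [yi + ri for yi, ri in zip(y, corrections[best])]


def splice_soft(y: Vector, post: Vector, corrections: Sequence[Vector]) -> List[float]:
    """Add the posterior-weighted sum of all correction vectors."""
    return [yi + sum(p * r[d] for p, r in zip(post, corrections))
            for d, yi in enumerate(y)]


# Toy 1-D mixture with two components and opposite correction vectors.
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
corrections = [[1.0], [-1.0]]

y_clear = [0.0]      # clearly belongs to the first Gaussian
y_ambiguous = [2.0]  # midway between the two Gaussians
post_clear = posteriors(y_clear, weights, means, variances)
post_ambiguous = posteriors(y_ambiguous, weights, means, variances)
hard_clear = splice_hard(y_clear, post_clear, corrections)
soft_clear = splice_soft(y_clear, post_clear, corrections)
soft_ambiguous = splice_soft(y_ambiguous, post_ambiguous, corrections)
```

When one Gaussian dominates, the two variants nearly coincide. For a feature midway between the Gaussians, the weighted sum blends the two opposite corrections instead of committing to one, which avoids abrupt switches between correction vectors near component boundaries.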

Design and Implementation of a Call Control Markup Interpreter and Its Interaction with Voice Dialog Systems (호 제어 마크업 해석기 개발 및 음성 대화 시스템과의 연동)

  • Lee, Kyung-A;Kwon, Ji-Hye;Kim, Ji-Young;Hong, Ki-Hyung
    • MALSORI / no.53 / pp.171-183 / 2005
  • Call Control eXtensible Markup Language (CCXML) is a standard language that supports call control for voice dialog systems such as VoiceXML-based systems. CCXML allows developers to handle telephony calls easily, without deep knowledge of telephony networks and their switching systems. We design and implement a call control markup interpreter. The implementation uses a Dialogic JCT-LS board, but because the computer telephony integration (CTI) board features are wrapped in a dedicated class, the interpreter can easily be adapted to other CTI boards. We also design and implement an event-based interaction scheme between the interpreter and voice dialog systems. To verify the interaction scheme, we implement a simple voice dialog system.

The Text-to-Speech System Assessment Based on Word Frequency and Word Regularity Effects (단어빈도와 단어규칙성 효과에 기초한 합성음 평가)

  • Nam, Ki-Chun;Choi, Won-Il;Kim, Choong-Myung;Choi, Yang-Gyu;Kim, Jong-Jin
    • MALSORI / no.53 / pp.61-74 / 2005
  • In the present study, the intelligibility of synthesized speech sounds was evaluated using psycholinguistic and fMRI techniques. To examine how word recognition differs between natural and synthesized speech sounds, word regularity and word frequency were varied. The results of Experiment 1 and Experiment 2 showed that the intelligibility difference of the synthesized speech comes from word regularity. For the synthesized speech, regular words were recognized more slowly than irregular words, and the auditory areas of the brain showed less activation for regular words than for irregular words.

A Study on the Vowel lengthening and a Morphophonological Interpretation for its function (홀소리 길이의 늘어짐(Vowel lengthening)의 기능 및 형태음운론적 해석)

  • Kim, Chong-Dok
    • Proceedings of the KSPS conference / 2005.04a / pp.9-13 / 2005
  • The aim of this paper is to analyze vowel lengthening in Korean, whose function is distinctive at the word level. I examined two acoustic parameters, vowel length and formants (F1 and F2), to distinguish or identify a long vowel and its short counterpart, for example /a:/ and /a/. Based on the results of the experimental analysis and a discussion of vowel length's relation to and influence on the Korean phonological system, I treat vowel lengthening as a prosodeme, that is, as a prosodic element of the Korean phonological system.
