• Title/Summary/Keyword: Speech Recognition Technology

Search Result 527, Processing Time 0.026 seconds

Improvement of Gesture Recognition using 2-stage HMM (2단계 히든마코프 모델을 이용한 제스쳐의 성능향상 연구)

  • Jung, Hwon-Jae;Park, Hyeonjun;Kim, Donghan
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.21 no.11
    • /
    • pp.1034-1037
    • /
    • 2015
  • In recent years in the field of robotics, various methods have been developed to create an intimate relationship between people and robots. These methods include speech, vision, and biometrics recognition as well as gesture-based interaction. These recognition technologies are used in various wearable devices, smartphones and other electric devices for convenience. Among these technologies, gesture recognition is the most commonly used and appropriate technology for wearable devices. Gesture recognition can be classified as contact or noncontact gesture recognition. This paper proposes contact gesture recognition with IMU and EMG sensors by using the hidden Markov model (HMM) twice. Several simple behaviors make main gestures through the one-stage HMM. It is equal to the Hidden Markov model process, which is well known for pattern recognition. Additionally, the sequence of the main gestures, which comes from the one-stage HMM, creates some higher-order gestures through the two-stage HMM. In this way, more natural and intelligent gestures can be implemented through simple gestures. This advanced process can play a larger role in gesture recognition-based UX for many wearable and smart devices.

Speech Recognition and Lip Shape Feature Extraction for English Vowel Pronunciation of the Hearing - Impaired Based on SVM Technique (SVM 기법에 기초한 청각장애인의 영어모음 발음을 위한 음성 인식 및 입술형태 특징 추출)

  • Lee, Kun-Min;Han, Kyung-Im;Park, Hye-Jung
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.11 no.3
    • /
    • pp.247-252
    • /
    • 2017
  • The purpose of this study is to suggest the visual teaching method for the English vowel pronunciation, especially for the hearing-impaired who mostly rely on the visual aids, based on the SVM technique. By extracting phonetic features using the SVM technique from the sounds that are hard to hear by ear, the lip shapes for each vowel were refined. The lip shape refinement for vowels is advantageous in that language learners can easily see the movement of articulators by eye, and it is helpful for learning and teaching English vowels for the hearing-impaired.

Acoustic Masking Effect That Can Be Occurred by Speech Contrast Enhancement in Hearing Aids (보청기에서 음성 대비 강조에 의해 발생할 수 있는 마스킹 현상)

  • Jeon, Y.Y.;Yang, D.G.;Bang, D.H.;Kil, S.K.;Lee, S.M.
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.1 no.1
    • /
    • pp.21-28
    • /
    • 2007
  • In most of hearing aids, amplification algorithms are used to compensate hearing loss, noise and feedback reduction algorithms are used and to increase the perception of speeches contrast enhancement algorithms are used. However, acoustic masking effect is occurred between formants if contrast is enhanced excessively. To confirm the masking effect in speeches, the experiment are composed of 6 tests; test pure tone test, speech reception test, word recognition test, pure tone masking test, formant pure tone masking test and speech masking test, and for objective evaluation, LLR is introduced. As a result of normal hearing subjects and hearing impaired subjects, more making is occurred in hearing impaired subjects than normal hearing subjects when using pure tone, and in the speech masking test, speech reception is also lower in hearing impaired subjects than in normal hearing subjects. This means that acoustic masking effect rather than distortion influences speech perception. So it is required to check the characteristics of masking effect before wearing a hearing aid and to apply this characteristics to fitting curve.

  • PDF

The Interactive Voice Services based on VoiceXML (VoiceXML 기반 음성인식시스템을 이용한 서비스 개발)

  • Kim Hak-Gyoon;Kim Eun-Hyang;Kim Jae-In;Koo Myoung-Wan
    • MALSORI
    • /
    • no.43
    • /
    • pp.113-125
    • /
    • 2002
  • As there are needs to search the Web information via wire or wireless telephones, VoiceXML forum was established to develop and promote the Voice eXtensible Markup Language (VoiceXML). VoiceXML simplifies the creation of personalized interactive voice response services on the Web, and allows voice and phone access to information on Web sites, call center databases. Also, it can utilize the Web-based technologies, such as CGI(Common Gateway Interface) scripts. In this paper, we have developed the voice portal service platform based on VoiceXML called TeleGateway. It enables integration of voice services with data services using the Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engines. Also, we have showed the various services on voice portal services.

  • PDF

Speaker Adaptation Using ICA-Based Feature Transformation

  • Jung, Ho-Young;Park, Man-Soo;Kim, Hoi-Rin;Hahn, Min-Soo
    • ETRI Journal
    • /
    • v.24 no.6
    • /
    • pp.469-472
    • /
    • 2002
  • Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA-based feature transformation matrix, it is necessary to adjust the ICA-based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.

  • PDF

Performance Improvement of Voting-based Speaker Identification System by using the Observation Confidence (관측신뢰도 적용에 의한 투표기법 기반의 화자인식시스템의 성능향상)

  • Choi, Hong-Sub
    • Speech Sciences
    • /
    • v.15 no.2
    • /
    • pp.79-88
    • /
    • 2008
  • Recently demands for the speech technology-based products targeted for the mobile terminals such as cellular phones and PDA are rapidly increasing. And voting-based speaker identification algorithm is known to have a good performance in the mobile environment, since it works well with small amount of speaker training data. In this paper, we proposed a method to improve the performance of this voting based speaker identification system by using the observation confidence value which is derived from the function of SNR each frame. The proposed method is evaluated with ETRI cellular phone DB which is made for the speaker recognition task. The experimental results show that the proposed method has better performance of 2-3% identification rate than the conventional GMM method.

  • PDF

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Son Jongmok;Kwon Hongseok;Kim Siho;Bae Keunsung
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.391-394
    • /
    • 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that can handle several hundred words of vocabulary size as well as speaker independency. First, we develop an HMM-based speech recognition system on the PC that operates on the frame basis with parallel processing of feature extraction and Viterbi decoding to make the processing delay as small as possible. Many techniques such as linear discriminant analysis, state-based Gaussian selection, and phonetic tied mixture model are employed for reduction of computational burden and memory size. The system is then properly optimized and compiled on the TMS320C6711 DSP for real-time operation. The implemented system uses 486kbytes of memory for data and acoustic models, and 24.5kbytes for program code. Maximum required time of 29.2ms for processing a frame of 32ms of speech validates real-time operation of the implemented system.

  • PDF

An Approach to Develop a Speech Recognition Speaker Using Chatbot for Senior Users (시니어 사용자를 위한 챗봇활용 음성인식 스피커 개발 방법)

  • Noh, Gunho;Lee, Kyoung Yong;Moon, Mikyeong
    • Journal of IKEEE
    • /
    • v.22 no.2
    • /
    • pp.330-338
    • /
    • 2018
  • As population aging progresses, there is a growing demand for IT technology that can relieve the psychological anxiety of the elderly living alone, recognize the dangerous situation, and check the family members' affection. In this paper, we describe the development of a speech recognition speaker that enable senior users to give simple interactive commands by voice and monitor the status of the user. The speaker analyzes the user's voice, grasps the conversation contents through the chatbot, connects the desired service to the user, and provides the result again by voice. By using this speaker, senior users can feel relaxed by natural conversation, and can monitor the status of danger more easily.

Language Model Adaptation for Broadcast News Recognition (방송 뉴스 인식을 위한 언어 모델 적응)

  • Kim Hyun Suk;Jeon Hyung Bae;Kim Sanghun;Choi Joon Ki;Yun Seung
    • MALSORI
    • /
    • no.51
    • /
    • pp.99-115
    • /
    • 2004
  • In this parer, we propose LM adaptation for broadcast news recognition. We collect information of recent articles from the internet on real time, make a recent small size LM, and then interpolate recent LM with a existing LM composed of existing large broadcast news corpus. We performed interpolation experiments to get the best type of articles from recent corpus because collected recent corpus is composed of articles which are related with test set, and which are unrelated. When we made an adapted LM using recent LM with similar articles to test set through Tf-Idf method and existing LM, we got the best result that ERR of pseudo-morpheme based recognition performance has 17.2 % improvement and the number of OOV has reduction from 70 to 27.

  • PDF

A Study on Formants of Vowels for Speaker Recognition (화자 인식을 위한 모음의 포만트 연구)

  • Ahn Byoung-seob;Shin Jiyoung;Kang Sunmee
    • MALSORI
    • /
    • no.51
    • /
    • pp.1-16
    • /
    • 2004
  • The aim of this paper is to analyze vowels in voice imitation and disguised voice, and to find the invariable phonetic features of the speaker. In this paper we examined the formants of monophthongs /a, u, i, o, {$\omega},{\;}{\varepsilon},{\;}{\Lambda}$/. The results of the present are as follows : $\circled1$ Speakers change their vocal tract features. $\circled2$ Vowels /a, ${\varepsilon}$, i/ appear to be proper for speaker recognition since they show invariable acoustic feature during voice modulation. $\circled3$ F1 does not change easily compared to higher formants. $\circled4$ F3-F2 appears to be constituent for a speaker identification in vowel /a/ and /$\varepsilon$/, and F4-F2 in vowel /i/. $\circled5$ Resulting of F-ratio, differences of each formants were more useful than individual formant of a vowel to speaker recognition.

  • PDF