• Title/Summary/Keyword: Voice recognition system

Search Result 334, Processing Time 0.029 seconds

A Study on Speaker Adaptation of Large Continuous Spoken Language Using back-off bigram (Back-off bigram을 이랑한 대용량 연속어의 화자적응에 관한 연구)

  • 최학윤
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.9C
    • /
    • pp.884-890
    • /
    • 2003
  • In this paper, we studied the speaker adaptation methods that improve the speaker independent recognition system. For the independent speakers, we compared the results between bigram and back-off bigram, MAP and MLLR. Cause back-off bigram applys unigram and back-off weighted value as bigram probability value, it has the effect adding little weighted value to bigram probability value. We did an experiment using total 39-feature vectors as featuring voice parameter with 12-MFCC, log energy and their delta and delta-delta parameter. For this recognition experiment, We constructed a system made by CHMM and tri-phones recognition unit and bigram and back-off bigrams language model.

Semantic Ontology Speech Recognition Performance Improvement using ERB Filter (ERB 필터를 이용한 시맨틱 온톨로지 음성 인식 성능 향상)

  • Lee, Jong-Sub
    • Journal of Digital Convergence
    • /
    • v.12 no.10
    • /
    • pp.265-270
    • /
    • 2014
  • Existing speech recognition algorithm have a problem with not distinguish the order of vocabulary, and the voice detection is not the accurate of noise in accordance with recognized environmental changes, and retrieval system, mismatches to user's request are problems because of the various meanings of keywords. In this article, we proposed to event based semantic ontology inference model, and proposed system have a model to extract the speech recognition feature extract using ERB filter. The proposed model was used to evaluate the performance of the train station, train noise. Noise environment of the SNR-10dB, -5dB in the signal was performed to remove the noise. Distortion measure results confirmed the improved performance of 2.17dB, 1.31dB.

A Study on Processing of Speech Recognition Korean Words (한글 단어의 음성 인식 처리에 관한 연구)

  • Nam, Kihun
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.4
    • /
    • pp.407-412
    • /
    • 2019
  • In this paper, we propose a technique for processing of speech recognition in korean words. Speech recognition is a technology that converts acoustic signals from sensors such as microphones into words or sentences. Most foreign languages have less difficulty in speech recognition. On the other hand, korean consists of vowels and bottom consonants, so it is inappropriate to use the letters obtained from the voice synthesis system. That improving the conventional structure speech recognition can the correct words recognition. In order to solve this problem, a new algorithm was added to the existing speech recognition structure to increase the speech recognition rate. Perform the preprocessing process of the word and then token the results. After combining the result processed in the Levenshtein distance algorithm and the hashing algorithm, the normalized words is output through the consonant comparison algorithm. The final result word is compared with the standardized table and output if it exists, registered in the table dose not exists. The experimental environment was developed by using a smartphone application. The proposed structure shows that the recognition rate is improved by 2% in standard language and 7% in dialect.

Classification of Three Different Emotion by Physiological Parameters

  • Jang, Eun-Hye;Park, Byoung-Jun;Kim, Sang-Hyeob;Sohn, Jin-Hun
    • Journal of the Ergonomics Society of Korea
    • /
    • v.31 no.2
    • /
    • pp.271-279
    • /
    • 2012
  • Objective: This study classified three different emotional states(boredom, pain, and surprise) using physiological signals. Background: Emotion recognition studies have tried to recognize human emotion by using physiological signals. It is important for emotion recognition to apply on human-computer interaction system for emotion detection. Method: 122 college students participated in this experiment. Three different emotional stimuli were presented to participants and physiological signals, i.e., EDA(Electrodermal Activity), SKT(Skin Temperature), PPG(Photoplethysmogram), and ECG (Electrocardiogram) were measured for 1 minute as baseline and for 1~1.5 minutes during emotional state. The obtained signals were analyzed for 30 seconds from the baseline and the emotional state and 27 features were extracted from these signals. Statistical analysis for emotion classification were done by DFA(discriminant function analysis) (SPSS 15.0) by using the difference values subtracting baseline values from the emotional state. Results: The result showed that physiological responses during emotional states were significantly differed as compared to during baseline. Also, an accuracy rate of emotion classification was 84.7%. Conclusion: Our study have identified that emotions were classified by various physiological signals. However, future study is needed to obtain additional signals from other modalities such as facial expression, face temperature, or voice to improve classification rate and to examine the stability and reliability of this result compare with accuracy of emotion classification using other algorithms. Application: This could help emotion recognition studies lead to better chance to recognize various human emotions by using physiological signals as well as is able to be applied on human-computer interaction system for emotion recognition. Also, it can be useful in developing an emotion theory, or profiling emotion-specific physiological responses as well as establishing the basis for emotion recognition system in human-computer interaction.

Speaker Adapted Real-time Dialogue Speech Recognition Considering Korean Vocal Sound System (한국어 음운체계를 고려한 화자적응 실시간 단모음인식에 관한 연구)

  • Hwang, Seon-Min;Yun, Han-Kyung;Song, Bok-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.6 no.4
    • /
    • pp.201-207
    • /
    • 2013
  • Voice Recognition technique has been developed and it has been actively applied to various information devices such as smart phones and car navigation system. But the basic research technique related the speech recognition is based on research results in English. Since the lip sync producing generally requires tedious hand work of animators and it serious affects the animation producing cost and development period to get a high quality lip animation. In this research, a real time processed automatic lip sync algorithm for virtual characters in digital contents is studied by considering Korean vocal sound system. This suggested algorithm contributes to produce a natural lip animation with the lower producing cost and the shorter development period.

A Study of Connection of Facility DB for People with Disability to Smartphone for Location and Voice Recognition and QR Code Recognition and to Navigation (위치 음성인식 QR코드가 인식된 스마트폰과 네비게이션에 대한 장애인 편의시설 DB 연결)

  • Yang, Sung-Yong;Park, Dea-Woo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2013.10a
    • /
    • pp.177-180
    • /
    • 2013
  • The number of people with a disability registered in the Korea's Ministry of Health and Welfare is more than 2.5 million in 2010, and approximately 2.51 million as of late 2012. In consideration of potential disabilities led by aging which is in fast progress in Korea, the number of people with a disability will further increase. It is necessary to study the method of enabling people with a disability to easily access facilities and of using facilities for people with a disability, in order to comply with the Disability Discrimination Act (DDA) which is enforced in Korea. In this study, a smartphone recently used more and more is used to develop a smartphone which can recognize locations, voice and the QR code for enabling people with a disability to recognize them. The developed smartphone for people with a disability is connected to a database established as a facility for people with a disability, and to the navigation system in the wheelchair accessible call taxis, to enable the disability database to be used. This study will contribute to observing the Disability Discrimination Act of Korea, and improving the national welfare system by enhancing the disability care services.

  • PDF

Color Space Mapping and Medium Access Control Techniques in Visible Light Communication (가시광통신을 위한 색채공간매핑과 MAC 기법 연구)

  • Rahman, Mohammad Shaifur;Kim, Byung-Yeon;Bang, Min-Suk;Park, Young-Il;Kim, Ki-Doo
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.4
    • /
    • pp.99-107
    • /
    • 2009
  • Visible Light Communication (VLC) is a promising wireless communication technology. It offers huge, worldwide available and free bandwidth without electro-magnetic interference, which makes it very attractive for RF-sensitive operating environments. We propose colored LED-based VLC system for hospital use which includes voice recognition system for operating the medical equipments. New Mr hlation Scheme based on the Light Color Space is suggested to overcome the effect of noise generated by background light. Different color space constellations for different symbol sizes are also suggested which would give better bit error rate performance. Finally, Slotted ALOHA or TDMA medium access control protocols are suggested for multi-user operations.

  • PDF

Implementation of speech recognition based smart mirror system for the elderly (노인을 위한 음성인식 기반의 스마트 미러 시스템 구현)

  • Jung, Wonseok;Kwon, Guhyeon;Seo, Jeongwook;Song, Myungkyu
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.727-729
    • /
    • 2017
  • According to the National Statistical Office and the Ministry of Health and Welfare Family Department, Korea has already estimated that the proportion of the elderly population aged 65 or over is estimated to be higher than the global average, and has already entered the "Aging Society". Moreover, it is estimated that the quality of life of the people improved by reaching the information society earlier than expected. However, unlike young people, elderly people show clear differences in access and use of information that is difficult to access new machines easily and cause intergenerational digital divide phenomenon. In order to solve such a phenomenon, we implemented a speech recognition based smart mirror system for elderly people. The installed system can provide lighting control, weather forecast and subway time information search service of the using voice recognition. Through the implemented system, it is expected that the elderly will have accessibility to new machines, information that gives a feeling of participating in the change of the new era will be obtained and the effect of living convenience will be expected.

  • PDF

Lip-reading System based on Bayesian Classifier (베이지안 분류를 이용한 립 리딩 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.25 no.4
    • /
    • pp.9-16
    • /
    • 2020
  • Pronunciation recognition systems that use only video information and ignore voice information can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels via lip shapes in images. We extract feature vectors from the lip shapes of facial images and apply them to the designed machine learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and the system's average recognition rate is approximately 84%, which is higher than that of the CNN tested for comparison. Our results show that our Bayesian classification method with feature values from lip region landmarks is efficient on a small training set. Therefore, it can be used for application development on limited hardware such as mobile devices.

Automatic Detection of Korean Accentual Phrase Boundaries

  • Lee, Ki-Yeong;Song, Min-Suck
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.1E
    • /
    • pp.27-31
    • /
    • 1999
  • Recent linguistic researches have brought into focus the relations between prosodic structures and syntactic, semantic or phonological structures. Most of them prove that prosodic information is available for understanding syntactic, semantic and discourse structures. But this result has not been integrated yet into recent Korean speech recognition or understanding systems. This study, as a part of integrating prosodic information into the speech recognition system, proposes an automatic detection technique of Korean accentual phrase boundaries by using one-stage DP, and the normalized pitch pattern. For making the normalized pitch pattern, this study proposes a method of modified normalization for Korean spoken language. For the experiment, this study employs 192 sentential speech data of 12 men's voice spoken in standard Korean, in which 720 accentual phrases are included, and 74.4% of the accentual phrase boundaries are correctly detected while 14.7% are the false detection rate.

  • PDF