• Title/Summary/Keyword: Voice recognition system


A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 남기환;배철수
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.6 no.5
    • /
    • pp.783-788
    • /
    • 2002
  • Recently, the research and development direction of communication systems has been to adopt both voice data and face images of the speaker concurrently, to achieve a higher recognition rate than with voice data alone. Accordingly, we present a lipreading method for speech image sequences that uses a 3-D facial shape model. The method uses feature information of the face image such as the opening level of the lips, the movement of the jaw, and the projection height of the lips. First, we fit the 3-D face model to the speaking face image sequence. Then, to obtain feature information, we compute the variation of the fitted 3-D shape model across the image sequence and use this variation as the recognition parameters. We use the intensity gradient values obtained from the variation of the 3-D feature points to separate recognition units from the sequential images. In the recognition stage, we apply a discrete HMM algorithm to multiple observation sequences that fully reflect the variation of the 3-D feature points. In a recognition experiment with 8 Korean vowels and 2 Korean consonants, we obtained a recognition rate of about 80% for the plosives and vowels.
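As an illustration of the discrete-HMM scoring step this abstract describes, here is a minimal sketch of the scaled forward algorithm for a discrete-observation HMM. The 3-state model, the 4 quantized lip-shape symbols, and all matrix values are hypothetical placeholders, not the paper's trained models.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    obs : sequence of observation symbol indices (quantized features)
    pi  : (N,) initial state probabilities
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]                  # initialize with first symbol
    log_prob = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()                    # rescale to avoid underflow
        log_prob += np.log(scale)
        alpha = (alpha / scale) @ A * B[:, obs[t]]
    return log_prob + np.log(alpha.sum())

# Hypothetical 3-state left-to-right model over 4 quantized lip-shape symbols.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.3, 0.1],                 # includes a skip transition
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.4, 0.4]])
print(forward_log_likelihood([0, 1, 1, 2, 3], pi, A, B))
```

In a word-level recognizer of this kind, one such model would be trained per recognition unit and the unit with the highest log-likelihood chosen.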

Monosyllable Speech Recognition through Facial Movement Analysis (안면 움직임 분석을 통한 단음절 음성인식)

  • Kang, Dong-Won;Seo, Jeong-Woo;Choi, Jin-Seung;Choi, Jae-Bong;Tack, Gye-Rae
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.63 no.6
    • /
    • pp.813-819
    • /
    • 2014
  • The purpose of this study was to extract accurate parameters of facial movement features using a 3-D motion capture system for lip-reading-based speech recognition. Instead of the features obtained from conventional camera images, the 3-D motion system was used to obtain quantitative data on actual facial movements and to analyze 11 variables that exhibit particular patterns, such as nose, lip, jaw, and cheek movements, during monosyllable vocalization. Fourteen subjects, all in their twenties, were asked to vocalize 11 types of Korean vowel monosyllables three times each, with 36 reflective markers on their faces. The obtained facial movement data were then converted into 11 parameters and presented as patterns for each monosyllable vocalization. The parameter patterns were learned and recognized for each monosyllable with speech recognition algorithms based on the Hidden Markov Model (HMM) and the Viterbi algorithm. The recognition accuracy for the 11 monosyllables was 97.2%, which suggests the possibility of recognizing spoken Korean through quantitative facial movement analysis.
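The decoding side of this pipeline pairs HMMs with the Viterbi algorithm. A self-contained sketch of Viterbi decoding follows; the 3-state model, the 3 quantized facial-movement symbols, and all probabilities are assumptions for illustration only.

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most likely hidden-state path for a discrete observation sequence."""
    T, N = len(obs), log_A.shape[0]
    delta = log_pi + log_B[:, obs[0]]            # best log-score per end state
    psi = np.zeros((T, N), dtype=int)            # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: move i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # trace back through psi
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Hypothetical 3-state model over 3 quantized facial-movement symbols.
eps = 1e-12                                      # guard against log(0)
pi = np.array([0.9, 0.05, 0.05])
A  = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.7, 0.2],
               [0.1, 0.2, 0.7]])
B  = np.array([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
print(viterbi([0, 0, 1, 2], np.log(pi + eps), np.log(A + eps), np.log(B + eps)))
```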

Integrated System of Mobile Manipulator with Speech Recognition and Deep Learning-based Object Detection (음성인식과 딥러닝 기반 객체 인식 기술이 접목된 모바일 매니퓰레이터 통합 시스템)

  • Jang, Dongyeol;Yoo, Seungryeol
    • The Journal of Korea Robotics Society
    • /
    • v.16 no.3
    • /
    • pp.270-275
    • /
    • 2021
  • Most early cooperative robots were intended to repeat simple tasks in a given space, so they showed no significant difference from industrial robots. However, research on improving workers' productivity and supplementing humans' limited working hours is expanding, and there have been active attempts to use such robots as service robots by applying AI technology. In line with these social changes, we produced a mobile manipulator that can improve a worker's efficiency and completely replace one person. First, we combined a cooperative robot with a mobile robot. Second, we applied speech recognition technology and deep learning-based object detection. Finally, we integrated all the systems with ROS (Robot Operating System). The resulting system can communicate with workers by voice, drive autonomously, and perform pick-and-place tasks.
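As a rough sketch of how a ROS-integrated pipeline of this kind might be wired, the node below routes recognized speech to either a manipulation trigger or a navigation goal. The topic names, message flow, and coordinates are assumptions; the paper does not publish its node graph.

```python
#!/usr/bin/env python
# Hypothetical ROS 1 node: routes recognized voice commands to a pick-and-place
# trigger or an autonomous-driving goal. Topic names are assumptions.
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import PoseStamped

class VoiceCommandRouter:
    def __init__(self):
        self.goal_pub = rospy.Publisher('/move_base_simple/goal',
                                        PoseStamped, queue_size=1)
        self.task_pub = rospy.Publisher('/manipulator/task', String, queue_size=1)
        rospy.Subscriber('/speech/recognized_text', String, self.on_command)

    def on_command(self, msg):
        text = msg.data.lower()
        if 'pick' in text:                       # hand the task to the arm
            self.task_pub.publish(String(data='pick_and_place'))
        elif 'go to station' in text:            # send a navigation goal
            goal = PoseStamped()
            goal.header.frame_id = 'map'
            goal.header.stamp = rospy.Time.now()
            goal.pose.position.x = 1.5           # hypothetical station pose
            goal.pose.orientation.w = 1.0
            self.goal_pub.publish(goal)

if __name__ == '__main__':
    rospy.init_node('voice_command_router')
    VoiceCommandRouter()
    rospy.spin()
```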

Development of IoT System Based on Context Awareness to Assist the Visually Impaired

  • Song, Mi-Hwa
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.320-328
    • /
    • 2021
  • As the number of visually impaired people steadily increases, interest in independent walking is also increasing. However, various inconveniences currently hinder the independent walking of the visually impaired and reduce their quality of life. The white cane, the existing walking aid for the visually impaired, has difficulty detecting overhead obstacles and obstacles beyond its effective range. In addition, crossing the street is inconvenient because the sound signals that help the visually impaired cross at crosswalks are often lacking or damaged. These factors make it difficult for the visually impaired to walk independently. Therefore, we propose the design of an embedded system that provides traffic light recognition through object recognition technology, voice guidance using TTS, and overhead obstacle detection through ultrasonic sensors, so that blind people can achieve safe, high-quality independent walking.
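A minimal sketch of the overhead-obstacle warning path described here, assuming an HC-SR04-style ultrasonic sensor on a Raspberry Pi and pyttsx3 for TTS. The pin numbers and the 1 m threshold are assumptions, not values from the paper.

```python
# Hypothetical sketch: poll an ultrasonic sensor and speak a TTS warning
# when an overhead obstacle comes within range.
import time
import RPi.GPIO as GPIO
import pyttsx3

TRIG, ECHO = 23, 24                      # hypothetical BCM pin assignments

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)
tts = pyttsx3.init()

def distance_cm():
    GPIO.output(TRIG, True)              # 10 microsecond trigger pulse
    time.sleep(1e-5)
    GPIO.output(TRIG, False)
    start = end = time.time()
    while GPIO.input(ECHO) == 0:         # wait for echo to begin
        start = time.time()
    while GPIO.input(ECHO) == 1:         # wait for echo to end
        end = time.time()
    return (end - start) * 34300 / 2     # speed of sound: 343 m/s, round trip

try:
    while True:
        if distance_cm() < 100:          # obstacle within 1 m (assumed threshold)
            tts.say("Obstacle ahead")
            tts.runAndWait()
        time.sleep(0.2)
finally:
    GPIO.cleanup()
```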

Selective Speech Feature Extraction using Channel Similarity in CHMM Vocabulary Recognition (CHMM 어휘인식에서 채널 유사성을 이용한 선택적 음성 특징 추출)

  • Oh, Sang Yeon
    • Journal of Digital Convergence
    • /
    • v.11 no.10
    • /
    • pp.453-458
    • /
    • 2013
  • HMM speech recognition systems have several weaknesses, including failure to recognize speech when environmental noise is mixed with other voices. In this paper, we propose a speech feature extraction method using CHMM for extracting a selected target voice from a mixture of voices and noise. We make use of channel similarity and correlation to compose the selective speech extraction. The proposed method was validated by showing that the average separation distortion decreased by 0.430 dB, demonstrating that the performance of the selective feature extraction is better than that of other systems.
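The channel-selection idea, picking the channel whose content best matches the target voice, can be sketched as below. The similarity measure here is plain zero-lag normalized correlation against a reference segment, which is an assumed stand-in; the abstract does not reproduce the paper's exact channel-similarity measure.

```python
import numpy as np

def select_target_channel(channels, reference):
    """Pick the channel most similar to a reference segment of the target voice.

    channels  : (C, T) array, one noisy mixture per channel
    reference : (T,) enrolled or clean sample of the target speaker
    """
    ref = (reference - reference.mean()) / (reference.std() + 1e-12)
    best, best_score = 0, -np.inf
    for c, x in enumerate(channels):
        xn = (x - x.mean()) / (x.std() + 1e-12)
        score = float(np.mean(ref * xn))      # normalized correlation
        if score > best_score:
            best, best_score = c, score
    return best, best_score

# Hypothetical demo: channel 1 carries the target plus mild noise.
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
mix = np.stack([rng.standard_normal(16000),
                target + 0.3 * rng.standard_normal(16000)])
print(select_target_channel(mix, target))     # -> (1, high score)
```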

Computerization and Application of Hangeul Standard Pronunciation Rule (음성처리를 위한 표준 발음법의 전산화)

  • 이계영
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1363-1366
    • /
    • 2003
  • This paper introduces a computerized version of the Hangeul (Korean language) Standard Pronunciation Rule that can be used in Korean processing systems such as Korean voice synthesis and Korean voice recognition systems. For this purpose, we build Petri net models for each item of the Standard Pronunciation Rule and then integrate them into a vocal sound conversion table. The reverse application of the Hangeul Standard Pronunciation Rule regulates how vocal sounds are matched to grammatically correct written characters. This paper therefore presents not only the vocal sound conversion table but also a character conversion table obtained by reversing the vocal sound conversion table. Using these tables, we implemented a Hangeul character-to-vocal-sound conversion system and a Korean vocal-sound-to-character conversion system, and tested them with data sets covering all items of the Standard Pronunciation Rule to verify the soundness and completeness of our tables. The test results show that the tables improve processing speed in addition to being sound and complete.
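A toy illustration of the conversion-table idea: context-sensitive rewrites of adjacent jamo at syllable boundaries, driven by a lookup table. Only two well-known nasalization items of the Standard Pronunciation Rule are shown; the paper's Petri-net-derived table covers every rule item, and this sketch is not its actual table.

```python
# Toy vocal sound conversion table: maps an adjacent (syllable-final,
# next-syllable-initial) jamo pair to its pronounced form.
CONVERSION_TABLE = {
    ('ㄱ', 'ㄴ'): ('ㅇ', 'ㄴ'),   # nasalization: 국물 -> [궁물]
    ('ㅂ', 'ㄴ'): ('ㅁ', 'ㄴ'),   # nasalization: 밥물 -> [밤물]
}

def apply_rules(boundary_pairs):
    """Rewrite (final, initial) jamo pairs at syllable boundaries."""
    return [CONVERSION_TABLE.get(pair, pair) for pair in boundary_pairs]

print(apply_rules([('ㄱ', 'ㄴ'), ('ㄴ', 'ㄷ')]))   # second pair passes through
```

The character conversion table described in the abstract would be the inverse mapping, with extra disambiguation since several written forms can share one pronunciation.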


Implementation of User-friendly Intelligent Space for Ubiquitous Computing (유비쿼터스 컴퓨팅을 위한 사용자 친화적 지능형 공간 구현)

  • Choi, Jong-Moo;Baek, Chang-Woo;Koo, Ja-Kyoung;Choi, Yong-Suk;Cho, Seong-Je
    • The KIPS Transactions:PartD
    • /
    • v.11D no.2
    • /
    • pp.443-452
    • /
    • 2004
  • The paper presents an intelligent space management system for ubiquitous computing. The system is basically a home/office automation system that can control lights, electronic keys, and home appliances such as TVs and audio equipment. On top of these basic capabilities, the system has four notable features. First, we can access the system using either a cellular phone or a browser on a PC connected to the Internet, so that we can control the system at any time and from any place. Second, to provide a more human-oriented interface, we integrate voice recognition functionality into the system. Third, the system supports not only reactive services but also proactive services based on the regularities of user behavior. Finally, by exploiting embedded technologies, the system can run on hardware with limited processing power and storage. We have implemented the system on an embedded board consisting of a 205 MHz StrongARM CPU, 32 MB SDRAM, 16 MB NOR-type flash memory, and a relay box. On this hardware platform, software components such as embedded Linux, the HTK voice recognition toolkit, the GoAhead web server, and a GPIO driver cooperate to support a user-friendly intelligent space.
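A hedged sketch of the "proactive service" idea mentioned above: tally the hours at which the user manually switches a light on, and act pre-emptively once one hour becomes regular. The thresholds and the class layout are assumptions, not the paper's algorithm.

```python
from collections import Counter

class ProactiveLight:
    """Learns a regular switch-on hour from observed user behavior."""
    def __init__(self, min_observations=5, regularity=0.7):
        self.on_hours = Counter()          # hour of day -> manual switch-ons
        self.total = 0
        self.min_observations = min_observations
        self.regularity = regularity       # fraction needed to call it a habit

    def record_manual_on(self, hour):
        self.on_hours[hour] += 1
        self.total += 1

    def should_switch_on(self, hour):
        if self.total < self.min_observations:
            return False                   # not enough behavior observed yet
        return self.on_hours[hour] / self.total >= self.regularity

usage = ProactiveLight()
for _ in range(5):
    usage.record_manual_on(19)             # user turns the light on at 7 pm daily
print(usage.should_switch_on(19))          # True: the system can act proactively
```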

Implementation of the Multi-Channel Speech Recognition System for the Telephone Speech (전화음성인식을 위한 멀티채널 음성인식 시스템 구현)

  • Yi Siong-Hun;Suh Youngjoo;Kang Dong-Gyu
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.179-182
    • /
    • 2000
  • This paper describes the implementation of a multi-channel speech recognition system, a core technology for telephone speech service systems. The implemented system consists of a telephone network interface module, a speech input module, a speech recognition module, and a service control module. The telephone network interface module handles call processing and event handling with the switching system over the telephone network, and is closely tied to the telephone network interface card. The speech input and recognition modules receive speech from channels with established calls and perform word recognition; they are structured to accommodate multiple channels. The speech recognition model is a context-dependent CHMM model, and each HMM model consists of 3 states with skip paths. All functions within the speech recognition module are written to be re-entrant, enabling multi-channel operation, and each channel operates in its own independent memory space. This multi-channel telephone speech recognition system was implemented on Windows NT using a Dialogic board. Experimental results showed that the implemented system achieved a recognition rate suitable for real-time commercial service and supported multiple channels smoothly.
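The key engineering point here, re-entrant recognition functions with fully independent per-channel state, maps naturally onto a worker-per-channel design. The sketch below illustrates that isolation; the thread layout and recognizer stub are assumptions, not the Dialogic/Windows NT implementation.

```python
# Hypothetical sketch of per-channel isolation: each telephone channel gets
# its own worker and its own recognizer instance, so no mutable state is
# shared between channels (the re-entrancy requirement in the paper).
import threading
import queue

class ChannelRecognizer:
    """Stand-in for the word recognizer; all state is per-instance."""
    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.buffer = []                       # independent memory per channel

    def feed(self, frame):
        self.buffer.append(frame)

    def result(self):
        return f"channel {self.channel_id}: {len(self.buffer)} frames decoded"

def channel_worker(channel_id, frames, results):
    rec = ChannelRecognizer(channel_id)        # created inside the thread
    for frame in frames:
        rec.feed(frame)
    results.put(rec.result())

results = queue.Queue()
threads = [threading.Thread(target=channel_worker,
                            args=(c, [b'\x00'] * 10, results))
           for c in range(4)]                  # four concurrent call channels
for t in threads:
    t.start()
for t in threads:
    t.join()
while not results.empty():
    print(results.get())
```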


Emotion Recognition of Facial Expression using the Hybrid Feature Extraction (혼합형 특징점 추출을 이용한 얼굴 표정의 감성 인식)

  • Byun, Kwang-Sub;Park, Chang-Hyun;Sim, Kwee-Bo
    • Proceedings of the KIEE Conference
    • /
    • 2004.05a
    • /
    • pp.132-134
    • /
    • 2004
  • Emotion recognition between humans is performed compositely using various features such as the face, voice, and gestures. Among these, the face reveals emotional expression most clearly. Humans express and recognize emotions using complex and varied facial features. This paper proposes a hybrid feature extraction method for recognizing emotions from facial expressions. The hybrid feature extraction imitates the human emotion recognition system by combining geometric-feature-based extraction with a color distribution histogram. That is, it can perform emotion recognition robustly by extracting many features of the facial expression.
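A hedged sketch of the hybrid idea: concatenate a geometric feature vector (here, pairwise distances between facial landmark points) with a color histogram of the face region, yielding a combined vector for any downstream classifier. The landmark source, histogram bins, and image sizes are assumptions.

```python
import numpy as np

def geometric_features(landmarks):
    """Pairwise distances between facial landmarks (e.g., eye/mouth corners).

    landmarks : (K, 2) array of (x, y) points, assumed to come from any
    face landmark detector.
    """
    k = len(landmarks)
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i in range(k) for j in range(i + 1, k)])

def color_histogram(face_pixels, bins=8):
    """Normalized per-channel intensity histogram of the face region."""
    hist = [np.histogram(face_pixels[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    h = np.concatenate(hist).astype(float)
    return h / h.sum()

def hybrid_feature(landmarks, face_pixels):
    return np.concatenate([geometric_features(landmarks),
                           color_histogram(face_pixels)])

# Hypothetical inputs: 5 landmarks and a random 64x64 RGB face crop.
rng = np.random.default_rng(0)
vec = hybrid_feature(rng.uniform(0, 64, (5, 2)),
                     rng.integers(0, 256, (64, 64, 3)))
print(vec.shape)    # (34,): 10 pairwise distances + 24 histogram bins
```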


Emotion Recognition using Short-Term Multi-Physiological Signals

  • Kang, Tae-Koo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.3
    • /
    • pp.1076-1094
    • /
    • 2022
  • Emotion recognition technology is an essential part of human personality analysis. Existing approaches to defining human personality characteristics have relied on surveys, yet in many cases communication cannot take place without considering emotions. Emotion recognition technology is therefore an essential element of communication and has also been adopted in many other fields. A person's emotions are revealed in various ways, typically including facial, speech, and biometric responses, so emotions can be recognized from images, voice signals, and physiological signals, among others. Physiological signals are measured with biological sensors and analyzed to identify emotions. This study employed two sensor types. First, the existing binary arousal-valence scheme was subdivided into four levels per axis to classify emotions in more detail: starting from the current High/Low classification, the model was extended to multiple levels. Then, signal characteristics were extracted using a 1-D Convolutional Neural Network (CNN) and classified into sixteen emotions. Although CNNs are commonly used to learn from 2-D images, 1-D sensor data was used as the input in this paper. Finally, the proposed emotion recognition system was evaluated using actual sensor measurements.
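A minimal sketch of a 1-D CNN over multi-channel physiological signal windows with 16 output classes, matching the 4x4 arousal-valence subdivision. The layer sizes, two input sensor channels, and window length are assumptions; the abstract does not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class Emotion1DCNN(nn.Module):
    """1-D CNN over raw sensor windows -> 16 emotion classes (4x4 grid)."""
    def __init__(self, in_channels=2, n_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),                       # downsample the time axis
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),               # collapse time to one value
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                          # x: (batch, channels, samples)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

model = Emotion1DCNN()
window = torch.randn(8, 2, 512)                    # hypothetical 512-sample windows
print(model(window).shape)                         # torch.Size([8, 16])
```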