• Title/Summary/Keyword: Voice recognition system

An Optical Character Recognition Method using a Smartphone Gyro Sensor for Visually Impaired Persons (스마트폰 자이로센서를 이용한 시각장애인용 광학문자인식 방법)

  • Kwon, Soon-Kak;Kim, Heung-Jun
    • Journal of Korea Society of Industrial Information Systems / v.21 no.4 / pp.13-20 / 2016
  • The high-resolution cameras on modern smartphones make it possible to implement an optical character recognition system. The characters extracted by such an application can then be read aloud to visually impaired users via TTS. However, it is difficult for a visually impaired person to properly photograph an object that contains character information, because it is very hard for them to accurately judge the current state of the object. In this paper, we propose a method that uses the smartphone gyro sensor to guide visually impaired users toward an appropriate shot. Simulation with the implemented program showed that the proposed method recognizes more characters from the same object.
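
The abstract does not say how the gyro readings are turned into guidance, so the following is only a minimal sketch of one plausible approach: integrate angular rates into a tilt estimate and pick a spoken hint when the tilt exceeds a tolerance. The function names, the sign conventions, the hint wording, and the 5-degree tolerance are all assumptions made for illustration, not the authors' method.

```python
def integrate_gyro(rates_dps, dt_s):
    """Integrate (pitch_rate, roll_rate) samples in deg/s into angles in degrees.

    Assumes a fixed sampling interval dt_s and ignores sensor drift.
    """
    pitch = roll = 0.0
    for pitch_rate, roll_rate in rates_dps:
        pitch += pitch_rate * dt_s
        roll += roll_rate * dt_s
    return pitch, roll


def shooting_hint(pitch_deg, roll_deg, tolerance_deg=5.0):
    """Return a hint string that a TTS engine could read aloud (hypothetical wording)."""
    if abs(pitch_deg) > tolerance_deg:
        return "tilt the phone back" if pitch_deg > 0 else "tilt the phone forward"
    if abs(roll_deg) > tolerance_deg:
        return "rotate the phone left" if roll_deg > 0 else "rotate the phone right"
    return "hold still and shoot"
```

In a real application the hint would be passed to the platform's TTS service, and the tilt estimate would normally be fused with the accelerometer to limit drift.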

Implementation of a Refusable Human-Robot Interaction Task with Humanoid Robot by Connecting Soar and ROS (Soar (State Operator and Result)와 ROS 연계를 통해 거절가능 HRI 태스크의 휴머노이드로봇 구현)

  • Dang, Chien Van;Tran, Tin Trung;Pham, Trung Xuan;Gil, Ki-Jong;Shin, Yong-Bin;Kim, Jong-Wook
    • The Journal of Korea Robotics Society / v.12 no.1 / pp.55-64 / 2017
  • This paper proposes combining a cognitive agent architecture named Soar (State, Operator, and Result) with ROS (Robot Operating System) as a basic framework that lets a robot agent interact and cope with its environment more intelligently and appropriately. The proposed Soar-ROS human-robot interaction (HRI) agent, implemented on a humanoid robot, understands a set of human commands through voice recognition and chooses how to react to each command according to a symbol detected by image recognition. The robotic agent is allowed to refuse an inappropriate command such as "go" after it has seen the symbol 'X', which indicates that an abnormal or immoral situation has occurred. This simple but meaningful HRI task was carried out successfully on the proposed Soar-ROS platform with a small humanoid robot, which suggests that the present hybrid platform can be extended to an artificial moral agent.
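
The refusal behavior described above can be illustrated with a few lines of plain Python. This is a toy sketch only: it does not use Soar or ROS, and the class name, method names, and return strings are hypothetical. The only facts taken from the abstract are that the symbol 'X' marks an abnormal situation and that the command "go" may then be refused.

```python
class RefusableAgent:
    """Toy agent that refuses "go" once the symbol 'X' has been observed."""

    def __init__(self):
        # Set to True after image recognition reports the symbol 'X'.
        self.abnormal = False

    def observe_symbol(self, symbol):
        """Record the result of (hypothetical) image recognition."""
        if symbol == "X":
            self.abnormal = True

    def handle_command(self, command):
        """Decide how to react to a (hypothetical) recognized voice command."""
        if command == "go" and self.abnormal:
            return "refuse"
        return "execute:" + command
```

The abstract does not say whether another symbol clears the abnormal state, so the sketch leaves it set once 'X' has been seen.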

Acoustic parameters for induced emotion categorizing and dimensional approach (자연스러운 정서 반응의 범주 및 차원 분류에 적합한 음성 파라미터)

  • Park, Ji-Eun;Park, Jeong-Sik;Sohn, Jin-Hun
    • Science of Emotion and Sensibility / v.16 no.1 / pp.117-124 / 2013
  • This study examined how precisely MFCC, LPC, energy, and pitch-related parameters of speech data, which have mainly been used in voice recognition systems, can predict vocal emotion categories as well as the dimensions of vocal emotion. 110 college students participated in the experiment. To elicit more realistic emotional responses, we used well-defined emotion-inducing stimuli. Because a dimensional approach is more useful for realistic emotion classification, the study analyzed the relationship between the MFCC, LPC, energy, and pitch parameters of the speech data and four emotional dimensions (valence, arousal, intensity, and potency). Stepwise multiple regression analysis identified the best vocal cue parameters for predicting each dimension. Emotion categorization accuracy by LDA was 62.7%, and the four dimensional regression models were statistically significant (p<.001). These results suggest that the parameters can also be applied to spontaneous vocal emotion recognition.

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.53-69 / 2011
  • Vision- and voice-based technologies are commonly used for human-robot interaction. However, it is widely recognized that the performance of vision- and voice-based interaction systems deteriorates by a large margin in real-world situations because of environmental and user variance. Users need to be very cooperative to get reasonable performance, which significantly limits the usability of vision- and voice-based human-robot interaction technologies. As a result, touch screens are still the major medium of human-robot interaction in real-world applications. To improve the usability of robots for various services, alternative interaction technologies should be developed to compensate for the problems of vision- and voice-based technologies. In this paper, we propose an accelerometer-based gesture interface as one such alternative, because accelerometers are effective at detecting movements of the human body and their performance is not limited by environmental factors such as lighting conditions or the camera's field of view. Moreover, accelerometers are now widely available in many mobile devices. We tackle the problem of classifying the acceleration signal patterns of the 26 letters of the English alphabet, which is one of the essential repertoires for realizing robot-based education services. Recognizing 26 English handwriting patterns with accelerometers is a very difficult task because of the large number of pattern classes and the complexity of each pattern. The most difficult comparable problem undertaken so far was recognizing the acceleration signal patterns of 10 handwritten digits. Most previous studies dealt with pattern sets of 8 to 10 simple and easily distinguishable gestures that are useful for controlling home appliances, computer applications, robots, etc. Good features are essential for successful pattern recognition.
To improve discriminative power on the complex English alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Investigative experiments showed that trajectory-based classifiers performed 3% to 5% better than those using raw features, e.g. the acceleration signal itself or statistical figures. To minimize the distortion of trajectories, we applied a simple but effective set of smoothing filters and band-pass filters. It is well known that acceleration patterns for the same gesture differ greatly among performers. To tackle this problem, we apply online incremental learning so that the system adapts to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), in which each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter phenomenon, we observed that as the number of reference patterns grows, some reference patterns contribute more to false positive classifications. We therefore devised an algorithm that optimizes the reference pattern set based on the positive and negative contribution of each reference pattern. The algorithm runs periodically to remove reference patterns that have a very low positive contribution or a high negative contribution. Experiments were performed on 6500 gesture patterns collected from 50 adults aged 30 to 50. Each participant performed each letter 5 times using a Nintendo® Wii™ remote. The acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48%. Some letters recorded a very low recall rate and exhibited a very high pairwise confusion rate. The major confusion pairs are D (88%) and P (74%), I (81%) and U (75%), and N (88%) and W (100%).
Although W was recalled perfectly, it contributed heavily to the false positive classification of N. Compared with major previous results from VTT (96% for 8 control gestures), CMU (97% for 10 control gestures), and Samsung Electronics (97% for 10 digits and a control gesture), the performance of our system is superior with respect to the number of pattern classes and the complexity of patterns. Using our gesture interaction system, we conducted 2 case studies of robot-based edutainment services. The services were implemented on various robot platforms and mobile devices, including the iPhone™. The participating children showed improved concentration and active reactions to the service with our gesture interface. To verify the effectiveness of our gesture interface, the children took a test after experiencing an English teaching service. The test results showed that those who played with the gesture interface-based robot content scored 10% higher than those who received conventional teaching. We conclude that the accelerometer-based gesture interface is a promising technology for advancing real-world robot-based services and content by complementing the limits of today's conventional interfaces, e.g. touch screen, vision, and voice.
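
The idea of pruning an instance-based learner by each reference pattern's positive and negative contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses 1-nearest-neighbor matching with Euclidean distance on fixed-length feature tuples (the paper uses motion trajectories), and the thresholds `min_positive`, `max_negative`, and the `grace` period that protects newly added references are hypothetical parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Reference:
    features: tuple
    label: str
    positive: int = 0  # times this reference was the nearest match and correct
    negative: int = 0  # times this reference was the nearest match and wrong
    seen: int = 0      # number of learning steps since it was added


class PrunedNearestNeighbor:
    def __init__(self, min_positive=1, max_negative=2, grace=10):
        self.refs = []
        self.min_positive = min_positive
        self.max_negative = max_negative
        self.grace = grace

    def _nearest(self, features):
        return min(self.refs, key=lambda r: math.dist(r.features, features))

    def classify(self, features):
        """Return the label of the nearest reference pattern."""
        return self._nearest(features).label

    def learn(self, features, true_label):
        """Online incremental step: score the nearest reference, then memorize the sample."""
        if self.refs:
            nearest = self._nearest(features)
            if nearest.label == true_label:
                nearest.positive += 1
            else:
                nearest.negative += 1
            for ref in self.refs:
                ref.seen += 1
        self.refs.append(Reference(tuple(features), true_label))

    def prune(self):
        """Drop mature references with low positive or high negative contribution."""
        before = len(self.refs)
        self.refs = [
            r for r in self.refs
            if r.seen < self.grace
            or (r.positive >= self.min_positive and r.negative <= self.max_negative)
        ]
        return before - len(self.refs)
```

Calling `prune()` periodically, for example after every fixed number of `learn()` calls, keeps the reference set small, which addresses both the classification slowdown and the growth in false positives that the abstract describes.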

A Study on Citizen Participation System based on Design Thinking, Design Science - Smart City case

  • SUH, Eung-Kyo
    • The Journal of Economics, Marketing and Management / v.9 no.2 / pp.11-20 / 2021
  • Purpose: The importance of creativity has been emphasized in the transition from an industrial society to a knowledge-based society. Recently, design thinking has attracted great attention as one way to increase the creativity of an organization. From the perspective of solving urban problems through collaboration between technology and citizens, the active participation of citizens is indispensable for realizing smart cities. Research design, data and methodology: From the perspective of solving urban problems through collaboration between technology and citizens, the active participation of citizens is indispensable for realizing smart cities. Results: The purpose of this research was therefore to design a citizen-participation system and content that use a specific space to realize a smart city. The system uses the concept of space as a tool to promote innovation activities with citizen participation and makes it easy for users of the space to participate, based on urban problems derived from living labs. The internal structure and user flow lines were also designed. Conclusions: Voice recognition, artificial intelligence, the Internet of Things, and big data were used as important technologies for experiencing smart cities. The system and content were designed with an emphasis on allowing citizens to directly recognize and experience smart city technology, especially through space-based information visualization and multi-faceted stimulus elements.

A Real-time Bus Arrival Notification System for Visually Impaired Using Deep Learning (딥 러닝을 이용한 시각장애인을 위한 실시간 버스 도착 알림 시스템)

  • Seyoung Jang;In-Jae Yoo;Seok-Yoon Kim;Youngmo Kim
    • Journal of the Semiconductor & Display Technology / v.22 no.2 / pp.24-29 / 2023
  • In this paper, we propose a real-time bus arrival notification system that uses deep learning to guarantee movement rights for the visually impaired. In modern society, location information lets users quickly obtain information about public transportation and use it easily. However, because the existing public transportation information system is visual, the visually impaired cannot use it. In Korea, various laws have been amended since the 'Act on the Promotion of Transportation for the Vulnerable' was enacted in June 2012 as the Act on the Movement Rights of the Blind, but the visually impaired still experience inconvenience when using public transportation. In particular, with the current system a visually impaired person cannot determine whether the bus is coming soon, is coming now, or has already arrived. In this paper, we use deep learning to learn bus numbers and identify the numbers of approaching buses. Finally, we propose a method that uses TTS to notify the visually impaired by voice that the bus is coming.

Study on Gesture and Voice-based Interaction in Perspective of a Presentation Support Tool

  • Ha, Sang-Ho;Park, So-Young;Hong, Hye-Soo;Kim, Nam-Hun
    • Journal of the Ergonomics Society of Korea / v.31 no.4 / pp.593-599 / 2012
  • Objective: This study aims to implement a non-contact gesture-based interface for presentations and to analyze the effect of the proposed interface as an information-transfer assistive device. Background: With rapid technological growth in the UI/UX area and the appearance of smart service products that require new human-machine interfaces, research on control devices using gesture recognition or speech recognition is being actively conducted. However, relatively few quantitative studies have examined the practical effects of these new interface types, while work on system implementation is very popular. Method: The system presented in this study is implemented with the KINECT® sensor offered by Microsoft Corporation. To investigate whether the proposed system is effective as a presentation support tool, we gave several lectures to 40 participants in both a traditional lecture room (keyboard-based presentation control) and a non-contact gesture-based lecture room (KINECT-based presentation control), evaluated their interest and immersion with respect to the lecture content and lecturing method, and analyzed their understanding of the lecture content. Result: Using ANOVA, we examine whether the gesture-based presentation system can play an effective role as a presentation support tool depending on the difficulty level of the content. Conclusion: We confirm that a non-contact gesture-based interface is a meaningful supportive device when delivering easy and simple information. However, the effect can vary with the content and the difficulty level of the information provided. Application: The results presented in this paper might help in designing new human-machine (computer) interfaces for communication support tools.

A Deep Learning System for Emotional Cat Sound Classification and Generation (감정별 고양이 소리 분류 및 생성 딥러닝 시스템)

  • Joo Yong Shim;SungKi Lim;Jong-Kook Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.10 / pp.492-496 / 2024
  • Cats are known to express their emotions through a variety of vocalizations during interactions. These sounds reflect their emotional states, making the understanding and interpretation of these sounds crucial for more effective communication. Recent advancements in artificial intelligence have introduced research related to emotion recognition, particularly focusing on the analysis of voice data using deep learning models. Building on this background, the study aims to develop a deep learning system that classifies and generates cat sounds based on their emotional content. The classification model is trained to accurately categorize cat vocalizations by emotion. The sound generation model, which uses deep learning-based models such as SampleRNN, is designed to produce cat sounds that reflect specific emotional states. Finally, the study proposes an integrated system that takes recorded cat vocalizations, classifies them by emotion, and generates cat sounds based on user requirements.

A Study on the Development of Text Communication System based on AIS and ECDIS for Safe Navigation (항해안전을 위한 AIS와 ECDIS 기반의 문자통신시스템 개발에 관한 연구)

  • Ahn, Young-Joong;Kang, Suk-Young;Lee, Yun-Sok
    • Journal of the Korean Society of Marine Environment & Safety / v.21 no.4 / pp.403-408 / 2015
  • A text-based communication system has been developed as a complement to voice communication, with the communication function on AIS and the display and input functions on ECDIS. It is free of linguistic errors and is not affected by VHF usage restrictions or noise. The text communication system is designed to use messages that state intentions clearly, and it further improves user convenience by offering various UIs through software. It works without additional hardware installation or modification, can transmit a sentence by selection alone through the Message Banner Interface without keyboard input, and has the advantage of faster processing through its own message coding and decoding. It is judged to be the most useful alternative for reducing users' language limitations and recognition errors and for solving the various problems of voice communication on VHF. In addition, it will help prevent collisions between ships in heavy traffic areas through reduced VHF use, accurate communication, and text-based requests for cooperation.

Overview on Smart Sensor Technology for Biometrics in IoT Era (사물인터넷 시대의 생체인식 스마트 센서 기술과 연구 동향)

  • Kim, Kwang-Seok;Kim, Dae Up
    • Journal of the Microelectronics and Packaging Society / v.23 no.2 / pp.29-35 / 2016
  • With the rapid pace of innovation in IoT (Internet of Things) technology and smart devices, biometric technology has become one of the most progressive industries. Recent trends in biometrics are mostly focused on embedding biometric sensors in mobile devices for user authentication. Multifactor biometrics such as fingerprint, retina, and voice are being considered for identification systems that provide users with more secure and convenient services. We therefore present some major technologies and market trends of mobile biometric technology, along with its concerns and issues.