• Title/Summary/Keyword: Voice recognition system

Search Results: 334 (processing time: 0.022 seconds)

An Implementation of an Android Mobile System for Extracting and Retrieving Texts from Images (이미지 내 텍스트 추출 및 검색을 위한 안드로이드 모바일 시스템 구현)

  • Go, Eun-Bi;Ha, Yu-Jin;Choi, Soo-Ryum;Lee, Ki-Hoon;Park, Young-Ho
    • Journal of Digital Contents Society
    • /
    • v.12 no.1
    • /
    • pp.57-67
    • /
    • 2011
  • Recently, interest in mobile search has been increasing with the growing adoption of smartphones. However, the keypad, which is poorly suited to the mobile environment, has been the only input method for mobile search. Voice has emerged as an alternative input method, but it also has weaknesses. Thus, in this paper, we propose a mobile application called Orthros that searches the Internet using images as input. Orthros extracts text from an image and then submits the text to public search engines as a keyword. Orthros can also repeat a search with the extracted text by storing the result URLs in an internal database. In our experiments, we analyze the properties of recognizable images and present the implementation method in detail.
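The Orthros flow above (OCR-extracted text → search-engine keyword → stored result URL for repeated searches) can be sketched minimally as follows. This is an illustrative sketch, not the paper's implementation: the OCR step itself is out of scope here, and the table and function names are assumptions.

```python
# Sketch of an Orthros-style flow: turn OCR-extracted text into a search-engine
# query URL, then store it locally so the search can be repeated later.
# Engine URL, table schema, and function names are illustrative assumptions.
import sqlite3
from urllib.parse import quote_plus

def build_search_url(keyword: str, engine: str = "https://www.google.com/search?q=") -> str:
    """Collapse OCR-extracted text into one keyword string and build a query URL."""
    return engine + quote_plus(" ".join(keyword.split()))

def save_result(db: sqlite3.Connection, keyword: str, url: str) -> None:
    """Store the keyword and its result URL so the search can be rerun."""
    db.execute("CREATE TABLE IF NOT EXISTS results (keyword TEXT, url TEXT)")
    db.execute("INSERT INTO results VALUES (?, ?)", (keyword, url))

db = sqlite3.connect(":memory:")
url = build_search_url("voice recognition system")
save_result(db, "voice recognition system", url)
print(url)  # https://www.google.com/search?q=voice+recognition+system
```

Persisting the URL rather than re-running OCR is what makes the "repeat search" feature cheap: the extraction step runs once per image.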

HunMinJeomUm: Text Extraction and Braille Conversion System for the Learning of the Blind (시각장애인의 학습을 위한 텍스트 추출 및 점자 변환 시스템)

  • Kim, Chae-Ri;Kim, Ji-An;Kim, Yong-Min;Lee, Ye-Ji;Kong, Ki-Sok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.5
    • /
    • pp.53-60
    • /
    • 2021
  • The number of visually impaired and blind people is increasing, but braille textbooks for them remain insufficient, which undermines their right to education despite their will to learn. To guarantee this right, this paper develops a learning system, HunMinJeomUm, that helps them access textbooks, documents, and photographs not available in braille, without the assistance of others. In our system, a smartphone app and web pages are designed to promote accessibility for the blind, and a braille kit is built using Arduino and braille modules. The system supports the following functions. First, users select the documents or pictures they want, and the system extracts the text using OCR. Second, the extracted text is converted into voice and braille. Third, a membership registration function is provided so that users can view their extracted text. Experiments confirmed that the system generates braille and audio output successfully and achieves high OCR recognition rates. The study also found that even completely blind users can easily use the smartphone app.
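The text-to-braille step described above can be illustrated with the Unicode braille block, where each cell is U+2800 plus a bitmask of raised dots. This is only a sketch under simplifying assumptions: a handful of English Grade-1 letters are mapped here, whereas the actual system converts Korean text and drives physical braille cells via Arduino.

```python
# Sketch of text-to-braille conversion using Unicode braille cells
# (code point = 0x2800 + bitmask of raised dots; dot n sets bit n-1).
# Only a few English Grade-1 letters are mapped, for illustration.
DOTS = {  # letter -> raised dot numbers (standard Grade-1 braille)
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "h": (1, 2, 5), "l": (1, 2, 3), "o": (1, 3, 5),
}

def to_braille(text: str) -> str:
    """Map known letters to Unicode braille cells; other characters pass through."""
    cells = []
    for ch in text.lower():
        if ch in DOTS:
            mask = sum(1 << (d - 1) for d in DOTS[ch])
            cells.append(chr(0x2800 + mask))
        else:
            cells.append(ch)
    return "".join(cells)

print(to_braille("hello"))  # ⠓⠑⠇⠇⠕
```

The same dot bitmask that selects a Unicode cell could also drive the pins of a physical braille module, which is presumably why braille hardware kits and screen output can share one conversion table.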

Ubiquitous u-Health System using RFID & ZigBee (RFID와 ZigBee를 이용한 유비쿼터스 u-Health 시스템 구현)

  • Kim, Jin-Tai;Kwon, Youngmi
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.43 no.1 s.343
    • /
    • pp.79-88
    • /
    • 2006
  • In this paper, we designed and implemented a ubiquitous u-Health system using RFID and ZigBee. We built a wireless protocol kit that combines RFID tag recognition with ZigBee data communication capability. The software was designed and developed on TinyOS. Wireless communication technologies that can be combined with RFID in multi-protocol stacks to realize the wireless ubiquitous world include Bluetooth, ZigBee, 802.11x WLAN, and so on. One environment in which the proposed u-Health system may be used is unmanned nursing, which would be useful in dense sensor networks such as a hospital. Devices combining RFID and ZigBee will become smaller and smaller, down to the size of a bracelet, a wristwatch, or a ring. The combined wireless RFID-ZigBee system can be applied to applications that require actions corresponding to the collected (or sensed) information in a WBAN (Wireless Body Area Network) and/or WPAN (Wireless Personal Area Network). The proposed ubiquitous u-Health system displays text alert messages on the LCD attached to the system or delivers voice alert messages to the appropriate node users. RFID can be combined with other wireless technologies in various ways for application-specific purposes.
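The alert behavior described above (a sensed value arrives over ZigBee tagged with an RFID identity, and an out-of-range value triggers a text or voice alert) can be sketched as simple threshold logic. All names, fields, and thresholds below are illustrative assumptions, not details from the paper, and the radio and hardware layers are omitted.

```python
# Sketch of unmanned-nursing alert dispatch: an RFID tag identifies the
# patient, a ZigBee node relays a sensed value, and values outside a normal
# range produce an alert message (for an LCD or voice output).
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Reading:
    tag_id: str   # RFID tag identifying the patient
    pulse: int    # sensed value relayed over ZigBee

def dispatch_alert(reading: Reading, lo: int = 50, hi: int = 120) -> str:
    """Return an alert message when the reading falls outside [lo, hi]."""
    if reading.pulse < lo or reading.pulse > hi:
        return f"ALERT {reading.tag_id}: pulse {reading.pulse}"
    return f"OK {reading.tag_id}"

print(dispatch_alert(Reading("tag-07", 135)))  # ALERT tag-07: pulse 135
print(dispatch_alert(Reading("tag-01", 80)))   # OK tag-01
```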

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun;Cha, Kyung-Ae;Koh, Jaepil;Hong, Won-Kee
    • Journal of Service Research and Studies
    • /
    • v.14 no.1
    • /
    • pp.13-26
    • /
    • 2024
  • Multi-modal generation is the process of generating results from a variety of information, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to describe a person and generate a montage image. Whereas existing montage generation technology is based on the appearance of Westerners, the montage generation system developed in this paper learns a model based on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text input specific to Koreans. Since the developed montage generation app can be used as a draft montage, it can dramatically reduce the manual labor of montage production personnel. For this purpose, we utilized persona-based virtual person montage data provided by the AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the training data necessary for developing AI technologies and services. The image generation system was implemented using VQGAN, a deep learning model for generating high-resolution images, and KoDALLE, a Korean-based image generation model. The trained AI model creates montage images of faces very similar to those described with voice and text. To verify the practicality of the app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in various fields, such as criminal investigation, to describe and visualize facial features.

A Study on Improving of Access to School Library Collection through High School Students' DLS Search Behavior Analysis (고등학생의 DLS 검색행태 분석을 통한 학교도서관 자료 접근성 향상 방안 고찰)

  • Jung, Youngmi;Kang, Bong-Suk
    • Journal of Korean Library and Information Science Society
    • /
    • v.51 no.2
    • /
    • pp.355-379
    • /
    • 2020
  • The Digital Library System (DLS) for school libraries is the key access tool for school library materials. The purpose of this study was to find ways to improve the accessibility of materials through an analysis of students' information-search behavior in the DLS. Data were collected by recording 42 participants' DLS search processes and through a questionnaire. The search success rate and search satisfaction were found to be lower when the main purpose of using the DLS was simple leisure reading, when information needs were relatively ambiguous, and when users encountered complications during the search process. Satisfaction with the sufficiency of search time was highest, and satisfaction with search results was lowest. In addition, users wanted DLS improvements such as integrated search of other libraries' collection information, recommendation of related materials, printed output of collection locations, voice recognition through mobile apps, and automatic correction of search errors. From this, the following can be suggested. First, the DLS should complement its career-information function by reflecting the demands of education consumers. Second, DLS functionality must be improved to the level of general information retrieval systems. Third, an infrastructure must be established for close cooperation between school library field personnel and the DLS management authorities.

Selection of Auditory Icons in Ship Bridge Alarm Management System Using the Sensibility Evaluation (감성평가를 이용한 선교알람관리시스템의 청각아이콘 평가)

  • Oh, Seungbin;Jang, Jun-Hyuk;Park, Jin Hyoung;Kim, Hongtae
    • Journal of Navigation and Port Research
    • /
    • v.37 no.4
    • /
    • pp.401-407
    • /
    • 2013
  • In parallel with the development of ship equipment, bridge systems have improved, but marine accidents due to human error have not decreased. Recent research on nautical bridge equipment has therefore focused on ergonomic designs that reduce errors due to human factors. On a ship's bridge, numerous auditory signals deliver important information to the sailors. However, only a few studies have examined human recognition of these auditory signals. There are three types of auditory signals: voice alarms, abstract sounds, and auditory icons. This study was conducted to design more appropriate auditory icons using a sensibility evaluation method. The auditory icons were rated for five warning situations (engine failure, fire, steering failure, low power, and collision) using the Semantic Differential Method. The results of this study are expected to serve as basic data for auditory displays inside bridges and for integrated bridge alarm systems.

Applying Social Strategies for Breakdown Situations of Conversational Agents: A Case Study using Forewarning and Apology (대화형 에이전트의 오류 상황에서 사회적 전략 적용: 사전 양해와 사과를 이용한 사례 연구)

  • Lee, Yoomi;Park, Sunjeong;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility
    • /
    • v.21 no.1
    • /
    • pp.59-70
    • /
    • 2018
  • With the breakthrough of speech recognition technology, conversational agents have become pervasive through smartphones and smart speakers. The accuracy of speech recognition has reached human level, but the technology still shows limitations in understanding the underlying meaning or intention of words and in following long conversations. Accordingly, users experience various errors when interacting with conversational agents, which may negatively affect the user experience. In addition, for smart speakers whose main interface is voice, the lack of system feedback and transparency has been reported as a main usability issue. There is therefore a strong need for research on how users can better understand the capabilities of conversational agents and on mitigating negative emotions in error situations. In this study, we applied two social strategies, "forewarning" and "apology", to a conversational agent and investigated how these strategies affect users' perceptions of the agent in breakdown situations. For the study, we created a series of demo videos of a user interacting with a conversational agent. After watching the demo videos, participants were asked in an online survey to evaluate how much they liked and trusted the agent. Responses from 104 participants were analyzed, and the results were contrary to our expectations from the literature: forewarning gave users a negative impression, especially regarding the reliability of the agent, and an apology in a breakdown situation did not affect users' perceptions. In follow-up in-depth interviews, participants explained that they perceived the smart speaker as a machine rather than a human-like object, and that for this reason the social strategies did not work. These results show that social strategies should be applied according to the perceptions users hold toward agents.

Experience Design Guideline for Smart Car Interface (스마트카의 인터페이스를 위한 경험 디자인 가이드라인)

  • Yoo, Hoon Sik;Ju, Da Young
    • Design Convergence Study
    • /
    • v.15 no.1
    • /
    • pp.135-150
    • /
    • 2016
  • Due to the development of communication technology and the expansion of Intelligent Transport Systems (ITS), the car is changing from a simple mechanical device into a second living space with comprehensive convenience functions, evolving into a platform whose interface supports this role. As the interface area that provides information to passengers expands, research on smart car user experience is growing in importance. This study aims to propose guidelines for smart car user experience elements. To conduct the study, the smart car user experience elements were defined as function, interaction, and surface, and through discussions with UX/UI experts, 8 representative functions, 14 interaction methods, and 8 glass window locations were specified for these elements. The priorities that smart car users assign to these experience elements were then analyzed through a questionnaire survey of 100 drivers. The analysis showed that users' priorities in applying the main functions were, in order, safety, distance, and sensibility. The priorities for the interaction methods were, in order, voice recognition, touch, gesture, physical button, and eye tracking. Regarding glass window locations, users prioritized the front of the driver's seat over the back. Demographic analysis by gender showed no significant differences except for two functions, indicating that common guidelines can be applied to both male and female users. Through this user requirement analysis of the individual elements, the study provides prioritized guidance on the requirements for each element to be applied in commercial products.

Study on Improving Maritime English Proficiency Through the Use of a Maritime English Platform (해사영어 플랫폼을 활용한 표준해사영어 실력 향상에 관한 연구)

  • Jin Ki Seor;Young-soo Park;Dongsu Shin;Dae Won Kim
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.29 no.7
    • /
    • pp.930-938
    • /
    • 2023
  • Maritime English is a specialized language system designed for ship operations, maritime safety, and external and internal communication onboard. According to the International Maritime Organization's (IMO) International Convention on Standards of Training, Certification and Watchkeeping for Seafarers (STCW), navigational officers engaged in international voyages must have a thorough understanding of Maritime English, including the use of the Standard Marine Communication Phrases (SMCP). This study measured students' proficiency in Maritime English using a learning and testing platform that includes voice recognition, translation, and word-entry tasks, and evaluated the resulting improvement in Maritime English exam scores. The study also aimed to determine the level of platform use needed for cadets to qualify as junior navigators. The experiment began by examining the correlation between students' overall English skills and their proficiency in SMCP through an initial test, followed by an evaluation of score improvements and changes in exam duration on the mid-term and final exams. The initial test revealed a significant difference in Maritime English test scores among groups based on individual factors such as TOEIC scores and self-assessed English ability, and both the mid-term and final tests confirmed substantial score improvements for the group using the platform. This study confirmed the efficacy of a learning platform that could be applied widely in maritime education and potentially expanded beyond Maritime English education in the future.

The Audience Behavior-based Emotion Prediction Model for Personalized Service (고객 맞춤형 서비스를 위한 관객 행동 기반 감정예측모형)

  • Ryoo, Eun Chung;Ahn, Hyunchul;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.73-85
    • /
    • 2013
  • In today's information society, the importance of knowledge services that create value from information is growing day by day. With the development of IT, it has also become easy to collect and use information, and many companies actively use customer information for marketing in a variety of industries. Since the start of the 21st century, companies have actively used culture and the arts, closely linked to their commercial interests, to manage their corporate image and marketing. However, it is difficult for companies to attract or maintain consumers' interest through technology alone, so cultural activities have become a common tool of differentiation among firms. Many firms have turned customer experience into a new marketing strategy in order to respond effectively to competitive markets. Accordingly, the need for personalized services that provide new experiences based on personal profile information reflecting individual characteristics is emerging rapidly. Personalized service using a customer's individual profile information, such as language, symbols, behavior, and emotions, is thus very important today. Through it, we can assess the interaction between people and content and maximize customer experience and satisfaction. A range of related work provides customer-centered services; in particular, emotion recognition research has been emerging recently. Existing research has mostly performed emotion recognition using bio-signals, and most studies target the voice and the face, which exhibit large emotional changes. However, limitations of equipment and service environments make it difficult to predict people's emotions in practice. In this paper, we therefore develop an emotion prediction model based on a vision-based interface to overcome these limitations. Emotion recognition based on people's gestures and postures has been studied by several researchers.
This paper developed a model that recognizes people's emotional states from body gestures and postures using the difference image method, and identified the best-performing model for predicting four kinds of emotions. The proposed model aims to automatically determine and predict four human emotions (sadness, surprise, joy, and disgust). To build the model, an event booth was installed in KOCCA's lobby, and we showed stimulus movies to collect participants' body gestures and postures as their emotions changed. We then extracted body movements using the difference image method and refined the data to build the proposed model with a neural network. The model was evaluated with three time-frame settings (20 frames, 30 frames, and 40 frames), and we adopted the setting with the best performance. Before building the three models, the entire set of 97 samples was divided into learning, test, and validation sets. The proposed emotion prediction model was constructed as an artificial neural network trained with the back-propagation algorithm, with the learning rate set to 10% and the momentum rate to 10%. The sigmoid function was used as the transfer function, and we designed a three-layer perceptron with one hidden layer and four output nodes. Based on the test data set, learning was stopped at 50,000 iterations after the minimum error had been reached, in order to explore the stopping point of learning. We finally computed each model's accuracy and identified the best model for predicting each emotion. The results showed prediction accuracies of 100% for sadness and 96% for joy with the 20-frame model, and 88% for surprise and 98% for disgust with the 30-frame model. The findings of this research are expected to provide an effective algorithm for personalized services in industries such as advertising, exhibitions, and performances.
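The difference image method mentioned above can be sketched in a few lines: body movement is measured as the absolute pixel difference between consecutive grayscale frames, thresholded to suppress noise, which yields a per-frame motion feature that a classifier can consume. The threshold value and the synthetic frames below are illustrative assumptions, not the paper's parameters.

```python
# Sketch of the difference image method: compare consecutive grayscale frames,
# mark pixels whose intensity changed by more than a threshold, and summarize
# the mask as a scalar motion feature. Threshold and frames are illustrative.
import numpy as np

def difference_image(prev: np.ndarray, curr: np.ndarray, thresh: int = 30) -> np.ndarray:
    """Binary mask of pixels that changed between two uint8 frames."""
    # Widen to int16 first so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

def movement_score(prev: np.ndarray, curr: np.ndarray) -> float:
    """Fraction of pixels that moved -- a simple per-frame motion feature."""
    return float(difference_image(prev, curr).mean())

prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[:2, :] = 255          # top half of the frame "moves"
print(movement_score(prev, curr))  # 0.5
```

A sequence of such scores over 20, 30, or 40 frames gives a fixed-length feature vector of the kind that could feed the three-layer perceptron described in the abstract.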