• Title/Summary/Keyword: Voice recognition system


An User-Friendly Kiosk System Based on Deep Learning (딥러닝 기반 사용자 친화형 키오스크 시스템)

  • Su Yeon Kang;Yu Jin Lee;Hyun Ah Jung;Seung A Cho;Hyung Gyu Lee
    • Journal of Korea Society of Industrial Information Systems / v.29 no.1 / pp.1-13 / 2024
  • This study provides a customized, dynamic kiosk screen that adapts to user characteristics, addressing the changes caused by the increased use of kiosks. To optimize screen composition for digitally vulnerable groups such as the visually impaired, the elderly, children, and wheelchair users, users are classified into nine categories based on real-time analysis of their characteristics (wheelchair use, visual impairment, age, etc.). The kiosk screen is then dynamically adjusted to the user's characteristics to provide efficient service. The system's communication and operation were verified in an embedded environment, where the object detection, gait recognition, and speech recognition components achieved accuracies of 74%, 98.9%, and 96%, respectively. The proposed technology was validated with a prototype implementation, demonstrating the potential to reduce the digital divide and provide user-friendly "barrier-free kiosk" services.
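The nine-category classification described above can be sketched as a simple decision rule: three mobility/vision bases crossed with three age groups gives nine screen profiles. The category names, fields, and configuration values below are hypothetical illustrations, not the paper's own labels.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    wheelchair: bool         # e.g. detected via object detection
    visually_impaired: bool  # e.g. inferred from gait/behavior analysis
    age_group: str           # "child", "adult", or "elderly"

def classify_user(p: UserProfile) -> str:
    """Map detected traits to one of nine kiosk screen categories
    (3 accessibility bases x 3 age groups; names are illustrative)."""
    if p.wheelchair:
        base = "low-panel"       # wheelchair users: lower the touch area
    elif p.visually_impaired:
        base = "voice-first"     # visually impaired: prefer speech I/O
    else:
        base = "standard"
    return f"{base}/{p.age_group}"

def screen_config(category: str) -> dict:
    """Derive a dynamic screen layout from the category (illustrative values)."""
    base, age = category.split("/")
    cfg = {"font_pt": 14, "panel": "full-height", "tts": False}
    if base == "low-panel":
        cfg["panel"] = "lower-half"   # keep controls reachable from a chair
    if base == "voice-first":
        cfg["tts"] = True             # narrate the menu aloud
    if age in ("child", "elderly"):
        cfg["font_pt"] = 20           # larger text for these groups
    return cfg
```

A profile like `UserProfile(wheelchair=True, visually_impaired=False, age_group="elderly")` maps to `"low-panel/elderly"`, which yields a lower-half panel with enlarged text.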

A Study on LMS Using Effective User Interface in Mobile Environment (모바일 환경에서 효과적인 사용자 인터페이스를 이용한 LMS에 관한 연구)

  • Kim, Si-Jung;Cho, Do-Eun
    • Journal of Advanced Navigation Technology / v.16 no.1 / pp.76-81 / 2012
  • With the spread of various mobile devices, studies on u-learning-based learning management systems are actively proceeding. Such systems are convenient in that they place no restrictions on access devices, time, or place. However, it is difficult to authenticate the user and to judge whether the learner is actually concentrating. In this paper, a voice and face-capture interface, rather than the common event-driven user interface, was applied to a learning management system. When accessing the system, the user logs in by speaking the registered password, and the user's learning attitude is judged through spoken responses to simple word prompts during content-based learning. Evaluation of the proposed system showed that the user's learning achievement and concentration improved, and that the administrator could monitor abnormal learning attitudes.
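The login and attention-check flow described above can be sketched in a few lines. The transcripts are assumed to come from an upstream ASR engine (not shown), and the function names are hypothetical, not the paper's API.

```python
def voice_login(recognized_text: str, registered_password: str) -> bool:
    """Voice login sketch: the spoken utterance is transcribed upstream by
    an ASR engine and compared to the user's registered password."""
    return recognized_text.strip().lower() == registered_password.lower()

def attention_check(prompt_words, recognized_reply: str) -> bool:
    """Concentration check sketch: during learning, the LMS prompts a simple
    word and verifies that the learner's spoken response matches one of the
    prompted words."""
    return recognized_reply.strip().lower() in {w.lower() for w in prompt_words}
```

In practice the comparison would tolerate ASR errors (e.g. edit-distance matching) rather than demand exact string equality.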

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal (성대신호 기반의 명령어인식기를 위한 특징벡터 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • Journal of KIISE:Software and Applications / v.34 no.3 / pp.226-234 / 2007
  • In this paper, we develop a speech recognition system using a throat microphone. This kind of microphone minimizes the impact of environmental noise. However, because high frequencies are absent and formant frequencies are partially lost, previous systems built on such devices have shown lower recognition rates than systems using standard microphone signals. This has led researchers to use throat microphone signals only as supplementary data supporting standard microphone signals. Here we present a high-performance ASR system developed using only a throat microphone, by taking advantage of Korean phonological feature theory and a detailed analysis of the throat signal. Analyzing the spectrum and FFT of the throat microphone signal, we find that the conventional MFCC feature vector, which uses a critical band filter, does not characterize throat microphone signals well. We also describe the conditions that make a feature extraction algorithm best suited to throat microphone signals: (1) a sensitive band-pass filter and (2) a feature vector suitable for voice/non-voice classification. We show experimentally that the ZCPA algorithm, designed to meet these conditions, improves the recognizer's performance by approximately 16%, and that an additional noise-canceling algorithm such as RASTA yields a further 2% improvement.
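The core idea of ZCPA (Zero-Crossing with Peak Amplitude) features, as referenced above, is to estimate per-band frequency from zero-crossing intervals and weight each estimate by the log peak amplitude between crossings. The sketch below is a minimal single-resolution illustration (using a crude FFT band-pass), not the paper's implementation.

```python
import numpy as np

def zcpa_features(signal, sr, band_edges, n_bins=16):
    """ZCPA-style feature sketch: for each band, histogram the frequency
    implied by successive zero-crossing intervals (half periods), weighted
    by the log peak amplitude observed between the crossings."""
    hist = np.zeros(n_bins)
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    spec = np.fft.rfft(signal)
    for lo, hi in band_edges:
        band = spec.copy()
        band[(freqs < lo) | (freqs > hi)] = 0      # crude FFT band-pass
        x = np.fft.irfft(band, len(signal))
        zc = np.where(np.diff(np.signbit(x)))[0]   # zero-crossing indices
        for a, b in zip(zc[:-1], zc[1:]):
            f_est = sr / (2 * (b - a))             # half period -> frequency
            peak = np.abs(x[a:b + 1]).max()
            bin_idx = min(int(f_est / (sr / 2) * n_bins), n_bins - 1)
            hist[bin_idx] += np.log1p(peak)        # amplitude-weighted vote
    return hist
```

For a pure 625 Hz tone at 8 kHz sampling, the histogram peaks in the bin covering 500-750 Hz, illustrating how the crossing statistics recover the dominant band frequency.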

A Study on the Weight Allocation Method of Humanist Input Value and Multiplex Modality using Tacit Data (암묵 데이터를 활용한 인문학 인풋값과 다중 모달리티의 가중치 할당 방법에 관한 연구)

  • Lee, Won-Tae;Kang, Jang-Mook
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.4 / pp.157-163 / 2014
  • A user's sensibility is recognized as a very important parameter in communication between companies, government, and individuals. In many studies, researchers use voice tone, speech rate, facial expression, direction and speed of body movement, and gestures to recognize sensibility. Multiple modalities are more precise than a single modality, but multi-sensing brings a limited recognition rate and a data-processing overload, and an effective algorithm is needed to deduce the sensed value. Because each modality has different concepts and properties, errors can occur when converting human sensibility to standard values. To deal with this, the sensibility-expressing modality needs to be extracted from the multiple modalities using techniques such as relational network analysis, context understanding, and digital filtering. If, in a specific situation, the priority modality is processed explicitly while the surrounding modalities are processed as implicit values, a robust system can be composed at a modest cost in computing resources. As a result, this paper proposes how to assign weights across multiple modalities using tacit data.
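A minimal sketch of the weighting idea above: the context-dependent priority modality receives the dominant weight, and the remaining modalities are folded in as lower-weight implicit evidence. The `alpha` split and modality names are illustrative assumptions, not the paper's actual allocation scheme.

```python
def fuse_modalities(scores: dict, priority: str, alpha: float = 0.6):
    """Weighted fusion sketch: give the priority modality weight `alpha`
    and share the remaining 1 - alpha equally across the other modalities,
    which are treated as implicit (tacit) evidence."""
    others = [m for m in scores if m != priority]
    weights = {priority: alpha}
    for m in others:
        weights[m] = (1 - alpha) / len(others)
    fused = sum(weights[m] * scores[m] for m in scores)
    return fused, weights

# Example: voice tone is the priority modality in this hypothetical context.
fused, w = fuse_modalities(
    {"voice_tone": 0.9, "face": 0.5, "gesture": 0.1}, priority="voice_tone")
```

The weights always sum to 1, so the fused score stays in the same range as the per-modality scores.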

Elderly Speech Signal Processing: A Systematic Review for Analysis of Gender Innovation (노인음성신호처리: 젠더혁신 분석에 대한 체계적 문헌고찰)

  • Lee, JiYeoun
    • Journal of Convergence for Information Technology / v.9 no.8 / pp.148-154 / 2019
  • The purpose of this study is to systematically review the literature on elderly speech signal processing from a domestic gender-innovation perspective and to introduce the utility of gender analysis methods. From 2000 to the present, 25 of the 37 research papers published in Korean journals were selected according to inclusion and exclusion criteria, and gender analysis methods were applied to the research subjects and designs. The results show that diversity of research fields and high gender awareness within R&D teams are needed in engineering research and development from a gender-innovation perspective. In addition, government-level regulation and research funding should be systematically applied to gender-innovation research processes in elderly voice signal processing and related projects. In the future, gender innovation in elderly speech signal processing can contribute to creating new markets by developing voice recognition systems and services that reflect the needs of both men and women.

Personalized Smart Mirror using Voice Recognition (음성인식을 이용한 개인맞춤형 스마트 미러)

  • Dae-Cheol, Kang;Jong-Seok, Lim;Gil-Ho, Lee;Beom-Hee, Lee;Hyoung-Keun, Park
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.6 / pp.1121-1128 / 2022
  • This paper describes a personalized smart mirror built around an LCD display. For the software configuration, Raspbian was used to provide the system environment. The mirror provides various information such as weather, apps, streaming music, and a web-browser search function, and 'Google Assistant' voice recognition is provided through the GUI within a predetermined time.

Analysis of Delay Characteristics in Advanced Intelligent Network-Intelligent Peripheral (AIN IP) (차세대 지능망 지능형 정보제공 시스템의 지연 특성 분석)

  • 이일우;최고봉
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.8A / pp.1124-1133 / 2000
  • The Advanced Intelligent Network Intelligent Peripheral (AIN IP) is one of the AIN elements, alongside the Service Control Point (SCP) and Service Switching Point (SSP), providing AIN services such as announcement playback, digit collection, voice recognition/synthesis, and voice prompt and receipt. This paper, focusing on the ISUP/INAP protocols, describes the procedures for setting up and releasing bearer channels between SSP/SCP and IP to deliver specialized resources over those channels, and describes the structure and procedures of AIN services such as Automatic Collect Call (ACC), Universal Personal Telecommunication (UPT), and televoting (VOT). In this environment, the delay characteristics of the IP system are investigated as a performance analysis.


Generative Interactive Psychotherapy Expert (GIPE) Bot

  • Ayesheh Ahrari Khalaf;Aisha Hassan Abdalla Hashim;Akeem Olowolayemo;Rashidah Funke Olanrewaju
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.15-24 / 2023
  • One of the aspirations of scientists and engineers ever since the development of computers has been to interact naturally with machines. Hence features of artificial intelligence (AI) such as natural language processing and natural language generation were developed. Interactive conversational systems are thought to be the fastest-expanding field of AI. Numerous businesses have created Virtual Personal Assistants (VPAs) using these technologies, including Apple's Siri, Amazon's Alexa, and Google Assistant, among others. Although many chatbots have been introduced over the years to diagnose or treat psychological disorders, a truly user-friendly chatbot is not yet available. A generative cognitive behavioral therapy system with spoken dialogue support was therefore developed using a Persona Perception (P2) bot with Generative Pre-trained Transformer-2 (GPT-2). The model was implemented using modern VPA technologies such as voice recognition, Natural Language Understanding (NLU), and text-to-speech. The system can hold therapeutic conversations with users through both text and voice interaction.

A Study of Pedestrian Navigation Service System for Visual Disabilities (시각장애인용 길안내 서비스 시스템에 대한 연구)

  • Jang, Young Gun;Cha, J.H.
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.4 / pp.315-321 / 2017
  • This paper presents the design and realization of a pedestrian navigation service system for the visually impaired. As a user interface suited to the visually impaired, a smartphone with voice recognition was used as the input device, and a bone-conduction (osteoacusis) headset, which gives spoken directions while leaving the wearer able to hear the surrounding environment, was used as the output device. Unlike existing pedestrian navigation smartphone apps, the developed system guides walking direction through the relative levels of the left and right stereo channels of the headset; voice guidance about forked or curved paths is given several meters in advance according to the user's speed, and the user is immediately warned when walking in the opposite direction or leaving the path. The system acquires stable and reliable directional information using a motion tracker with a dynamic heading accuracy of 1.5 degrees. To overcome GPS position error, we propose a trajectory planning algorithm that is robust to position error. Experimental results show an average directional angle error of 6.82 degrees (standard deviation: 5.98) on the experimental path, indicating that the system navigated the user stably.
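Steering a pedestrian by the relative level of the left and right stereo channels, as described above, can be sketched with a constant-power pan law driven by the heading error. The pan law and clamping range are assumptions for illustration; the paper does not specify its mapping.

```python
import math

def stereo_gains(heading_error_deg: float):
    """Map the angle between the user's heading and the path direction to
    left/right headset gains using a constant-power pan. Positive error
    means the target direction is to the user's right."""
    e = max(-90.0, min(90.0, heading_error_deg))   # clamp to +/-90 degrees
    theta = (e + 90.0) / 180.0 * (math.pi / 2)     # map to [0, pi/2]
    left, right = math.cos(theta), math.sin(theta)
    return left, right
```

On course (error 0) both channels are equally loud; at +90 degrees the cue is entirely in the right ear, and the squared gains always sum to 1, so the overall loudness stays constant while the apparent direction moves.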

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

  • Kim, Kwang Hyeon;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.8 no.3 / pp.469-475 / 2022
  • Training a deep learning-based single-speaker TTS model requires a speech DB of tens of hours and a long training time, which is inefficient in time and cost for building multi-speaker or personalized TTS models. The voice cloning method instead uses a speaker encoder model to build the TTS model of a new speaker: through the trained speaker encoder, a speaker embedding vector representing the new speaker's timbre is created from a small amount of the new speaker's speech that was not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed system consists of a speaker encoder, a synthesizer, and a vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field, and the new speaker's timbre is expressed by feeding the d-vector derived from the trained speaker encoder as an additional input to the synthesizer. MOS and timbre-similarity listening tests show that the proposed TTS system performs well.
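The d-vector mechanics referenced above are simple to sketch: average the speaker-verification network's frame-level embeddings, L2-normalize, compare timbres by cosine similarity, and condition the synthesizer by attaching the vector to every encoder timestep. The frame embeddings here are assumed to be precomputed by a trained encoder (not shown), and the concatenation-based conditioning is one common choice, not necessarily the paper's exact wiring.

```python
import numpy as np

def d_vector(frame_embeddings: np.ndarray) -> np.ndarray:
    """Classic d-vector construction: average the per-frame embeddings
    from the speaker encoder's last hidden layer and L2-normalize."""
    v = np.mean(frame_embeddings, axis=0)
    return v / np.linalg.norm(v)

def timbre_similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    """Cosine similarity between two unit-norm d-vectors."""
    return float(np.dot(d1, d2))

def condition_synthesizer_input(text_encoding: np.ndarray,
                                d_vec: np.ndarray) -> np.ndarray:
    """Broadcast-concatenate the speaker d-vector onto every encoder
    timestep so the synthesizer sees the target timbre at each step."""
    steps = text_encoding.shape[0]
    tiled = np.tile(d_vec, (steps, 1))        # (T, d) copy of the d-vector
    return np.concatenate([text_encoding, tiled], axis=1)
```

Because the d-vector is unit-norm, two utterances by the same speaker should score near 1 under `timbre_similarity`, which is exactly the listening-test notion of timbre similarity made computable.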