• Title/Summary/Keyword: speech technology

An User-Friendly Kiosk System Based on Deep Learning (딥러닝 기반 사용자 친화형 키오스크 시스템)

  • Su Yeon Kang;Yu Jin Lee;Hyun Ah Jung;Seung A Cho;Hyung Gyu Lee
    • Journal of Korea Society of Industrial Information Systems / v.29 no.1 / pp.1-13 / 2024
  • This study aims to provide a customized, dynamic kiosk screen that takes user characteristics into account, in response to the changes brought about by the growing use of kiosks. To optimize the screen composition for digitally vulnerable groups such as the visually impaired, the elderly, children, and wheelchair users, users are classified into nine categories based on real-time analysis of their characteristics (wheelchair use, visual impairment, age, etc.). The kiosk screen is then adjusted dynamically according to the user's characteristics to provide efficient service. System communication and operation were carried out in an embedded environment, and the object detection, gait recognition, and speech recognition technologies used achieved accuracies of 74%, 98.9%, and 96%, respectively. The effectiveness of the proposed technology was verified by implementing a prototype, which demonstrated the possibility of narrowing the digital divide and providing user-friendly "barrier-free kiosk" services.

Robust Real-time Pose Estimation to Dynamic Environments for Modeling Mirror Neuron System (거울 신경 체계 모델링을 위한 동적 환경에 강인한 실시간 자세추정)

  • Jun-Ho Choi;Seung-Min Park
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.3 / pp.583-588 / 2024
  • With the emergence of Brain-Computer Interface (BCI) technology, analyzing mirror neurons has become more feasible. However, evaluating the accuracy of BCI systems that rely on human thoughts poses challenges because of their qualitative nature. To harness the potential of BCI, we propose a new approach to measuring accuracy based on the characteristics of mirror neurons in the human brain that are influenced by speech speed, depending on the ultimate goal of movement. In Chapter 2 of this paper, we introduce mirror neurons and explain human posture estimation for mirror neurons. In Chapter 3, we present a robust pose estimation method suitable for real-time dynamic environments, built on human posture estimation. Furthermore, we propose a method to analyze the accuracy of BCI using this robotic environment.

Ultrasensitive Crack-based Mechanosensor Inspired by Spider's Sensory Organ (거미의 감각기관을 모사한 초민감 균열기반 진동압력센서)

  • Suyoun Oh;Tae-il Kim
    • Journal of the Microelectronics and Packaging Society / v.31 no.1 / pp.1-6 / 2024
  • Spiders detect even tiny vibrations through their vibration-sensing organs. They use this exceptional sensitivity to detect vibrations caused by prey or predators, allowing them to plan attacks or perceive threats and thus survive. This paper introduces a nanoscale crack-based sensor that mimics the spider's sensory organ. Inspired by the slit sensory organ that spiders use to detect vibrations, the cracked sensor detects vibration and pressure with high sensitivity. By controlling the depth of these cracks, a sensor capable of detecting external mechanical signals with remarkable sensitivity was developed. The sensor achieves a gauge factor of 16,000 at 2% strain with an applied tensile stress of 10 N. With its high signal-to-noise ratio, it accurately recognizes the desired vibrations, as confirmed through various evaluations of external forces and biological signals (speech patterns, heart rate, etc.). This underscores the potential of biomimetic technology for developing new sensors and applying them across diverse industrial fields.

Voice Recognition Chatbot System for an Aging Society: Technology Development and Customized UI/UX Design (고령화 사회를 위한 음성 인식 챗봇 시스템 : 기술 개발과 맞춤형 UI/UX 설계)

  • Yun-Ji Jeong;Min-Seong Yu;Joo-Young Oh;Hyeon-Seok Hwang;Won-Whoi Hun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.4 / pp.9-14 / 2024
  • This study developed a voice recognition chatbot system to address depression and loneliness among the elderly in an aging society. The system utilizes the Whisper model, GPT 2.5, and XTTS2 to provide high-performance voice recognition, natural language processing, and text-to-speech conversion. Users can express their emotions and states and receive appropriate responses, with voice recognition functionality using familiar voices for comfort and reassurance. The UX/UI design considers the cognitive responses, visual impairments, and physical limitations of the smart senior generation, using high contrast colors and readable fonts for enhanced usability. This research is expected to improve the quality of life for the elderly through voice-based interfaces.

Identification and Clinical Implications of Novel MYO15A Mutations in a Non-consanguineous Korean Family by Targeted Exome Sequencing

  • Chang, Mun Young;Kim, Ah Reum;Kim, Nayoung K.D.;Lee, Chung;Lee, Kyoung Yeul;Jeon, Woo-Sung;Koo, Ja-Won;Oh, Seung Ha;Park, Woong-Yang;Kim, Dongsup;Choi, Byung Yoon
    • Molecules and Cells / v.38 no.9 / pp.781-788 / 2015
  • Mutations of MYO15A are generally known to cause severe to profound hearing loss across all frequencies. Here, through targeted resequencing of 134 known deafness genes, we found two novel MYO15A mutations, c.3871C>T (p.L1291F) and c.5835T>G (p.Y1945X), in an affected individual with congenital profound sensorineural hearing loss (SNHL). The variants p.L1291F and p.Y1945X reside in the myosin motor and IQ2 domains, respectively. Three-dimensional protein modeling predicted that the p.L1291F variant affects the structure of the actin-binding site, thereby interfering with the correct interaction between actin and myosin. According to our analysis of the literature, mutations in the N-terminal domain were more frequently associated with residual hearing at low frequencies than mutations in other regions of the gene. We therefore suggest a hypothetical genotype-phenotype correlation in which MYO15A mutations affecting domains other than the N-terminal domain lead to profound SNHL across all frequencies, whereas mutations affecting the N-terminal domain result in residual hearing at low frequencies. This correlation suggests that preservation of residual hearing during auditory rehabilitation, such as cochlear implantation, should be pursued for those who carry mutations in the N-terminal domain, and that individuals with mutations elsewhere in MYO15A require early cochlear implantation so that speech development can begin in a timely manner.

Interactive content development of voice pattern recognition (음성패턴인식 인터랙티브 콘텐츠 개발)

  • Na, Jong-Won
    • Journal of Advanced Navigation Technology / v.16 no.5 / pp.864-870 / 2012
  • This paper analyzes problems common to existing language-learning content and applies voice pattern recognition technology to address them. The first problem concerns learner attitude in online learning: students open games or other web pages during a lesson, and their concentration drops. The second is that it has not been possible to determine whether the learner actually reads aloud during speaking practice. The third is that evaluation is handled mechanically by the learning management system, which leaves a gap between the teacher's evaluation of students and the students' actual learning progress. Finally, the biggest problem is that any solution to the above must work while keeping the existing content intact. Against this background, voice pattern recognition technology was used in a program dedicated to speaking practice, both for speech recognition during the learning process and for recognizing that learning itself is taking place. The learner's utterance data is converted into an audio file and transferred to a specified location on a server or to an SQL server. Because the new functionality can be inserted easily into any system or program, it can be applied to content that has already been created without damaging its existing components. This paper contributes to changing classes toward more interactive teaching methods with active student participation.

Voice Interactions with A. I. Agent : Analysis of Domestic and Overseas IT Companies (A.I.에이전트와의 보이스 인터랙션 : 국내외 IT회사 사례연구)

  • Lee, Seo-Young
    • Journal of Korea Entertainment Industry Association / v.15 no.4 / pp.15-29 / 2021
  • Many countries and companies are pursuing and developing artificial intelligence (AI) as the core technology of the 4th industrial revolution. Global IT companies such as Apple, Microsoft, Amazon, Google, and Samsung have all released their own AI assistant hardware products, hoping to increase customer loyalty and capture market share. Competition within the industry for AI agents is intense. AI assistant products that command the biggest market shares and the strongest customer loyalty have a higher chance of becoming the industry standard. This study analyzed the current status of major overseas and domestic IT companies in the field of artificial intelligence, and suggested future strategic directions for voice UI technology development and user satisfaction. In terms of B2B technology, it is recommended that IT companies use cloud computing to store big data, innovative artificial intelligence technologies, and natural language technologies. Offering voice recognition technologies on the cloud enables smaller companies to take advantage of such technologies at considerably less expense. Companies may also consider using GPT-3 (Generative Pre-trained Transformer 3), an open source artificial intelligence language processing software that can generate very natural, human-like interactions and high levels of user satisfaction. Usefulness and usability need to be increased to enhance user satisfaction. This study has practical and theoretical implications for industry and academia.

Applying Social Strategies for Breakdown Situations of Conversational Agents: A Case Study using Forewarning and Apology (대화형 에이전트의 오류 상황에서 사회적 전략 적용: 사전 양해와 사과를 이용한 사례 연구)

  • Lee, Yoomi;Park, Sunjeong;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility / v.21 no.1 / pp.59-70 / 2018
  • With the breakthrough of speech recognition technology, conversational agents have become pervasive through smartphones and smart speakers. The accuracy of speech recognition has reached a human level, but the technology still shows limitations in understanding the underlying meaning or intention of words and in following long conversations. Accordingly, users experience various errors when interacting with conversational agents, which may negatively affect the user experience. In addition, for smart speakers that use voice as the main interface, a lack of system feedback and transparency has been reported as the main issue users encounter. Therefore, there is a strong need for research on how users can better understand the capabilities of conversational agents and how negative emotions in error situations can be mitigated. In this study, we applied two social strategies, "forewarning" and "apology", to a conversational agent and investigated how these strategies affect users' perceptions of the agent in breakdown situations. For the study, we created a series of demo videos of a user interacting with a conversational agent. After watching the demo videos, participants were asked to evaluate, through an online survey, how much they liked and trusted the agent. Responses from a total of 104 participants were analyzed, and the results were contrary to the expectations we had formed from the literature review. Forewarning gave users a negative impression, especially of the agent's reliability, and an apology in a breakdown situation did not affect users' perceptions. In the subsequent in-depth interviews, participants explained that they perceived the smart speaker as a machine rather than a human-like object, and that for this reason the social strategies did not work. These results show that social strategies should be applied according to the perceptions that users have of agents.

Improving QoS using Cellular-IP/PRC in Hospital Wireless Network

  • Kim, Sung-Hong
    • The Journal of the Korea institute of electronic communication sciences / v.1 no.2 / pp.120-126 / 2006
  • In this paper, we propose a method for improving QoS in a hospital wireless network using Cellular-IP/PRC (Paging Route Cache), which combines the Paging Cache and the Route Cache of Cellular-IP. Although Cellular-IP/PRC was devised for mobile internet communication, it is vulnerable in environments with frequent handoffs. A handoff state machine that uses differentiated handoff improves the quality of service in Cellular-IP/PRC. The suggested algorithm shows better performance than the existing technology in a wireless mobile internet communication environment. When speech quality is secured considering increment of interference to receive in case of suppose that proposed acceptance method grooves base radio station capacity of transfer node is plenty, and most of contiguity cell transfer node was accepted at groove base radio station with a blow, groove base radio station new trench lake acceptance method based on transmission of a message electric power estimate of transfer node be. Do it so that may apply composing PC(Paging Cache) and RC(Routing Cache) that was used to manage paging and router in radio Internet network in integral management and all nodes as one PRC(Paging Router Cache), and add hand off state machine in transfer node so that can manage hand off of transfer node and Roaming state efficiently, and studies so that achieve connection function at node. Analyze benevolent person who influence on telephone traffic in system environment and forecasts each link currency rank and imbalance degree, forecast most close and important lake interception probability and lake falling off probability, GoS(Grade of Service), efficiency of cell capacity in QoS because applies algorithm proposing based on algorithm use gun send-receive electric power that judge by looking downward link whether currency book was limited and accepts or intercept lake and handles and displays QoS performance improvement.

A Study about the Users's Preferred Playing Speeds on Categorized Video Content using WSOLA method (WSOLA를 이용한 동영상 미세배속 재생 서비스에 대한 콘텐츠별 배속 선호도 분석 연구)

  • Kim, I-Gil
    • Journal of Digital Contents Society / v.16 no.2 / pp.291-298 / 2015
  • In a fast-paced information technology environment, consumption of video content is changing from one-way television viewing to VOD (Video on Demand) playback anywhere, anytime, on any device. This viewing trend gives additional importance to videos with fine speed control, in addition to the strength of the digital video signal. Currently, many video players provide a fine-speed-control function that can speed up the video to skip a boring part or slow it down to focus on an exciting scene. The audio information is just as important as the visual information for understanding the content of a speed-controlled video. Thus, a number of algorithms for fine-speed-control playback have been proposed in the audio-processing area to solve the problem of pitch distortion. In this study, WSOLA (Waveform-Similarity-Based Overlap-Add), a well-known technique for prosodic modification of speech signals, was applied to analyze users' needs for fine-speed-control video playback. By surveying users' preferred speeds for categorized video content and analyzing the results, this paper proposes that a variety of fine speed adjustments are needed to accommodate users' preferred ways of consuming video.