• Title/Summary/Keyword: Voice interface

Search Result 298, Processing Time 0.04 seconds

Trends and Implications of Digital Transformation in Vehicle Experience and Audio User Interface (차내 경험의 디지털 트랜스포메이션과 오디오 기반 인터페이스의 동향 및 시사점)

  • Kim, Kihyun;Kwon, Seong-Geun
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.2
    • /
    • pp.166-175
    • /
    • 2022
  • Digital transformation is driving so many changes in daily life and industry. The automobile industry is in a similar situation. In some cases, element techniques in areas called metabuses are also being adopted, such as 3D animated digital cockpit, around view, and voice AI, etc. Through the growth of the mobile market, the norm of human-computer interaction (HCI) has been evolving from keyboard-mouse interaction to touch screen. The core area was the graphical user interface (GUI), and recently, the audio user interface (AUI) has partially replaced the GUI. Since it is easy to access and intuitive to the user, it is quickly becoming a common area of the in-vehicle experience (IVE), especially. The benefits of a AUI are freeing the driver's eyes and hands, using fewer screens, lower interaction costs, more emotional and personal, effective for people with low vision. Nevertheless, when and where to apply a GUI or AUI are actually different approaches because some information is easier to process as we see it. In other cases, there is potential that AUI is more suitable. This is a study on a proposal to actively apply a AUI in the near future based on the context of various scenes occurring to improve IVE.

A Design and Implementation of the VoiceXML Multiple-View Editor Using MVC Framework (MVC 프레임 워크를 사용한 VoiceXML 다중 뷰 편집기의 설계 및 구현)

  • 유재우;염세훈
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.5
    • /
    • pp.390-399
    • /
    • 2004
  • In this paper, we design and implement a multiple-view VoiceXML editor to improve editing efficiency of the VoiceXML. The VoiceXML multiple-view Editor uses a MVC framework to support multiple views and paradigm. Our multiple-view editor consists of Model. View and Controller using MVC framework. A model, core data structure. is constructed of abstract syntax tree and abstract grammar. A view. user interface. is formalized in unparsing rules and unparser. A controller. to control model and view. is made of command interpreter and tree handler. The VoiceXML multiple-view editor overcomes a drawbacks of existing XML editors by showing document structure and context concurrently. as well as document flows. Our VoiceXML multiple-view editor. which MVC framework has been applied, provides various editing views concurrently to users. Thereby. it supports efficient and convenient editing environments for voice-web documents to users and it guarantees transparency of editors. as various views have a same consistent model.

A study on a design of developed-ERES/WCS using the ASR and fuzzy set theory as a part of human interface technique (Human interface 기술의 일환으로서 ASR과 fuzzy set theory를 이용한 developed-ERES/WCS 설계에 관한 연구)

  • 이순요;이창민;박세권
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1988.10a
    • /
    • pp.76-81
    • /
    • 1988
  • As a means of human interface, this study designs Developed-ERES/WCS with voice recognition capability and fuzzy set theory. In the advanced teleoperator system, when an error occurs on the automatic mode, the error is recovered after the automatic mode is changed into the manual mode intervened by a human. The purpose of this study is to reduce human work load and to shorten error recovery time during error recovery.

  • PDF

Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

  • Lee, Hyub-Woo;Yook, Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.39-40
    • /
    • 2007
  • For efficient interaction between human and robots, speech interface is a core problem especially in noisy and reverberant conditions. This paper analyzes main issues of spoken language interface for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.

  • PDF

Voice Portal based on SMS Authentication at CTI Module Implementation by Speech Recognition (SMS 인증 기반의 보이스포탈에서의 음성인식을 위한 CTI 모듈 구현)

  • Oh, Se-Il;Kim, Bong-Hyun;Koh, Jin-Hwan;Park, Won-Tea
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2001.04b
    • /
    • pp.1177-1180
    • /
    • 2001
  • 전화를 통해 인터넷 정보를 들을 수 있는 보이스 포탈(Voice Portal) 서비스가 인기를 얻고 있다. Voice Portal 서비스란 알고자 하는 정보를 Speech Recognition System에 음성으로 명령하면 전화를 통해 음성으로 원하는 정보를 듣는 서비스이다. Authentication의 절차를 수행하는 SMS (Short Message Service) 서버 Module, PSTN과 Database 서버사이의 Interface를 제공하는 CTI (Computer Telephony Integration) Module, CTI 서버와 WWW (World Wide Web) 사이의 Voice XML Module, 정보를 검색하기 위한 Searching Module들이 필요하다. 본 논문은 Speech Recognition technology를 기반으로 한 CTI Module 설계를 구현하였다. 또한 인정 방식으로 Random한 일회용 password를 기반으로 한 SMS Authentication을 택하므로 더욱 더 안정된 서비스 제공을 목적으로 하였다.

  • PDF

Greeting, Function, and Music: How Users Chat with Voice Assistants

  • Wang, Ji;Zhang, Han;Zhang, Cen;Xiao, Junjun;Lee, Seung Hee
    • Science of Emotion and Sensibility
    • /
    • v.23 no.2
    • /
    • pp.61-74
    • /
    • 2020
  • Voice user interface has become a commercially viable and extensive interaction mechanism with the development of voice assistants. Despite the popularity of voice assistants, the academic community does not utterly understand about what, when, and how users chat with them. Chatting with a voice assistant is crucial as it defines how a user will seek the help of the assistant in the future. This study aims to cover the essence and construct of conversational AI, to develop a classification method to deal with user utterances, and, most importantly, to understand about what, when, and how Chinese users chat with voice assistants. We collected user utterances from the real conventional database of a commercial voice assistant, NetEase Sing in China. We also identified different utterance categories on the basis of previous studies and real usage conditions and annotated the utterances with 17 labels. Furthermore, we found that the three top reasons for the usage of voice assistants in China are the following: (1) greeting, (2) function, and (3) music. Chinese users like to interact with voice assistants at night from 7 PM to 10 PM, and they are polite toward the assistants. The whole percentage of negative feedback utterances is less than 6%, which is considerably low. These findings appear to be useful in voice interaction designs for intelligent hardware.

Design and Implementation of Vocal Interface-Inventory Management System (음성 인터페이스 기반의 재고 관리 시스템의 설계 및 구현)

  • Park Se Jin;Kwon Chul Hong
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.119-122
    • /
    • 2002
  • This paper focuses on building up a database of commercial stocks using XML syntax and looks into a way of building up a system with the combination of XML and XSLT that provides connectivity to client-server databases through vocal means. The use of XSLT has several advantages. Most importantly, it can transform a type of data into different formats. A vocal interface minimizes some space and time limits imposed on users outside premises when they need an instant connection to their database. In this fashion, the users can check information on stock lists without being pressurized by certain limits. PC, PDAs and cellular phones are some examples of mobile connection. The use of VoiceXML creates vocal applications. In VoiceXML servies, users can gain immediate access to data upon the input of their voices and the DTMF signals of the telephone.

  • PDF

An Experimental Study on Barging-In Effects for Speech Recognition Using Three Telephone Interface Boards

  • Park, Sung-Joon;Kim, Ho-Kyoung;Koo, Myoung-Wan
    • Speech Sciences
    • /
    • v.8 no.1
    • /
    • pp.159-165
    • /
    • 2001
  • In this paper, we make an experiment on speech recognition systems with barging-in and non-barging-in utterances. Barging-in capability, with which we can say voice commands while voice announcement is coming out, is one of the important elements for practical speech recognition systems. Barging-in capability can be realized by echo cancellation techniques based on the LMS (least-mean-square) algorithm. We use three kinds of telephone interface boards with barging-in capability, which are respectively made by Dialogic Company, Natural MicroSystems Company and Korea Telecom. Speech database was made using these three kinds of boards. We make a comparative recognition experiment with this speech database.

  • PDF

A Novel Computer Human Interface to Remotely Pick up Moving Human's Voice Clearly by Integrating ]Real-time Face Tracking and Microphones Array

  • Hiroshi Mizoguchi;Takaomi Shigehara;Yoshiyasu Goto;Hidai, Ken-ichi;Taketoshi Mishima
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 1998.10a
    • /
    • pp.75-80
    • /
    • 1998
  • This paper proposes a novel computer human interface, named Virtual Wireless Microphone (VWM), which utilizes computer vision and signal processing. It integrates real-time face tracking and sound signal processing. VWM is intended to be used as a speech signal input method for human computer interaction, especially for autonomous intelligent agent that interacts with humans like as digital secretary. Utilizing VWM, the agent can clearly listen human master's voice remotely as if a wireless microphone was put just in front of the master.

  • PDF

N-gram Adaptation using Information Retrieval and Dynamic Interpolation Coefficient (정보검색 기법과 동적 보간 계수를 이용한 N-gram 적응)

  • Choi, Joon-Ki;Oh, Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.107-112
    • /
    • 2005
  • 연속음성인식을 위한 언어모델 적응기법은 특정 영역의 정보만을 담고 있는 적응 코퍼스를 이용해 작성한 적응 언어모델과 기본 언어모델을 병합하는 방법이다. 본 논문에서는 추가되는 자료 없이 인식 시스템이보유하고 있는 코퍼스만을 사용하여 적응 코퍼스를 구축하기 위해 언어모델에 기반한 정보검색 기법을 사영하였다. 검색된 적응 코퍼스로 작성된 적응 언어모델과 기본 언어모델과의 병합을 위해 본 논문에서는 입력음성을 분할하여 각 구간에 최적인 동적 보간 계수를 구하는 방법을 제안하였다. 제안된 적응 코퍼스를 구하는 방법과 동적 보간 계수는 기본 언어모델 대비절대 3.6%의 한국어 방송뉴스 인식 성능 향상을 보여주었으며 기존의 검증자료를 이용한 정적 보간 계수에 비해 상대 13.6%의 한국어 방송뉴스 인식 성능 향상을 보여 주었다.

  • PDF