• Title/Summary/Keyword: Voice interface

A Study on Development of VUI(Voice User Interface) using VoiceXML (VoiceXML을 이용한 VUI 개발에 관한 연구)

  • Jang, Min-Seok;Yang, Woon-Mo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2002.04b
    • /
    • pp.1495-1498
    • /
    • 2002
  • The current computing environment has shifted from text-oriented command-line input/output to the GUI (Graphic User Interface), which offers users a friendlier way of working with computers. Even so, becoming proficient in such an environment still takes considerable learning time, and additional study of features such as interfacing between applications is needed before work can proceed smoothly. To address this, this study explores a solution through speech recognition/synthesis and VoiceXML, the current voice markup language.

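The study above builds its VUI on VoiceXML. As an illustrative sketch only (the menu wording, grammar file name, and output path below are invented, not taken from the paper), the following Python snippet writes out the kind of minimal VoiceXML form such a dialog consists of:

```python
# Write a minimal VoiceXML 2.0 form of the kind a VUI dialog is built from.
# The document content is illustrative, not the paper's application.

VXML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_menu">
    <field name="choice">
      <prompt>Say mail, news, or weather.</prompt>
      <grammar type="application/srgs+xml" src="menu.grxml"/>
      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""

def save_dialog(path: str = "main_menu.vxml") -> None:
    """Save the dialog so a VoiceXML interpreter (voice browser) can serve it."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(VXML_DOC)

if __name__ == "__main__":
    save_dialog()
```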

Development of an Integrated Packet Voice/Data Terminal (패킷 음성/데이터 집적 단말기의 개발)

  • 전홍범;은종관;조동호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.13 no.2
    • /
    • pp.171-181
    • /
    • 1988
  • In this study, a packet voice/data terminal (PVDT) that services both voice and data in the packet-switched network is implemented. The software structure of the PVDT is designed according to the OSI 7-layer architecture. The discrimination of voice and data is made in the link layer. Voice packets have priority over data packets in order to minimize the transmission delay, and are serviced by a simple protocol so that the overhead arising from the retransmission of packets may be minimized. The hardware structure of the PVDT is divided into five modules: a master control module, a speech processing module, a speech activity detection module, a telephone interface module, and an input/output interface module. In addition to the hardware implementation, the optimal reconstruction delay of voice packets to reduce the influence of delay variance is analyzed.

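The link-layer discipline the abstract describes, voice packets always transmitted ahead of queued data packets, can be sketched as below. This is a minimal illustration under assumed packet types, not the PVDT's actual software:

```python
# Strict-priority link scheduling: voice is always sent before data,
# bounding voice transmission delay at the cost of data latency.
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    kind: str      # "voice" or "data", decided in the link layer
    payload: bytes

class PriorityLink:
    def __init__(self) -> None:
        self.voice_q: deque[Packet] = deque()  # served first
        self.data_q: deque[Packet] = deque()   # served only when no voice waits

    def enqueue(self, pkt: Packet) -> None:
        (self.voice_q if pkt.kind == "voice" else self.data_q).append(pkt)

    def next_to_send(self) -> Packet | None:
        if self.voice_q:
            return self.voice_q.popleft()
        if self.data_q:
            return self.data_q.popleft()
        return None
```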

Research on Emotional Factors and Voice Trend by Country to be considered in Designing AI's Voice - An analysis of interview with experts in Finland and Norway (AI의 음성 디자인에서 고려해야 할 감성적 요소 및 국가별 음성 트랜드에 관한 연구 - 핀란드와 노르웨이의 전문가 인뎁스 인터뷰를 중심으로)

  • Namkung, Kiechan
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.9
    • /
    • pp.91-97
    • /
    • 2020
  • Use of voice-based interfaces that can interact with users is increasing as AI technology develops. To date, however, most research on voice-based interfaces has been technical in nature, focused on areas such as improving the accuracy of speech recognition. Thus, the voice of most voice-based interfaces is uniform and does not provide users with differentiated sensibilities. The purpose of this study is to add an emotional factor suitable for AI voice interfaces. To this end, we derived emotional factors that should be considered in designing a voice interface. In addition, we examined voice trends that differ from country to country. For this study, we conducted interviews with voice industry experts from Finland and Norway, countries that use their own independent languages.

Implementation of speech interface for windows 95 (Windows95 환경에서의 음성 인터페이스 구현)

  • 한영원;배건성
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.34S no.5
    • /
    • pp.86-93
    • /
    • 1997
  • With the recent development of speech recognition technology and multimedia computer systems, more potential applications of voice will become a reality. In this paper, we implement a speech interface in the Windows 95 environment for practical use of multimedia computers with voice. The speech interface is made up of three modules: a speech input and detection module, a speech recognition module, and an application module. The speech input and detection module handles the low-level audio services of the Win32 API to capture speech data in real time. The recognition module processes the incoming speech data and then recognizes the spoken command; the DTW pattern-matching method is used for recognition. The application module executes the voice command on the PC. Each module of the speech interface is designed and examined in the Windows 95 environment, and the implemented interface and experimental results are explained and discussed.

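The abstract names DTW pattern matching as the recognition method. A minimal, self-contained sketch of DTW-based command matching follows; the feature extraction and command templates are assumptions for illustration, not the paper's system:

```python
# Dynamic time warping (DTW): align two feature sequences of different
# lengths and return the accumulated distance of the best alignment.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """DTW distance between two sequences of feature frames (frames x dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return float(D[n, m])

def recognize(utterance: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```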

Design and Implementation of VoiceXML VUI Browser (VoiceXML VUI Browser 설계/구현)

  • 장민석;예상후
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2002.11a
    • /
    • pp.788-791
    • /
    • 2002
  • The present Web environment is built on HTML (Hypertext Mark-up Language), so users obtain web information mainly in a GUI (Graphical User Interface) environment, clicking the mouse to follow hyperlinks. Compared with an environment in which information can be obtained by the human voice, however, this is inconvenient. To work around this, the paper uses VoiceXML, derived from XML, to supply information over the telephone on the basis of today's mature speech recognition/synthesis technology, and presents the results of designing and implementing a VoiceXML web browser that realizes this.

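At its core, a VoiceXML browser runs a form interpretation loop: visit each field, speak its prompt, and fill the field from the recognizer. The sketch below stands in stub tts()/asr() functions for the real synthesis/recognition engines; it is an assumption-laden outline, not the authors' implementation:

```python
# Skeleton of a VoiceXML form interpretation loop using stdlib XML parsing.
import xml.etree.ElementTree as ET

NS = {"v": "http://www.w3.org/2001/vxml"}

def tts(text: str) -> None:              # stub: hand text to the synthesizer
    print(f"[TTS] {text}")

def asr() -> str:                        # stub: return the recognizer's result
    return input("[ASR] ")

def run_form(vxml_path: str) -> dict:
    """Visit each <field>, speak its <prompt>, and collect a spoken value."""
    root = ET.parse(vxml_path).getroot()
    filled = {}
    for field in root.iterfind(".//v:field", NS):
        prompt = field.find("v:prompt", NS)
        tts(prompt.text if prompt is not None else "")
        filled[field.get("name")] = asr()
    return filled
```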

The Effects of Interface Modality on Cognitive Load and Task Performance in Media Multitasking Environment (미디어 멀티태스킹 환경에서 인터페이스의 감각양식 차이가 인지부하와 과업수행에 미치는 영향에 관한 연구 다중 자원 이론과 스레드 인지 모델을 기반으로)

  • Lee, Dana;Han, Kwang-Hee
    • Journal of the HCI Society of Korea
    • /
    • v.14 no.2
    • /
    • pp.31-39
    • /
    • 2019
  • This research examined the changes that fast-growing voice-based devices would bring to the media multitasking environment. Based on the theoretical background that information processing efficiency improves when performing multiple tasks requiring different resource structures at the same time, we conducted an experiment in which participants searched for information with voice-based or screen-based devices while performing an additional visual task. Results showed that both task performance environment and interface modality had significant main effects on cognitive load. The overall cognitive load level was higher in the voice interface group, but the difference in cognitive load between the two groups decreased in a multitasking environment where additional visual resources were required. Visual task performance was significantly higher when using the voice interface than the screen interface. Our findings suggest that voice interfaces offered advantages in cognitive load and task performance by distributing the two tasks across the auditory and visual channels. The results imply that voice-based devices have the potential to facilitate efficient information processing in the screen-centric environment where visual resources collide. We provided theoretical evidence of resource distribution using multiple resource theory and identified the advantages of the voice interface more specifically based on the threaded cognition model.

Implementation of a Gateway Protocol between LAN and PABX for Voice Communication (근거리 통신망과 사설교환기의 음성통신을 위한 게이트웨이의 구현)

  • 안용철;신병철
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.7
    • /
    • pp.1346-1363
    • /
    • 1994
  • Packet voice protocols have been realized in many research works, but few studies have addressed interconnecting a LAN and a PABX for voice communication. In this paper, a gateway to interconnect an Ethernet LAN with an existing PABX telephone network for voice communication has been designed and implemented. The implemented gateway protocol is a modified protocol based on CCITT's G.764 packetized voice protocol. To accomplish this goal, the hardware system has been realized in five parts: a telephone-line interface part, a voice-processing part, a PC interface part, a controller part, and a DTMF part. The gateway software is divided into three parts: an interface to the packet driver that drives the network card, a driver for the PABX gateway, and the protocol-handling part.

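The gateway packetizes voice according to CCITT G.764. The sketch below shows only the general idea of framing voice samples behind a small header; the 4-byte header used here (sequence number plus timestamp) is a hypothetical simplification, not the actual G.764 layout:

```python
# Pack/unpack a voice frame behind a simplified (hypothetical) header.
import struct

HDR = struct.Struct("!HH")  # 16-bit sequence number, 16-bit timestamp

def packetize(seq: int, timestamp: int, pcm: bytes) -> bytes:
    """Prepend the header so the receiver can reorder and time playout."""
    return HDR.pack(seq & 0xFFFF, timestamp & 0xFFFF) + pcm

def depacketize(pkt: bytes) -> tuple[int, int, bytes]:
    """Split a received packet back into (seq, timestamp, voice payload)."""
    seq, ts = HDR.unpack_from(pkt)
    return seq, ts, pkt[HDR.size:]
```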

Development of a Work Management System Based on Speech and Speaker Recognition

  • Gaybulayev, Abdulaziz;Yunusov, Jahongir;Kim, Tae-Hyong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.16 no.3
    • /
    • pp.89-97
    • /
    • 2021
  • Voice interfaces can not only make daily life more convenient through artificial intelligence speakers but also improve the working environment of a factory. This paper presents a voice-assisted work management system that supports both speech and speaker recognition, providing machine control and authorized-worker authentication by voice at the same time. We applied two speech recognition methods: Google's Speech application programming interface (API) service and the DeepSpeech speech-to-text engine. For worker identification, the SincNet architecture for speaker recognition was adopted. We implemented a prototype of the work management system that provides voice control with 26 commands and identifies 100 workers by voice. Worker identification using our model was almost perfect, and the command recognition accuracy was 97.0% with the Google API after post-processing and 92.0% with our DeepSpeech model.
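
The 97.0% figure for the Google API is reached after post-processing. One common form of such post-processing, snapping a raw transcript onto a fixed command vocabulary, can be sketched as follows; the commands listed are invented, since the paper's 26 commands are not given in the abstract:

```python
# Snap a raw speech-to-text transcript to the closest known command.
import difflib

COMMANDS = ["start machine", "stop machine", "report status"]  # hypothetical

def to_command(transcript: str, cutoff: float = 0.6) -> str | None:
    """Return the nearest command, or None if nothing is similar enough."""
    hits = difflib.get_close_matches(transcript.lower(), COMMANDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(to_command("stop the machine"))  # -> "stop machine"
```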

Implementation of Human and Computer Interface for Detecting Human Emotion Using Neural Network (인간의 감정 인식을 위한 신경회로망 기반의 휴먼과 컴퓨터 인터페이스 구현)

  • Cho, Ki-Ho;Choi, Ho-Jin;Jung, Seul
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.13 no.9
    • /
    • pp.825-831
    • /
    • 2007
  • In this paper, an interface between a human and a computer is presented. The human-computer interface (HCI) serves as another area of human-machine interfaces. The methods used for the HCI are voice recognition and image recognition for detecting the human's emotional state. The idea is that the computer can recognize the present emotional state of the human operator and amuse him/her in various ways, such as playing music, searching the web, and talking. For the image recognition process, the human face is captured, and the eyes and mouth are selected from the facial image for recognition. To train images of the mouth, we use the Hopfield net. The results show 88%~92% recognition of emotion from images; for vocal recognition, the neural network shows 80%~98% recognition accuracy.
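
The mouth images are classified with a Hopfield net. For reference, a minimal Hopfield network with Hebbian training and synchronous recall looks like the sketch below; it operates on flattened ±1 binary patterns and is illustrative, not the paper's trained model:

```python
# Minimal Hopfield network: Hebbian weight learning, synchronous recall.
import numpy as np

class Hopfield:
    def __init__(self, n: int) -> None:
        self.W = np.zeros((n, n))

    def train(self, patterns: np.ndarray) -> None:
        """Hebbian learning over patterns of shape (num_patterns, n), values +/-1."""
        for p in patterns:
            self.W += np.outer(p, p)
        np.fill_diagonal(self.W, 0.0)   # no self-connections

    def recall(self, x: np.ndarray, steps: int = 10) -> np.ndarray:
        """Iterate synchronous updates toward the nearest stored pattern."""
        x = x.copy()
        for _ in range(steps):
            x = np.where(self.W @ x >= 0, 1, -1)
        return x
```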

Interface Modeling for Digital Device Control According to Disability Type in Web

  • Park, Joo Hyun;Lee, Jongwoo;Lim, Soon-Bum
    • Journal of Multimedia Information System
    • /
    • v.7 no.4
    • /
    • pp.249-256
    • /
    • 2020
  • Learning methods using various assistive and smart devices have been developed to enable independent learning by the disabled. Pointer control is the most important consideration for the disabled when controlling a device and the contents of an existing graphical user interface (GUI) environment; however, depending on the disability type, using a pointer can be difficult. Although the difficulties vary among blind, low-vision, and upper-limb-disabled users, problems with the accuracy of object selection and execution are common to all. A multimodal interface pilot solution is presented that enables people with various disability types to control web interactions more easily. First, we classify the types of web interaction performed with digital devices and derive the essential interactions among them. Second, to solve the problems that occur when performing these web interactions, we present the technology required for the characteristics of each disability type. Finally, a pilot solution for the multimodal interface for each disability type is proposed. We identified three disability types and developed a solution for each: a remote-control voice interface for blind people, a voice output interface applying a selective-focusing technique for low-vision people, and a gaze-tracking and voice-command interface for GUI operations for people with upper-limb disability.
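
The pilot solutions route each essential web interaction through the modality suited to the user's disability type. A minimal sketch of that routing idea follows; the type keys and handler behavior are illustrative assumptions, not the authors' code:

```python
# Route one web interaction to the interface modality for a disability type.
from typing import Callable

Handler = Callable[[str], None]

HANDLERS: dict[str, dict[str, Handler]] = {
    "blind":      {"select": lambda t: print(f"voice remote-control select: {t}")},
    "low_vision": {"select": lambda t: print(f"selective-focus audio output: {t}")},
    "upper_limb": {"select": lambda t: print(f"gaze + voice-command select: {t}")},
}

def interact(disability: str, action: str, target: str) -> None:
    """Dispatch one (action, target) web interaction for the given user type."""
    HANDLERS[disability][action](target)

interact("blind", "select", "search button")
```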