• Title/Summary/Keyword: Speech Interface


A Study on Voice User Interface for Domestic Appliance (가전제품의 음성 인터페이스 디자인 적용에 대한 연구)

  • Hong, Ji-Young;Jeon, Myoung-Hoon;Han, Kwang-Hee;Chae, Haeng-Suk
    • Science of Emotion and Sensibility
    • /
    • v.10 no.1
    • /
    • pp.55-68
    • /
    • 2007
  • This paper describes a Voice User Interface (VUI) method and a design guideline tool that supports VUI studies for domestic appliances. The work covers the specification of user requirements and the selection of an appropriate VUI to represent speech output. The focus of the paper is interaction design that enhances user engagement. Studies were carried out on prototypes of domestic appliances such as a refrigerator, a washing machine, a kimchi refrigerator, an oven range, a dishwasher, and an air conditioner. The paper presents a study of user preferences and suitability. The implications of these findings for voice interface design are discussed, and it is suggested that the VUI guideline and optimal prototyping can provide useful tools for the design process.

  • PDF

Development of a Voice User Interface for Web Browser using VoiceXML (VoiceXML을 이용한 VUI 지원 웹브라우저 개발)

  • Yea SangHoo;Jang MinSeok
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.2
    • /
    • pp.101-111
    • /
    • 2005
  • Most current web information is described in HTML, which users access through input devices such as a mouse and keyboard. The existing GUI environment therefore does not support the most natural human means of information acquisition: voice. Several vendors are developing voice user interfaces to address this problem, but these products are deficient in man-machine interactivity and in their accommodation of the existing web environment. This paper presents a web browser that supports a VUI (Voice User Interface) by utilizing increasingly mature speech recognition technology and VoiceXML, a markup language derived from XML. It provides users with both interfaces, VUI as well as GUI. In addition, XML Island technology is applied to the browser so that VoiceXML fragments are nested in HTML documents to accommodate the existing web environment. For better interactivity, dialogue scenarios for a menu, a bulletin board, and a search engine are also suggested.
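The XML-island technique mentioned in the abstract nests VoiceXML fragments inside an HTML page so that one document can drive both the GUI and the VUI. A minimal sketch of the idea, assuming a page layout of our own (the `<vxml>`, `<form>`, `<field>`, and `<prompt>` elements follow the VoiceXML 2.0 vocabulary, but the page itself is illustrative, not taken from the paper):

```python
import xml.etree.ElementTree as ET

# An XHTML page with a VoiceXML fragment nested as an "XML island".
# The island markup is a hypothetical example for illustration.
page = """
<html>
  <body>
    <p>Search the bulletin board:</p>
    <xml id="voice-island">
      <vxml version="2.0">
        <form id="search">
          <field name="keyword">
            <prompt>Say a search keyword.</prompt>
          </field>
        </form>
      </vxml>
    </xml>
  </body>
</html>
"""

def extract_voice_fragments(xhtml_text):
    """Return the VoiceXML <vxml> elements nested inside XML islands."""
    root = ET.fromstring(xhtml_text)
    return [vxml for island in root.iter("xml") for vxml in island.iter("vxml")]

fragments = extract_voice_fragments(page)
for vxml in fragments:
    # A GUI browser renders the surrounding HTML; the VUI side hands
    # each extracted fragment to a VoiceXML interpreter instead.
    print(vxml.find(".//prompt").text)
```

A browser built this way can keep the visual page intact for GUI users while routing each extracted fragment to the speech side, which is the dual-interface arrangement the paper describes.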

Design and Implementation of a Usability Testing Tool for User-oriented Design of Command-and-Control Voice User Interfaces (명령 제어 음성 인터페이스 사용자 중심 설계를 위한 사용성 평가도구의 설계 및 구현)

  • Lee, Myeong-Ji;Hong, Ki-Hyung
    • Phonetics and Speech Sciences
    • /
    • v.3 no.2
    • /
    • pp.79-87
    • /
    • 2011
  • Recently, usability has become very important in voice user interface systems. In this paper, we design and implement a wizard-of-oz (WOZ) usability testing tool for command-and-control voice user interfaces. We propose VUIDML (Voice User Interface Design Markup Language) for designing usability test scenarios for command-and-control voice interfaces in the early design stages. Highly satisfactory voice user interfaces require the selection of highly preferred voice commands and prompts, and in VUIDML possible prompt candidates can be specified. The WOZ usability testing tool can also be used to collect user-preferred voice commands and feedback from real users.

  • PDF
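The abstract says VUIDML lets a designer specify prompt candidates whose user preference the WOZ tool then measures. The abstract does not give VUIDML's concrete syntax, so the `<task>`, `<prompt-candidates>`, and `<candidate>` tags below are assumptions made purely for illustration; the sketch only shows the kind of bookkeeping such a tool would do:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical scenario markup: the element names are NOT the real VUIDML
# vocabulary, just placeholders to illustrate candidate specification.
scenario = """
<task name="set-timer">
  <prompt-candidates>
    <candidate id="p1">How many minutes?</candidate>
    <candidate id="p2">Please say the timer duration.</candidate>
  </prompt-candidates>
</task>
"""

def load_candidates(xml_text):
    """Map candidate ids to prompt texts from a scenario document."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): c.text for c in root.iter("candidate")}

def preferred_prompt(votes):
    """Pick the candidate most often chosen by test participants."""
    return Counter(votes).most_common(1)[0][0]

candidates = load_candidates(scenario)
votes = ["p1", "p2", "p2", "p1", "p2"]  # wizard-logged preferences (made up)
best = preferred_prompt(votes)
print(candidates[best])
```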

Design and Implementation of Multimodal Middleware for Mobile Environments (모바일 환경을 위한 멀티모달 미들웨어의 설계 및 구현)

  • Park, Seong-Soo;Ahn, Se-Yeol;Kim, Won-Woo;Koo, Myoung-Wan;Park, Sung-Chan
    • MALSORI
    • /
    • no.60
    • /
    • pp.125-144
    • /
    • 2006
  • W3C announced a standard software architecture for multimodal context-aware middleware that emphasizes modularity and separates structure, content, and presentation. We implemented a distributed multimodal interface system that follows the W3C architecture, based on SCXML. SCXML uses parallel states to invoke both XHTML and VoiceXML content and to gather composite or sequential multimodal inputs through man-machine interactions. We also employ a Delivery Context Interface (DCI) module and an external service bundle that enable the middleware to support context-aware services in real-world environments. The provision of personalized user interfaces for mobile devices is expected to serve different devices with a wide variety of capabilities and interaction modalities. We demonstrated through experiments that the implemented middleware can handle multimodal scenarios in a clear, concise, and consistent manner.

  • PDF
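The key SCXML idea in the abstract is using parallel states to gather a composite multimodal input: a speech region and a touch region are active simultaneously, and the composite is complete once both have produced an event. SCXML expresses this with a `<parallel>` element; the small Python model below, with assumed region names, only mirrors that semantics:

```python
# Minimal model of SCXML-style parallel input regions: the composite
# multimodal input is complete once every region has fired.
class ParallelInput:
    def __init__(self, regions):
        self.pending = set(regions)   # regions still waiting for input
        self.events = {}

    def feed(self, region, payload):
        """Record an event; return True when the composite is complete."""
        if region in self.pending:
            self.pending.remove(region)
            self.events[region] = payload
        return not self.pending

mm = ParallelInput(["speech", "touch"])
print(mm.feed("speech", "show me dramas"))  # False: touch still pending
print(mm.feed("touch", (120, 45)))          # True: composite complete
print(mm.events)
```

A sequential multimodal input, by contrast, would process each region's event as it arrives instead of waiting for the set to fill.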

A Basic Performance Evaluation of the Speech Recognition APP of Standard Language and Dialect using Google, Naver, and Daum KAKAO APIs (구글, 네이버, 다음 카카오 API 활용앱의 표준어 및 방언 음성인식 기초 성능평가)

  • Roh, Hee-Kyung;Lee, Kang-Hee
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.12
    • /
    • pp.819-829
    • /
    • 2017
  • In this paper, we describe the current state of speech recognition technology, identify basic speech recognition techniques and algorithms, and explain the code flow of the APIs needed for speech recognition. We use the application programming interfaces (APIs) of Google, Naver, and Daum Kakao, the providers of the most widely used search engines among speech recognition API vendors, to create a voice recognition app in Android Studio. We then perform speech recognition experiments on standard-language speech and dialects, grouped by gender, age, and region, and tabulate the recognition rates. Experiments were conducted in the Gyeongsang-do, Chungcheong-do, and Jeolla-do provinces, where dialects are strong, and comparative experiments were also conducted on standard speech. Based on the resulting sentences, the accuracy of each sentence is checked with respect to word spacing, final consonants, postpositions, and word choice, and the number of each error type is counted. From these results, we aim to introduce the advantages of each API according to its speech recognition rate and to establish a basic framework for the most efficient use.
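The study's bookkeeping amounts to comparing each API's recognized sentence against a reference and counting errors. Its real categories (spacing, final consonant, postposition, word) require Korean-specific analysis, so the sketch below only shows the tallying skeleton with a plain word-level edit distance and made-up English data:

```python
# Word-level Levenshtein distance: substitutions + insertions + deletions.
def word_error_count(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)]

results = {  # recognized text per API for one utterance (made-up data)
    "Google": "turn on the living room light",
    "Naver":  "turn on living room light",
    "Kakao":  "turn of the living room lights",
}
reference = "turn on the living room light"
errors = {api: word_error_count(reference, hyp) for api, hyp in results.items()}
print(errors)
```

Summing such counts over all test sentences, per API and per region, yields the comparison tables the paper describes.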

Text to Speech System from Web Images (웹상의 영상 내의 문자 인식과 음성 전환 시스템)

  • 안희임;정기철
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.5-8
    • /
    • 2001
  • Computer programs based upon a graphical user interface (GUI) became commonplace with the advance of computer technology. Nevertheless, programs for the visually handicapped have remained at the level of TTS (text-to-speech) programs, and this prevents many visually handicapped people from enjoying the pleasure and convenience of the information age. Paying attention to the importance of character recognition in images, this paper describes the configuration of a system that converts the text in an image selected by a user into speech by extracting the character regions and carrying out character recognition.

  • PDF

An Implementation of a Multi-Channel Speech Surveillance System Over Telephone Lines

  • Kim, Sung-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.4E
    • /
    • pp.17-21
    • /
    • 1998
  • This paper presents an implementation of a multi-channel speech surveillance system over telephone lines using TMS320C31 DSP chips. The incoming speech on each telephone line is first compressed in real time by the popular vector-sum excited linear predictive (VSELP) speech coding algorithm at a rate of 8 kbps. The compressed speech bit streams are then multiplexed with those of the other channels. The multiplexed speech bit streams are transferred to the system storage equipment, together with other required information, so that a system operator can later monitor the stored speech data whenever necessary. The host program runs under Microsoft Windows 95 for an efficient man-machine interface and future upgradability. We have confirmed that the overall 64-channel system operates satisfactorily in real time, and we have verified approximately 2,880 total hours of recording capability on a playback module and two removable backup drives.

  • PDF
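At 8 kbps a VSELP coder emits 20 bytes per 20 ms frame, and the abstract says frames from all channels are multiplexed into one stored stream. The paper does not give its record format, so the one-byte channel tag per frame below is an assumption; the sketch shows how such interleaved frames could be stored and later pulled apart per channel:

```python
# Interleave fixed-size coded-speech frames from several channels into one
# byte stream, tagging each frame with its channel id (assumed layout).
def mux(channel_frames):
    """channel_frames: {channel_id: [frame_bytes, ...]} -> one byte stream."""
    stream = bytearray()
    for t in range(max(len(f) for f in channel_frames.values())):
        for ch, frames in sorted(channel_frames.items()):
            if t < len(frames):
                stream.append(ch)    # 1-byte channel tag
                stream += frames[t]
    return bytes(stream)

def demux(stream, frame_bytes=20):
    """Recover per-channel frame lists from a multiplexed stream."""
    out = {}
    for i in range(0, len(stream), frame_bytes + 1):
        ch = stream[i]
        out.setdefault(ch, []).append(stream[i + 1:i + 1 + frame_bytes])
    return out

frames = {0: [bytes(20), bytes(20)], 1: [bytes(range(20))]}
recovered = demux(mux(frames))
print(recovered == frames)  # True: each channel's frames round-trip
```

Demultiplexing by channel tag is what lets the operator replay a single conversation out of the combined 64-channel recording.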

An Enhanced Speaker Verification System Using Lipreading and Speech Recognition (Lipreading과 음성인식에 의한 향상된 화자 인증 시스템)

  • 지승남;이종수
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회 학술대회논문집)
    • /
    • 2000.10a
    • /
    • pp.274-274
    • /
    • 2000
  • In the future, convenient speech command systems will become a widely used interface in automation systems, but previous speech recognition research has not given satisfactory recognition results for practical deployment in noisy environments. The purpose of this research is the development of a practical system that reliably recognizes the speech commands of registered users by complementing the speech signal with image information, extending existing research. For lipreading feature extraction from an image, we used the DWT (Discrete Wavelet Transform), which reduces the size of the original image while preserving its useful characteristics. To enhance robustness against environmental variation across speakers, we acquired the speech signal by a stereo method. We designed an economical stand-alone system based on a TMS320C31 DSP add-on board with a Bt829 video decoder and an AD1819B audio codec.

  • PDF
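The size-reducing transform the abstract uses for lip features can be illustrated with a single-level 2-D Haar DWT, the simplest wavelet: the LL (approximation) quadrant is a half-resolution version of the image and would feed the recognizer, while the other quadrants hold edge detail. A pure-Python sketch for an even-sized grayscale image (the paper does not state which wavelet it used, so Haar here is an assumption):

```python
# Single-level 2-D Haar DWT: averages are the low-pass band, pairwise
# differences the high-pass band, applied to rows then columns.
def haar2d(img):
    h, w = len(img), len(img[0])
    rows = []
    for r in img:
        lo = [(r[2*j] + r[2*j+1]) / 2 for j in range(w // 2)]
        hi = [(r[2*j] - r[2*j+1]) / 2 for j in range(w // 2)]
        rows.append(lo + hi)
    out = [[0.0] * w for _ in range(h)]
    for c in range(w):
        col = [rows[i][c] for i in range(h)]
        for i in range(h // 2):
            out[i][c] = (col[2*i] + col[2*i+1]) / 2         # low-pass
            out[h//2 + i][c] = (col[2*i] - col[2*i+1]) / 2  # high-pass
    return out  # quadrants: LL (top-left), HL, LH, HH

img = [[4, 4, 2, 2],
       [4, 4, 2, 2],
       [8, 0, 6, 6],
       [8, 0, 6, 6]]
ll = [row[:2] for row in haar2d(img)[:2]]
print(ll)  # half-resolution approximation of the lip image
```

Repeating the transform on the LL quadrant shrinks the feature vector further at each level, which is how the DWT keeps the lip representation compact.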

Speech-Oriented Multimodal Usage Pattern Analysis for TV Guide Application Scenarios (TV 가이드 영역에서의 음성기반 멀티모달 사용 유형 분석)

  • Kim Ji-Young;Lee Kyong-Nim;Hong Ki-Hyung
    • MALSORI
    • /
    • no.58
    • /
    • pp.101-117
    • /
    • 2006
  • The development of efficient multimodal interfaces and fusion algorithms requires knowledge of usage patterns that show how people use multiple modalities. We analyzed multimodal usage patterns for TV-guide application scenarios (or tasks). To collect usage patterns, we implemented a multimodal usage pattern collection system with two input modalities: speech and touch-gesture. Fifty-four subjects participated in our study. Analysis of the collected usage patterns shows a positive correlation between task type and multimodal usage pattern. In addition, we analyzed the timing between speech utterances and their corresponding touch-gestures, showing when a touch-gesture occurs relative to the duration of the speech utterance. We believe that, to develop efficient multimodal fusion algorithms for an application, a multimodal usage pattern analysis for that application, similar to our work for the TV guide application, should be done in advance.

  • PDF
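The timing statistic the abstract describes normalizes the touch-gesture's occurrence time by the speech utterance's duration. A minimal version of that measure, with made-up timestamps in seconds: 0.0 means the touch landed at speech onset, 1.0 at speech offset, and values outside [0, 1] mean the touch preceded or followed the utterance.

```python
# Touch-gesture occurrence time relative to the speech utterance duration.
def relative_touch_time(speech_start, speech_end, touch_time):
    return (touch_time - speech_start) / (speech_end - speech_start)

samples = [  # (speech_start, speech_end, touch_time), fabricated examples
    (0.0, 2.0, 0.5),   # touch during speech, first quarter
    (5.0, 6.0, 6.2),   # touch shortly after speech ends
    (9.0, 10.0, 8.9),  # touch shortly before speech begins
]
ratios = [relative_touch_time(s, e, t) for s, e, t in samples]
print(ratios)
```

Histogramming such ratios over many utterances reveals whether users tend to point before, during, or after speaking, which is exactly what a fusion algorithm's integration window must accommodate.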

Development of an Autonomous Mobile Robot with the Function of Teaching a Moving Path by Speech and Avoiding a Collision (음성에 의한 경로교시 기능과 충돌회피 기능을 갖춘 자율이동로봇의 개발)

  • Park, Min-Gyu;Lee, Min-Cheol;Lee, Suk
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.17 no.8
    • /
    • pp.189-197
    • /
    • 2000
  • This paper describes the development of an autonomous mobile robot that can be taught a moving path by speech and can avoid collisions. Using human speech as the teaching method provides a more convenient user interface for a mobile robot. In the speech recognition system, a speech recognition algorithm using a neural network is proposed to recognize Korean syllables. For safe navigation, the autonomous mobile robot needs the ability to recognize its surrounding environment and to avoid collisions with obstacles. Ultrasonic sensors are used to obtain the distances from the mobile robot to the various obstacles in the surrounding environment. With the navigation algorithm, the robot forecasts the possibility of collision with obstacles and modifies its moving path when it detects a dangerous obstacle.

  • PDF
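The collision-forecast step in the abstract reduces to: read an ultrasonic distance per heading sector, flag any sector inside a danger radius, and steer toward the clearest sector when the current heading is blocked. The sector layout and threshold below are assumptions for illustration, not the paper's parameters:

```python
# Simple reactive path modification from per-sector ultrasonic ranges.
DANGER_CM = 40  # modify the path when an obstacle is closer than this

def choose_heading(distances, current):
    """distances: {sector_angle_deg: distance_cm}; return a safe heading."""
    if distances[current] > DANGER_CM:
        return current                      # current path is clear
    # dangerous obstacle ahead: steer toward the most open sector
    return max(distances, key=distances.get)

readings = {-45: 120, 0: 25, 45: 200}       # obstacle 25 cm dead ahead
print(choose_heading(readings, 0))          # steers to the 45-degree sector
```

A real navigator would also weight sectors by how far they deviate from the taught path, so the robot rejoins the spoken route once the obstacle is cleared.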