Human-Computer Interaction Based Only on Auditory and Visual Information

  • Sha, Hui (Tellabs, Lisle, Illinois, U.S.A.)
  • Agah, Arvin (Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, Kansas 66045, U.S.A.)
  • Published: 2000.12.01

Abstract

One of the research objectives in the area of multimedia human-computer interaction is the application of artificial intelligence and robotics technologies to the development of computer interfaces. This involves utilizing many forms of media, integrating speech input, natural language, graphics, hand pointing gestures, and other methods for interactive dialogues. Although current human-computer communication methods include keyboards, mice, and other traditional devices, the two basic ways by which people communicate with each other are voice and gesture. This paper reports on research focusing on the development of an intelligent multimedia interface system modeled on the manner in which people communicate. The work explores interaction between humans and computers based only on the processing of speech (words uttered by the person) and the processing of images (hand pointing gestures). The purpose of the interface is to control a pan/tilt camera, pointing it at a location that the user specifies by uttering words and pointing a hand. The system utilizes a second, stationary camera to capture images of the user's hand and a microphone to capture the user's words. Upon processing the images and sounds, the system responds by pointing the camera. Initially, the interface uses hand pointing to locate the general position to which the user is referring; it then uses voice commands provided by the user to fine-tune the location and to change the camera's zoom, if requested. The image of the location is captured by the pan/tilt camera and sent to a color TV monitor for display. This type of system has applications in teleconferencing and other remote operations, where the system must respond to the user's commands in a manner similar to how the user would communicate with another person. The advantage of this approach is that it eliminates the traditional input devices the user would otherwise need to control a pan/tilt camera, replacing them with more "natural" means of interaction. A number of experiments were performed to evaluate the interface system with respect to its accuracy, efficiency, reliability, and limitations.
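To make the coarse-to-fine protocol concrete, the Python sketch below walks through one interaction episode: a pointing gesture gives the camera a rough aim, and spoken commands then nudge the pan, tilt, and zoom. Everything in the sketch is an illustrative assumption rather than the paper's implementation: the names (CameraPose, coarse_pose_from_pointing, interaction_loop, FINE_TUNE_STEPS), the command vocabulary and step sizes, the linear pixel-to-angle mapping, and the 640x480 image size are all hypothetical.

from dataclasses import dataclass

@dataclass
class CameraPose:
    pan_deg: float     # horizontal angle sent to the pan/tilt unit
    tilt_deg: float    # vertical angle
    zoom: float = 1.0  # zoom factor

# Illustrative command vocabulary: each spoken word maps to a small
# pan/tilt/zoom adjustment. The paper does not list the exact words
# or step sizes the real system used.
FINE_TUNE_STEPS = {
    "left":     (-2.0,  0.0,  0.0),
    "right":    ( 2.0,  0.0,  0.0),
    "up":       ( 0.0,  2.0,  0.0),
    "down":     ( 0.0, -2.0,  0.0),
    "zoom in":  ( 0.0,  0.0,  0.5),
    "zoom out": ( 0.0,  0.0, -0.5),
}

def coarse_pose_from_pointing(fingertip_xy, image_size,
                              fov_deg=(60.0, 45.0)) -> CameraPose:
    """Map the fingertip position found in the stationary camera's
    image to an approximate pan/tilt pose. A simple linear mapping
    from pixel offset to angle is assumed here; the paper's actual
    gesture-processing stage is more involved."""
    x, y = fingertip_xy
    w, h = image_size
    pan = (x / w - 0.5) * fov_deg[0]
    tilt = (0.5 - y / h) * fov_deg[1]  # image y grows downward
    return CameraPose(pan, tilt)

def interaction_loop(find_fingertip, listen, move_camera):
    """One coarse-to-fine episode. The three callables stand in for
    the image-processing, speech-recognition, and camera-control
    modules; none of their names come from the paper."""
    # Stage 1: coarse aim from the hand-pointing gesture.
    pose = coarse_pose_from_pointing(find_fingertip(), (640, 480))
    move_camera(pose)

    # Stage 2: spoken commands fine-tune pan/tilt and zoom until
    # the user says "stop"; each move is shown on the TV monitor.
    while (word := listen()) != "stop":
        if word not in FINE_TUNE_STEPS:
            continue  # ignore unrecognized words
        d_pan, d_tilt, d_zoom = FINE_TUNE_STEPS[word]
        pose = CameraPose(pose.pan_deg + d_pan,
                          pose.tilt_deg + d_tilt,
                          max(1.0, pose.zoom + d_zoom))  # zoom floor of 1x
        move_camera(pose)

For example, interaction_loop(lambda: (500, 180), lambda: "stop", print) aims the camera once from a fingertip found at pixel (500, 180) and exits as soon as the (stubbed) recognizer returns "stop".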
