
Enhanced Sign Language Transcription System via Hand Tracking and Pose Estimation

  • Kim, Jung-Ho (School of Computing, Korea Advanced Institute of Science and Technology) ;
  • Kim, Najoung (School of Computing, Korea Advanced Institute of Science and Technology) ;
  • Park, Hancheol (School of Computing, Korea Advanced Institute of Science and Technology) ;
  • Park, Jong C. (School of Computing, Korea Advanced Institute of Science and Technology)
  • Received : 2016.09.11
  • Accepted : 2016.09.12
  • Published : 2016.09.30

Abstract

In this study, we propose a new system for constructing parallel corpora for sign languages, which are generally under-resourced compared to spoken languages. To make data collection and corpus construction scalable and accessible, our system applies deep learning-based techniques that predict depth information and estimate hand poses from video recorded with a single RGB camera. The estimated poses are then transcribed into SignWriting expressions. We quantitatively evaluate the hand tracking and hand pose estimation modules of our system using the American Sign Language Image Dataset and the American Sign Language Lexicon Video Dataset. The evaluation results show that our transcription system has high potential for constructing a sizable sign language corpus from various types of video resources.
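The abstract describes a staged pipeline: 2D hand tracking on RGB frames, monocular depth prediction, 3D hand pose estimation, and transcription of the estimated poses into SignWriting. The sketch below shows one way such stages could be composed; it is a minimal illustration only, and every class, method name (`track`, `predict`, `estimate`), and the toy handshape lookup are hypothetical placeholders standing in for the authors' deep learning models and transcription rules, not their actual implementation.

```python
# Minimal sketch of the transcription pipeline outlined in the abstract.
# All component interfaces below are assumptions for illustration.

from dataclasses import dataclass
from typing import List, Sequence

import numpy as np


@dataclass
class HandObservation:
    bbox: tuple            # (x, y, w, h) of the tracked hand in the frame
    depth: np.ndarray      # predicted depth map for the hand crop
    joints_3d: np.ndarray  # (21, 3) estimated 3D joint positions


class SignTranscriptionPipeline:
    """Hypothetical wrapper around the stages evaluated in the paper."""

    def __init__(self, tracker, depth_estimator, pose_estimator):
        # Each component is assumed to expose a single-call interface;
        # in practice these would be deep networks (e.g., a CNN-based
        # monocular depth predictor).
        self.tracker = tracker
        self.depth_estimator = depth_estimator
        self.pose_estimator = pose_estimator

    def process_frame(self, rgb_frame: np.ndarray) -> List[HandObservation]:
        observations = []
        for bbox in self.tracker.track(rgb_frame):               # 2D hand tracking
            x, y, w, h = bbox
            crop = rgb_frame[y:y + h, x:x + w]
            depth = self.depth_estimator.predict(crop)           # monocular depth
            joints = self.pose_estimator.estimate(crop, depth)   # 3D hand pose
            observations.append(HandObservation(bbox, depth, joints))
        return observations

    def transcribe(self, frames: Sequence[np.ndarray]) -> List[str]:
        # Map per-frame hand poses to SignWriting handshape symbols.
        symbols = []
        for frame in frames:
            for obs in self.process_frame(frame):
                symbols.append(nearest_signwriting_symbol(obs.joints_3d))
        return symbols


def nearest_signwriting_symbol(joints_3d: np.ndarray) -> str:
    """Toy nearest-neighbour lookup against a few canonical handshapes."""
    # Placeholder templates; a real system would compare against the full
    # SignWriting handshape inventory using articulated joint angles.
    templates = {"flat-hand": np.zeros((21, 3)), "fist": np.ones((21, 3))}
    return min(templates, key=lambda k: np.linalg.norm(joints_3d - templates[k]))
```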

References

  1. World Federation of the Deaf, "Sign language," http://wfdeaf.org/human-rights/crpd/sign-language.
  2. L. Besacier, E. Barnard, A. Karpov, and T. Schultz, "Automatic speech recognition for under-resourced languages: a survey," Speech Communication, vol. 56, pp. 85-100, 2014. https://doi.org/10.1016/j.specom.2013.07.008
  3. H. Matsuo, S. Igi, S. Lu, Y. Nagashima, Y. Takata, and T. Teshima, "The recognition algorithm with non-contact for Japanese sign language using morphological analysis," in Gesture and Sign Language in Human-Computer Interaction, Heidelberg: Springer, 1997, pp. 273-284.
  4. C. Vogler and D. Metaxas, "ASL recognition based on a coupling between HMMs and 3D motion analysis," in Proceedings of 6th International Conference on Computer Vision, Bombay, India, 1998, pp. 363-369.
  5. A. Utsumi and J. Ohya, "Multiple-hand-gesture tracking using multiple cameras," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, 1999, pp. 473-478.
  6. P. Lu and M. Huenerfauth, "Accessible motion-capture glove calibration protocol for recording sign language data from deaf subjects," in Proceedings of the 11th international ACM SIGACCESS Conference on Computers and Accessibility, Pittsburgh, PA, 2009, pp. 83-90.
  7. C. Oz and M. C. Leu, "American sign language word recognition with a sensory glove using artificial neural networks," Engineering Applications of Artificial Intelligence, vol. 24, no. 7, pp. 1204-1213, 2011. https://doi.org/10.1016/j.engappai.2011.06.015
  8. C. H. Morimoto, D. Koons, A. Amir, and M. Flickner, "Pupil detection and tracking using multiple light sources," Image and Vision Computing, vol. 18, no. 4, pp. 331-335, 2000. https://doi.org/10.1016/S0262-8856(99)00053-0
  9. Z. Zafrulla, H. Brashear, T. Starner, H. Hamilton, and P. Presti, "American sign language recognition with the kinect," in Proceedings of the 13th International Conference on Multimodal Interfaces, Alicante, Spain, 2011, pp. 279-286.
  10. S. Lang, M. Block, and R. Rojas, "Sign language recognition using kinect," in Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2012, pp. 394-402.
  11. L. E. Potter, J. Araullo, and L. Carter, "The leap motion controller: a view on sign language," in Proceedings of the 25th Australian Computer-Human Interaction Conference: Augmentation, Application, Innovation, Collaboration, Adelaide, Australia, 2013, pp. 175-178.
  12. G. Marin, F. Dominio, and P. Zanuttigh, "Hand gesture recognition with leap motion and kinect devices," in Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, 2014, pp. 1565-1569.
  13. J. Tompson, M. Stein, Y. Lecun, and K. Perlin, "Real-time continuous pose recovery of human hands using convolutional networks," ACM Transactions on Graphics, vol. 33, no. 5, article no. 169, 2014.
  14. D. Park and D. Ramanan, "Articulated pose estimation with tiny synthetic videos," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 2015, pp. 58-66.
  15. X. Sun, Y. Wei, S. Liang, X. Tang, and J. Sun, "Cascaded hand pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 2015, pp. 824-832.
  16. X. Zhou, Q. Wan, W. Zhang, X. Xue, and Y. Wei, "Model-based deep hand pose estimation," in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, 2016.
  17. W. C. Stokoe, D. C. Casterline, and C. G. Croneberg, A Dictionary of American Sign Language on Linguistic Principles, Silver Spring, MD: Linstok Press, 1976.
  18. S. Prillwitz, HamNoSys Version 2.0: Hamburg Notation System for Sign Languages: An Introductory Guide, Hamburg: Signum, 1989.
  19. V. Sutton, Lessons in Sign Writing, La Jolla, CA: SignWriting, 1995.
  20. R. E. Kalman, "A new approach to linear filtering and prediction problems," Journal of Basic Engineering, vol. 82, no. 1, pp. 35-45, 1960. https://doi.org/10.1115/1.3662552
  21. A. Doucet, N. De Freitas, and N. Gordon, "An introduction to sequential Monte Carlo methods," in Sequential Monte Carlo Methods in Practice, Heidelberg: Springer, 2001, pp. 3-14.
  22. Q. Yuan, S. Sclaroff, and V. Athitsos, "Automatic 2D hand tracking in video sequences," in Proceedings of 7th IEEE Workshops on Application of Computer Vision, Breckenridge, CO, 2005, pp. 250-256.
  23. M. Eichner, M. Marin-Jimenez, A. Zisserman, and V. Ferrari, "2D articulated human pose estimation and retrieval in (almost) unconstrained still images," International Journal of Computer Vision, vol. 99, no. 2, pp. 190-214, 2012. https://doi.org/10.1007/s11263-012-0524-9
  24. D. Eigen, C. Puhrsch, and R. Fergus, "Depth map prediction from a single image using a multi-scale deep network," Advances in Neural Information Processing Systems, vol. 27, pp. 2366-2374, 2014.
  25. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images," in Proceedings of the European Conference on Computer Vision, Florence, Italy, 2012, pp. 746-760.
  26. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," International Journal of Robotics Research, vol. 32, no. 11, pp. 1231-1237, 2013. https://doi.org/10.1177/0278364913491297
  27. D. Tang, H. J. Chang, A. Tejani, and T.-K. Kim, "Latent regression forest: structured estimation of 3D articulated hand posture," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, 2014, pp. 3786-3793.
  28. S. Gattupalli, A. Ghaderi, and V. Athitsos, "Evaluation of deep learning based pose estimation for sign language," 2016; http://arxiv.org/pdf/1602.09065v3.pdf.
  29. C. Neidle and C. Vogler, "A new web interface to facilitate access to corpora: development of the ASLLRP data access interface (DAI)," in Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon (LREC), Istanbul, Turkey, 2012, pp. 137-142.
  30. C. Neidle, A. Thangali, and S. Sclaroff, "Challenges in development of the American sign language lexicon video dataset (ASLLVD) corpus," in Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon (LREC), Istanbul, Turkey, 2012, pp. 143-150.
  31. Handshapes data from the National Center for Sign Language and Gesture Resources, http://www.bu.edu/asllrp/cslgr/pages/ncslgr-handshapes.html.

Cited by

  1. Isolated sign language recognition using Convolutional Neural Network hand modelling and Hand Energy Image, Multimedia Tools and Applications, 2019, https://doi.org/10.1007/s11042-019-7263-7