Object Detection and Optical Character Recognition for Mobile-based Air Writing


  • Taeil Kim (Dept. of Computer Engineering, Seokyeong University)
  • Youngjin Ko (Dept. of Computer Engineering, Seokyeong University)
  • Taeyoung Kim (Dept. of Computer Engineering, Seokyeong University)
  • Received : 2019.08.27
  • Accepted : 2019.10.21
  • Published : 2019.10.31

Abstract

To provide a hand-gesture interface through deep learning in mobile environments, research on lightweight networks is essential to achieve high recognition rates without degrading execution speed. This paper proposes a method for real-time recognition of characters written in the air with a finger on mobile devices, enabled by a lightweight deep-learning model. Based on SSD (Single Shot Detector), an object detection model that uses MobileNet as its feature extractor, the method detects the index finger and generates a result text image by tracing the fingertip path. The image is then sent to a server, where it is normalized and the characters are recognized with a trained OCR model. To verify the method, 12 users tested 1,000 words on a GALAXY S10+; the finger was recognized with an average accuracy of 88.6%, and the recognized text was displayed within 124 ms, showing that the method can be used in real time. The results of this research can be applied to sending short text messages, taking memos, and producing air signatures with a finger in mobile environments.
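The front end of the pipeline described above — per-frame index-finger detection followed by connecting the fingertip positions into a character image — can be sketched roughly as follows. The detector itself (SSD with a MobileNet backbone) is not reproduced here; `boxes` stands in for its per-frame output, and taking the top-center of each detected box as the fingertip position is an assumption made for illustration, not the authors' stated method:

```python
import numpy as np

def fingertip_point(box):
    # box = (x_min, y_min, x_max, y_max) from the per-frame finger detector.
    # Assumption: the fingertip sits at the top-center of the detected box.
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, float(y_min))

def render_trajectory(boxes, size=(256, 256), thickness=2):
    """Rasterize the fingertip path into a binary character image."""
    canvas = np.zeros(size, dtype=np.uint8)
    pts = [fingertip_point(b) for b in boxes]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Interpolate along the segment and stamp a small square per step.
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        for t in np.linspace(0.0, 1.0, n):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            canvas[max(0, y - thickness):y + thickness,
                   max(0, x - thickness):x + thickness] = 255
    return canvas

# Fabricated detector output for three consecutive frames, for illustration:
boxes = [(10, 40, 30, 120), (40, 30, 60, 110), (70, 35, 90, 115)]
img = render_trajectory(boxes)
```

The resulting `img` is the "result text image" the abstract refers to, which would then be uploaded to the server for recognition.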

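The abstract does not specify how the server-side OCR model decodes its output. Assuming a CTC-style sequence model — a common choice for handwriting recognition — a minimal greedy decoder collapses repeated labels and removes blanks; this is a sketch under that assumption, not the authors' implementation:

```python
import numpy as np

def ctc_greedy_decode(logits, blank=0):
    """Greedy CTC decoding: take the argmax label at each timestep,
    collapse consecutive repeats, then drop the blank label."""
    best = logits.argmax(axis=1)  # logits: (T, C) per-timestep scores
    out, prev = [], blank
    for c in best:
        if c != blank and c != prev:
            out.append(int(c))
        prev = c
    return out

# Toy example: 7 timesteps over 3 classes (class 0 is the blank).
# The per-timestep argmax path is [0, 1, 1, 0, 2, 2, 0].
logits = np.eye(3)[[0, 1, 1, 0, 2, 2, 0]]
labels = ctc_greedy_decode(logits)  # → [1, 2]
```

The decoded label sequence would then be mapped through the model's character table to produce the text returned to the mobile client.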

