Accurate Human Localization for Automatic Labelling of Human from Fisheye Images

  • Received : 2017.02.15
  • Accepted : 2017.04.28
  • Published : 2017.05.31

Abstract

Deep learning networks such as Convolutional Neural Networks (CNNs) perform well in many computer vision applications, including image classification and object detection. To run a deep learning network on an embedded system with limited processing power and memory, the network may need to be simplified; a simplified network, however, cannot learn every possible scene. One realistic strategy for embedded deep learning is therefore to build a simplified network model optimized for the scene images of the installation site, which makes automatic training a prerequisite for commercialization. In this paper, as an intermediate step toward automatic training in fisheye camera environments, we study precise human localization in fisheye images and propose an accurate human localization method, the Automatic Ground-Truth Labelling Method (AGTLM). AGTLM first localizes candidate human bounding boxes using a GoogLeNet-LSTM approach, verifies them in a reassurance step with a GoogLeNet-based CNN, and finally refines them into tighter, more precise boxes by applying a salient object detection technique. Several experiments demonstrate the improvement of AGTLM in both accuracy and tightness.
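The saliency-based refinement at the end of the pipeline can be made concrete with a short sketch. The Python fragment below is a minimal illustration, not the authors' implementation: it assumes the saliency map arrives as a 2-D NumPy array scaled to [0, 1] (e.g., produced by the graph-based manifold ranking method of reference [9]), and the function name tighten_box and the 0.5 threshold are chosen purely for illustration. A candidate box from the GoogLeNet-LSTM stage is shrunk to the bounding rectangle of the salient pixels it contains, which is what "tightening" amounts to.

    import numpy as np

    def tighten_box(saliency, box, thresh=0.5):
        """Shrink a candidate box to the salient region it contains.

        saliency: 2-D float array in [0, 1]; box: (x0, y0, x1, y1) in pixels.
        Falls back to the original box if nothing inside it is salient.
        """
        x0, y0, x1, y1 = box
        roi = saliency[y0:y1, x0:x1]
        ys, xs = np.nonzero(roi > thresh)   # salient pixels inside the box
        if xs.size == 0:                    # no salient evidence: keep the box
            return box
        return (int(x0 + xs.min()), int(y0 + ys.min()),
                int(x0 + xs.max()) + 1, int(y0 + ys.max()) + 1)

    # Toy usage: a salient blob inside a deliberately loose candidate box.
    sal = np.zeros((100, 100))
    sal[40:60, 30:50] = 1.0
    print(tighten_box(sal, (20, 30, 70, 80)))   # -> (30, 40, 50, 60)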

References

  1. J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, et al., "Recent Advances in Convolutional Neural Networks," arXiv:1512.07108, 2017.
  2. N.T. Binh, N.V. Tuan, and S.T. Chung, "Real-time Human Detection under Omni-directional Camera based on CNN with Unified Detection and AGMM for Visual Surveillance," Journal of Korea Multimedia Society, Vol. 19, No. 8, pp. 1345-1360, 2016. https://doi.org/10.9717/kmms.2016.19.8.1345
  3. R. Stewart and M. Andriluka, "End-to-end People Detection in Crowded Scenes," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2325-2333, 2016.
  4. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., "Going Deeper with Convolutions," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
  5. M. Lin, Q. Chen, and S. Yan, "Network in Network," arXiv:1312.4400, 2013.
  6. O. Russakovsky, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, Vol. 115, No. 3, pp. 211-252, 2015. https://doi.org/10.1007/s11263-015-0816-y
  7. C. Olah, Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed Feb. 14, 2017).
  8. K.Y. Chang, T.L. Liu, H.T. Chen, and S.H. Lai, "Fusing Generic Objectness and Visual Saliency for Salient Object Detection," Proceedings of International Conference on Computer Vision, pp. 914-921, 2011.
  9. C. Yang, L. Zhang, H. Lu, X. Ruan, and M. Yang, "Saliency Detection via Graph-Based Manifold Ranking," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3166-3173, 2013.
  10. VATIC: Video Annotation Tool from Irvine, California, http://web.mit.edu/vondrick/vatic/ (accessed Feb. 14, 2017).
  11. ViPER: The Video Performance Evaluation Resource, http://viper-toolkit.sourceforge.net (accessed Feb. 14, 2017).
  12. LabelMe, http://labelme.csail.mit.edu/Release3.0/ (accessed Feb. 14, 2017).
  13. LabelImg, https://github.com/tzutalin/labelImg (accessed Feb. 14, 2017).
  14. X. Wang, M. Wang, and W. Li, "Scene-Specific Pedestrian Detection for Static Video Surveillance," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 2, pp. 361-374, 2014. https://doi.org/10.1109/TPAMI.2013.124
  15. J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How Transferable are Features in Deep Neural Networks?," Advances in Neural Information Processing Systems 27, pp. 3320-3328, 2014.
  16. K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the Devil in the Details: Delving Deep into Convolutional Nets," Proceedings of British Machine Vision Conference, pp. 18-19, 2014.
  17. D. Xing, W. Dai, G.R. Xue, and Y. Yu, "Bridged Refinement for Transfer Learning," Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases, Lecture Notes in Computer Science, pp. 324-335, 2007.
  18. X. Zeng, W. Ouyang, and M. Wang, "Deep Learning of Scene-Specific Classifier for Pedestrian Detection," Proceedings of European Conference on Computer Vision, pp. 472-487, 2014.
  19. A. Mhalla, T. Chateau, and S. Gazzah, "Scene-Specific Pedestrian Detector Using Monte Carlo Framework and Faster R-CNN Deep Model," Proceedings of International Conference on Distributed Smart Cameras, pp. 228-229, 2016.
  20. H. Maamatou, T. Chateau, S. Gazzah, Y. Goyat, and N. Essoukri Ben Amara, "Transductive Transfer Learning to Specialize a Generic Classifier Towards a Specific Scene," Proceedings of International Conference on Computer Vision Theory and Applications, pp. 411-422, 2016.
  21. T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, et al., "Learning to Detect a Salient Object," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 33, No. 2, pp. 353-367, 2011.
  22. L. Wang, J. Xue, N. Zheng, and G. Hua, "Automatic Salient Object Extraction with Contextual Cue," Proceedings of International Conference on Computer Vision, pp. 105-112, 2011.
  23. P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian Detection: An Evaluation of the State of the Art," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 4, pp. 743-761, 2011. https://doi.org/10.1109/TPAMI.2011.155
  24. Bomni-DB Homepage, https://www.cmpe.boun.edu.tr/pilab/pilabfiles/databases/bomni/ (accessed Feb. 14, 2017).

Cited by

  1. A Height Measurement Method for White-Light Scanning Interferometry Using Deep Learning, Vol. 21, No. 8, 2018, https://doi.org/10.9717/kmms.2018.21.8.864
  2. Deep Learning-based Object Detector for Vehicle Recognition in Images Acquired with a Fisheye Lens Camera, Vol. 22, No. 2, 2019, https://doi.org/10.9717/kmms.2019.22.2.128