Technology Trends and Analysis of Deep Learning Based Object Classification and Detection

딥러닝 기반 객체 분류 및 검출 기술 분석 및 동향

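  • 이승재 (Info-Content Technology Research Group)
  • 이근동 (Info-Content Technology Research Group)
  • 이수웅 (Info-Content Technology Research Group)
  • 고종국 (Info-Content Technology Research Group)
  • 유원영 (Info-Content Technology Research Group)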
  • Published : 2018.08.01

Abstract

Object classification and detection are fundamental technologies in computer vision and its applications. Deep-learning-based approaches have recently brought significant improvements in both tasks. This report reviews the progress of deep-learning-based object classification and detection from the perspective of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), and analyzes recent trends in the technology and its applications.

Acknowledgement

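Grant : Development of Content Visual Browsing Technology for Online and Offline Environments

Supported by : Institute for Information & Communications Technology Promotion (IITP)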
