
Investigating the Feature Collection for Semantic Segmentation via Single Skip Connection

An In-Depth Analysis of Object Segmentation Capability via a Single Intermediate-Layer Connection in Deep Neural Networks

  • 임종화 (Department of Electronic Engineering, Ajou University)
  • 손경아 (Department of Software, Ajou University)
  • Received : 2017.08.08
  • Accepted : 2017.09.26
  • Published : 2017.12.15

Abstract

Since research on deep convolutional neural networks became prevalent, one important finding has been that a feature map can be extracted from a convolutional network before the fully connected layers and used as a saliency map for object detection. Moreover, a model can combine features from several different layers for more accurate object detection, because features from different layers have different properties. As models grow deeper, they offer many latent skip connections and feature maps with which to refine object detection. Although many intermediate layers can be used for semantic segmentation through a skip connection, the characteristics of each skip connection, and which skip connection is best for this task, remain unclear. In this study, we therefore exhaustively examine the skip connections of state-of-the-art deep convolutional networks and investigate the characteristics of the features from each intermediate layer. In addition, this study suggests how to use a recent deep neural network model for semantic segmentation, and it can therefore serve as a cornerstone for later studies with state-of-the-art network models.
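
To make the idea of tapping intermediate feature maps concrete, the sketch below reads activations from several layers of a VGG-16 convolutional trunk with forward hooks, before the input ever reaches the fully connected classifier. This is a minimal illustration assuming PyTorch and torchvision; the chosen tap indices and the VGG-16 backbone are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch (PyTorch/torchvision assumed; tap indices are illustrative):
# collect feature maps from intermediate layers of VGG-16 via forward hooks.
import torch
import torchvision.models as models

vgg = models.vgg16().eval()  # load pretrained weights as desired; the
                             # weights API differs across torchvision versions

# Indices of the max-pooling layers that close convolutional stages 3-5
# (pool3, pool4, pool5 in the usual FCN terminology).
tap_points = {16: "pool3", 23: "pool4", 30: "pool5"}
feature_maps = {}

def make_hook(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

for idx, name in tap_points.items():
    vgg.features[idx].register_forward_hook(make_hook(name))

with torch.no_grad():
    vgg.features(torch.randn(1, 3, 224, 224))  # run the conv trunk only

for name, fmap in feature_maps.items():
    print(name, tuple(fmap.shape))  # e.g. pool3 -> (1, 256, 28, 28)
```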

Recently, research on image segmentation and object localization using deep convolutional neural networks has been actively pursued. In particular, more accurate object detection becomes possible when the feature maps extracted from the intermediate hidden layers are used in addition to the feature map extracted at the top of the network, and this line of research is also very active. One empirical property revealed by this work is that the feature map extracted at each intermediate hidden layer has its own distinct characteristics. However, while deeper models provide more possible skip connections and more usable intermediate feature maps, little research has examined which skip connection is more effective for object segmentation, and precise analysis of skip-connection schemes and of the intermediate layers' feature maps is likewise lacking. In this study, we therefore characterize the skip connections of recent deep neural networks, determine which skip connection yields the best object detection performance, and identify the characteristics of each skip connection. We also suggest how to perform object segmentation with deeper neural networks than previous approaches, along with a direction for choosing skip connections.
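
To make the single-skip-connection setting concrete, the sketch below follows the spirit of a fully convolutional network: the coarse top-level score map is upsampled and summed with class scores computed from exactly one intermediate feature map. This is a hypothetical illustration, not the paper's reported configuration; the VGG-16 trunk, the pool3 tap (`skip_stage=16`), and the 21-class output (as in PASCAL VOC) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class SingleSkipFCN(nn.Module):
    """FCN-style segmentation head with exactly one skip connection."""
    def __init__(self, num_classes=21, skip_stage=16):  # 16 -> pool3
        super().__init__()
        trunk = models.vgg16().features  # add pretrained weights as desired
        self.to_skip = trunk[:skip_stage + 1]  # conv stages up to the tap
        self.to_top = trunk[skip_stage + 1:]   # remaining conv stages
        skip_ch = self.to_skip[-3].out_channels  # last conv before the pool
        self.score_skip = nn.Conv2d(skip_ch, num_classes, 1)
        self.score_top = nn.Conv2d(512, num_classes, 1)

    def forward(self, x):
        size = x.shape[-2:]
        skip = self.to_skip(x)   # intermediate feature map (the single tap)
        top = self.to_top(skip)  # coarse top-level feature map
        s = self.score_top(top)
        s = F.interpolate(s, size=skip.shape[-2:], mode="bilinear",
                          align_corners=False)
        s = s + self.score_skip(skip)  # the single skip connection
        return F.interpolate(s, size=size, mode="bilinear",
                             align_corners=False)

model = SingleSkipFCN()
out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # (1, 21, 224, 224): per-pixel class scores
```

Changing `skip_stage` moves the single connection to a different stage of the trunk, which is precisely the axis of variation this study investigates.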

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
