http://dx.doi.org/10.7848/ksgpc.2018.36.6.469

Evaluation of Building Detection from Aerial Images Using Region-based Convolutional Neural Network for Deep Learning  

Lee, Dae Geon (Dept. of Environment, Energy & Geoinformatics, Sejong University)
Cho, Eun Ji (Dept. of Environment, Energy & Geoinformatics, Sejong University)
Lee, Dong-Cheon (Dept. of Environment, Energy & Geoinformatics, Sejong University)
Publication Information
Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 36, No. 6, 2018, pp. 469-481
Abstract
DL (Deep Learning) is becoming popular in various fields as a way to implement artificial intelligence that resembles human learning and cognition. Because DL is based on the complicated structure of the ANN (Artificial Neural Network), it requires substantial computing power and computation cost. A variety of DL models with improved performance have been developed as powerful computing hardware has become available. The main purpose of this paper is to detect buildings from aerial images and to evaluate the performance of Mask R-CNN (Region-based Convolutional Neural Network), recently developed by the FAIR (Facebook AI Research) team. Mask R-CNN is an R-CNN that is regarded as one of the best ANN models for semantic segmentation with pixel-level accuracy. The performance of a DL model is determined by its training as well as by the architecture of the ANN. In this paper, we investigate the characteristics of Mask R-CNN with various types of images and evaluate the possibility of generalization, which is the ultimate goal of DL. As future work, it is expected that the reliability and generalization of DL will be improved by using a variety of spatial information data for training the DL models.
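
The paper itself does not include source code; as an illustrative aid only, the short Python sketch below shows how a Mask R-CNN of the kind evaluated here can be applied to a single aerial image tile using torchvision's reference implementation. The COCO-pretrained weights, the file name aerial_tile.png, and the 0.7 score threshold are assumptions for this sketch, not details from the paper; detecting a building class would in practice require fine-tuning the network on labeled building footprints.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load torchvision's Mask R-CNN (ResNet-50 FPN backbone) with COCO-pretrained
# weights; torchvision >= 0.13 is assumed for the weights="DEFAULT" argument.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Read one aerial image tile (hypothetical file name) as an RGB tensor in [0, 1].
image = to_tensor(Image.open("aerial_tile.png").convert("RGB"))

with torch.no_grad():
    # For each input image the model returns bounding boxes, class labels,
    # confidence scores, and per-instance soft masks.
    prediction = model([image])[0]

# Keep confident detections and binarize their masks to obtain pixel-level output.
keep = prediction["scores"] > 0.7                       # assumed score threshold
masks = (prediction["masks"][keep] > 0.5).squeeze(1)    # (N, H, W) boolean masks
boxes = prediction["boxes"][keep]
print(f"{masks.shape[0]} instances detected")

To produce building footprints rather than COCO categories, the box and mask heads would be replaced and the network fine-tuned on building annotations before this inference step is run.
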
Keywords
Deep Learning; Region-based Convolutional Neural Network; Object Detection; Semantic Segmentation;
Citations & Related Records
Times Cited by KSCI: 3
1 Back, C.S. and Yom, J.H. (2018), Comparison of point cloud volume calculated by artificial intelligence learning method and photogrammetric method, Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, 19-20 April, Yongin, Korea, pp. 227-230.
2 Ball, J., Anderson, D., and Chan, C. (2017), A comprehensive survey of deep learning in remote sensing: Theories, tools and challenges for the community, Journal of Applied Remote Sensing, Vol. 11, No. 4, pp. 1-54.
3 Campos-Taberner, M., Romero-Soriano, A., Gatta, C., Camps-Valls, G., Lagrange, A., Le Saux, B., Beaupere, A., Boulch, A., Chan-Hon-Tong, A., Herbin, S., Randrianarivo, H., Ferecatu, M., Shimoni, M., Moser, G., and Tuia, D. (2016), Processing of extremely high-resolution LiDAR and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest-Part A: 2-D contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 9, No. 12, pp. 5547-5559.   DOI
4 Choe, Y.J. and Yom, J.H. (2017), Downscaling of MODIS land surface temperature to LANDSAT scale using multi-layer perceptron, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 35, No. 4, pp. 313-318. (in Korean with English abstract)   DOI
5 Chung, D. and Lee, I. (2017), Point cloud classification based on deep learning, Proceedings of Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Yeosu, Korea, pp. 110-113. (in Korean with English abstract)
6 Deng, Z., Sun, H., Zhou, S., Zhao, J., Lei, L., and Zou, H. (2018), Multi-scale object detection in remote sensing imagery with convolutional neural networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 3-22.   DOI
7 Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017), A review on deep learning techniques applied to semantic segmentation, arXiv:1704.06857.
8 Girshick, R. (2015), Fast R-CNN, IEEE International Conference on Computer Vision, ICCV 2015, 13-16 December, Santiago, Chile, pp. 1440-1448.
9 Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2016), Region-based convolutional networks for accurate object detection and segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38, No. 1, pp. 1-16.   DOI
10 Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016), FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture, Proceedings of the Asian Conference on Computer Vision, Vol. 2, 20-24 November, Taipei, Taiwan.
11 He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017), Mask R-CNN, Proceedings of IEEE International Conference on Computer Vision (ICCV) 2017, 22-29 October, Venice, Italy, pp. 2980-2988.
12 Hertz, J., Krogh, A., and Palmer, R. (1991), Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 327p.
13 Kang, J., Korner, M., Wang, Y., Taubenbock, H., and Zhu, X. (2018), Building instance classification using street view images, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 44-59.   DOI
14 Kemker, R., Salvaggio, C., and Kanan, C. (2018), Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 60-77.   DOI
15 Kim, H. and Bae, T. (2017), Preliminary study of deep learning-based precipitation prediction, Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, Vol. 35, No. 5, pp. 423-430.   DOI
16 Marmanis, D., Wegner, J., Galliani, S., Schindler, K., Datcu, M., and Stilla, U. (2016), Semantic segmentation of aerial images with an ensemble of CNNs, ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. III-3, XXIII ISPRS Congress, 12-19 July, Prague, Czech Republic, pp. 473-480.
17 Krizhevsky, A., Sutskever, I., and Hinton, G. (2012), ImageNet classification with deep convolutional neural networks, Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, 3-8 December, Lake Tahoe, Nevada, pp. 1097-1105.
18 LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., and Jackel, L. (1989), Backpropagation applied to handwritten zip code recognition, Neural Computation, Vol. 1, No. 4, pp. 541-551.   DOI
19 Lee, G. and Yom, J.H. (2018), Design and implementation of web-based automatic preprocessing system of remote sensing imagery for machine learning modeling, Journal of the Korean Society for Geospatial Information Science, Vol. 26, No. 1, pp. 61-67. (in Korean with English abstract)
20 Long, J., Shelhamer, E., and Darrell, T. (2015), Fully convolutional networks for semantic segmentation, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 7-12 June, Boston, MA, pp. 3431-3440.
21 Maturana, D. and Scherer, S. (2015), 3D convolutional neural networks for landing zone detection from LiDAR, IEEE International Conference on Robotics and Automation, 26-30 May, Seattle, Washington, pp. 3471-3478.
22 McCulloch, W. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133.
23 Oh, H. (2010), Landslide detection and landslide susceptibility mapping using aerial photos and artificial neural networks, Korean Journal of Remote Sensing, Vol. 26, No. 1, pp. 47-57. (in Korean with English abstract)
24 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., and Berg, A. (2015), ImageNet large scale visual recognition challenge, International Journal of Computer Vision, Vol. 115, No. 3, pp. 211-252.   DOI
25 Pang, Y., Sun, M., Jiang, X., and Li, X. (2018), Convolution in convolution for network in network, IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, No. 5, pp. 1587-1597.   DOI
26 Parthasarathy, D. (2017), A brief history of CNNs in image segmentation: From R-CNN to Mask R-CNN, https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4 (last date accessed: 6 September 2018).
27 Ren, S., He, K., Girshick, R., and Sun, J. (2017), Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 6, pp. 1137-1149.   DOI
28 Rosenblatt, F. (1958), The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, Vol. 65, No. 6, pp. 386-408.   DOI
29 Rumelhart, D., Hinton, G., and Williams, R. (1986), Learning representations by back-propagating errors, Nature, Vol. 323, pp. 533-536.   DOI
30 Schenk, T. (1999), Digital Photogrammetry: Volume 1, TerraScience, Laurelville, OH, 428p.
31 Wang, S., Quan, D., Liang, X., Ning, M., Guo, Y., and Jiao, L. (2018), A deep learning framework for remote sensing image registration, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 148-164.   DOI
32 Shaikh, F. (2018), Automatic image captioning using deep learning (CNN and LSTM) in PyTorch, Analytics Vidhya, https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/ (last date accessed: 31 October 2018).
33 Simard, P., Steinkraus, D., and Platt, J. (2003), Best practices for convolutional neural networks applied to visual document analysis, Proceedings of the Seventh International Conference on Document Analysis and Recognition, ICDAR 2003, 3-6 August, Vol. 2, pp. 958-962.
34 Tokarczyk, P., Wegner, J., Walk, S., and Schindler, K. (2015), Features, color spaces, and boosting: New insights on semantic classification of remote sensing images, IEEE Transactions on Geoscience and Remote Sensing, Vol. 53, No. 1, pp. 280-295.   DOI
35 You, Q., Jin, H., Wang, Z., Fang, C., and Luo, J. (2016), Image captioning with semantic attention, IEEE Conference on Computer Vision and Pattern Recognition, 26 June-1 July, Las Vegas, Nevada, pp. 4651-4659.
36 Vo, A.V., Truong-Hong, L., Laefer, D., Tiede, D., d'Oleire-Oltmanns, S., Baraldi, A., Shimoni, M., Moser, G., and Tuia, D. (2016), Processing of extremely high-resolution LiDAR and RGB data: Outcome of the 2015 IEEE GRSS data fusion contest-Part B: 3-D contest, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 9, No. 12, pp. 5560-5575.   DOI
37 Audebert, N., Le Saux, B., and Lefevre, S. (2018), Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 140, pp. 20-32.   DOI
38 Zhang, B., Gu, J., Chen, C., Han, J., Su, X., Cao, X., and Liu, J. (2018), One-two-one networks for compression artifacts reduction in remote sensing, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 184-196.   DOI
39 Xing, Y., Wang, M., Yang, S., and Jiao, L. (2018), Pansharpening via deep metric learning, ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145, pp. 165-183.   DOI
40 Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., and Bengio, Y. (2015), Show, attend and tell: Neural image caption generation with visual attention, International Conference on Machine Learning, 6-11 July, Lille, France, pp. 2048-2057.