http://dx.doi.org/10.7780/kjrs.2021.37.1.9

Keypoint-based Deep Learning Approach for Building Footprint Extraction Using Aerial Images  

Jeong, Doyoung (Department of Civil and Environmental Engineering, Seoul National University)
Kim, Yongil (Department of Civil and Environmental Engineering, Seoul National University)
Publication Information
Korean Journal of Remote Sensing / v.37, no.1, 2021, pp. 111-122
Abstract
Building footprint extraction is an active topic in remote sensing because buildings are a fundamental unit of urban areas. Deep convolutional neural networks have been applied successfully to footprint extraction from optical satellite images. However, semantic segmentation produces coarse output, such as blurred and rounded boundaries, caused by pooling layers and convolutional layers with large receptive fields. The objective of this study is to generate visually enhanced building objects by directly extracting the vertices of individual buildings, combining instance segmentation with keypoint detection. The target keypoints are the vertices of a building polygon, defined as points of interest based on the local image gradient direction. The proposed framework follows a two-stage, top-down approach consisting of object detection followed by keypoint estimation. Keypoints belonging to different instances are distinguished by merging the rough segmentation masks with the local features of regions of interest. A building polygon is then created by grouping the predicted keypoints with a simple geometric method. Our model achieved an F1-score of 0.650 and an mIoU of 62.6 for building footprint extraction on the Open Cities AI dataset. Both qualitatively and quantitatively, the proposed keypoint-based framework outperformed Mask R-CNN in segmentation.
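The abstract does not specify the geometric method used to group keypoints into a polygon. As a minimal sketch of one plausible approach, the Python below orders an unordered set of predicted vertices by their angle around the centroid and measures the resulting footprint with the shoelace formula. The function names `order_keypoints` and `polygon_area` and the sample coordinates are hypothetical, chosen for illustration only, and angular sorting is an assumption that holds only for footprints that are convex or star-shaped about their centroid.

```python
import math


def order_keypoints(points):
    """Sort unordered vertices by angle around their centroid.

    Hypothetical helper: an assumed stand-in for the paper's unspecified
    "simple geometric method". Valid only when every vertex is visible
    from the centroid (convex or star-shaped footprints).
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def polygon_area(polygon):
    """Signed area by the shoelace formula (positive for increasing angle order)."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]  # wrap around to close the ring
        twice_area += x0 * y1 - x1 * y0
    return twice_area / 2.0


# Four predicted corners of a 4 x 3 rectangular building, in arbitrary order.
keypoints = [(0.0, 0.0), (4.0, 3.0), (4.0, 0.0), (0.0, 3.0)]
footprint = order_keypoints(keypoints)
area = polygon_area(footprint)
```

For L-shaped or other concave buildings, angular sorting can connect vertices in the wrong order, so a practical pipeline would need a more careful grouping rule. In image coordinates, where the y axis points downward, the same ordering appears clockwise on screen.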
Keywords
Building footprint extraction; Keypoint detection; Instance segmentation; Deep learning;