Lightweight Deep Learning Model for Real-Time 3D Object Detection in Point Clouds

  • Kim, Gyu-Min (School of Electronics and Information Engineering, Korea Aerospace University);
  • Baek, Joong-Hwan (School of Electronics and Information Engineering, Korea Aerospace University);
  • Kim, Hee Yeong (LinktoTo Co. Ltd)
  • Received : 2022.08.05
  • Accepted : 2022.08.29
  • Published : 2022.09.30

Abstract

3D object detection generally targets relatively large objects such as automobiles, buses, people, and furniture, and is therefore weak at detecting small objects. In addition, in resource-constrained environments such as embedded devices, the enormous amount of computation makes such models difficult to apply. In this paper, the accuracy of small-object detection is improved by focusing on local features using only one layer, and the inference speed is improved through the proposed knowledge distillation from a large pre-trained network to a small network and an adaptive quantization method based on parameter size. The proposed model was evaluated on the SUN RGB-D validation set and a self-made apple-tree dataset, and finally achieved 62.04% at mAP@0.25 and 47.1% at mAP@0.5, with an inference speed of 120.5 scenes per second, demonstrating fast real-time processing.
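
The distillation objective itself is not spelled out on this page. For reference, below is a minimal PyTorch sketch of the standard soft-target knowledge distillation loss (in the style of Hinton et al.) for transferring knowledge from a large pre-trained teacher to a small student; the function name, temperature T, and mixing weight alpha are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between the temperature-softened
    # teacher and student class distributions; T*T rescales the gradients
    # so the soft term stays comparable to the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 4 samples over 8 classes.
student = torch.randn(4, 8, requires_grad=True)
teacher = torch.randn(4, 8)
labels = torch.randint(0, 8, (4,))
loss = distillation_loss(student, teacher, labels)
```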

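Likewise, the adaptive quantization rule is only named here, not specified. The toy sketch below illustrates the general idea of choosing a per-tensor bit width from parameter size, with larger tensors (which dominate memory and latency) quantized more aggressively; the 100k-parameter threshold and the 16-/8-bit settings are assumptions for illustration only.

```python
import torch

def fake_quantize(w, num_bits):
    # Uniform symmetric quantization: round weights onto a signed
    # num_bits integer grid, then map back to floats (simulated quantization).
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

@torch.no_grad()
def adaptive_quantize(model, threshold=100_000, small_bits=16, large_bits=8):
    # Pick a bit width per parameter tensor from its element count:
    # large tensors get the aggressive setting, small ones keep precision.
    for p in model.parameters():
        bits = large_bits if p.numel() >= threshold else small_bits
        p.copy_(fake_quantize(p, bits))
```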

Acknowledgement

This research was supported by the GRRC program of Gyeonggi province [GRRC Aviation 2017-B04, Development of Intelligent Interactive Media and Space Convergence Application System].

References

  1. A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets Robotics: The KITTI Dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231-1237, Aug. 2013. https://doi.org/10.1177/0278364913491297
  2. S. Song, S. P. Lichtenberg, and J. Xiao, "SUN RGB-D: A RGB-D scene understanding benchmark suite," in Proceedings of the IEEE conference on computer vision and pattern recognition, Boston: MA, USA, pp. 567-576, 2015.
  3. I. Armeni, O. Sener, A. R. Zamir, H. Jiang, I. Brilakis, M. Fischer, and S. Savarese, "3D Semantic Parsing of Large-Scale Indoor Spaces," in Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas: NV, USA, pp. 1534-1543, 2016.
  4. D. Rukhovich, A. Vorontsova, and A. Konushin, "FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection," arXiv Preprint arXiv:2112.00322, 2021.
  5. R. Q. Charles, H. Su, M. Kaichun, and L. J. Guibas, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu: HI, USA, pp. 652-660, 2017.
  6. J. Deng, S. Shi, P. Li, W. Zhou, Y. Zhang, and H. Li, "Voxel R-CNN: Towards High Performance Voxel-Based 3D Object Detection," in Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, pp. 1201-1209, 2021.
  7. S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li, "PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, pp. 10529-10538, 2020.
  8. Y. Zhou and O. Tuzel, "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City: UT, USA, pp. 4490-4499, 2018.
  9. Y. Yan, Y. Mao, and B. Li, "SECOND: Sparsely Embedded Convolutional Detection," Sensors, vol. 18, no. 10, p. 3337, Oct. 2018. https://doi.org/10.3390/s18103337
  10. B. Graham, "Sparse 3D convolutional neural networks," arXiv Preprint arXiv:1409.6070, 2015.
  11. C. Sager, P. Zschech, and N. Kuhl, "labelCloud: A Lightweight Domain-Independent Labeling Tool for 3D Object Detection in Point Clouds," arXiv Preprint arXiv:2103.04970, 2021.
  12. I. Misra, R. Girdhar, and A. Joulin, "An End-to-End Transformer Model for 3D Object Detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, pp. 2906-2917, 2021.
  13. Z. Zhang, B. Sun, H. Yang, and Q. Huang, "H3DNet: 3D Object Detection Using Hybrid Geometric Primitives," in Computer Vision - ECCV 2020, Glasgow, UK, pp. 311-329, 2020.
  14. Z. Liu, Z. Zhang, Y. Cao, H. Hu, and X. Tong, "Group-Free 3D Object Detection via Transformers," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, pp. 2949-2958, 2021.
  15. B. Cheng, L. Sheng, S. Shi, M. Yang, and D. Xu, "Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville: TN, USA, pp. 8963-8972, 2021.
  16. C. R. Qi, O. Litany, K. He, and L. J. Guibas, "Deep Hough Voting for 3D Object Detection in Point Clouds," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, pp. 9277-9286, 2019.
  17. P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, and G. Ranzuglia, "MeshLab: an Open-Source Mesh Processing Tool," in Eurographics Italian Chapter Conference, Salerno, Italy, vol. 2008, pp. 129-136, 2008.