EMOS: Enhanced moving object detection and classification via sensor fusion and noise filtering

  • Dongjin Lee (Autonomous Driving Intelligence Research Section, Mobility Robot Research Division, Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Seung-Jun Han (Autonomous Driving Intelligence Research Section, Mobility Robot Research Division, Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Kyoung-Wook Min (Autonomous Driving Intelligence Research Section, Mobility Robot Research Division, Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Jungdan Choi (Autonomous Driving Intelligence Research Section, Mobility Robot Research Division, Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute) ;
  • Cheong Hee Park (Department of Computer Science and Engineering, Chungnam National University)
  • Received : 2023.03.21
  • Accepted : 2023.08.09
  • Published : 2023.10.20

Abstract

Dynamic object detection is essential for safe and reliable autonomous driving. Recently, light detection and ranging (LiDAR)-based object detection has been introduced and has shown excellent performance on various benchmarks. Although LiDAR sensors estimate distance with excellent accuracy, they lack texture and color information and have lower resolution than conventional cameras. In addition, owing to the domain gap phenomenon, performance degrades when a LiDAR-based object detection model is applied to a different driving environment or when sensors from a different LiDAR manufacturer are used. To address these issues, a sensor-fusion-based object detection and classification method is proposed. The proposed method operates in real time, making it suitable for integration into autonomous vehicles, and it performs well both on our custom dataset and on publicly available datasets, demonstrating its effectiveness in real-world road environments. In addition, we will release a novel three-dimensional moving object detection dataset called ETRI 3D MOD.
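The full EMOS pipeline is described in the body of the article; as a minimal, hypothetical sketch of the camera-LiDAR fusion idea the abstract alludes to, the snippet below projects LiDAR points into a camera image and attaches RGB values to each point, one common way to compensate for LiDAR's lack of color and texture. The function names, the extrinsic transform `T_cam_from_lidar`, and the intrinsic matrix `K` are illustrative assumptions, not the authors' actual interface.

```python
import numpy as np

def project_lidar_to_image(points_xyz, T_cam_from_lidar, K):
    """Project LiDAR points into the camera image plane.

    points_xyz:       (N, 3) points in the LiDAR frame.
    T_cam_from_lidar: (4, 4) extrinsic transform, LiDAR -> camera (assumed calibration).
    K:                (3, 3) camera intrinsic matrix (assumed calibration).
    Returns (M, 2) pixel coordinates and the original indices of points kept.
    """
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera (positive depth), so that
    # points behind the sensor do not project onto spurious pixels.
    in_front = pts_cam[:, 2] > 0
    pts_cam = pts_cam[in_front]

    # Perspective projection with the intrinsics, then dehomogenize.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, np.flatnonzero(in_front)

def colorize_points(points_xyz, image, T_cam_from_lidar, K):
    """Attach RGB values from the image to LiDAR points inside the frame."""
    uv, idx = project_lidar_to_image(points_xyz, T_cam_from_lidar, K)
    h, w = image.shape[:2]
    px = np.round(uv).astype(int)
    inside = (px[:, 0] >= 0) & (px[:, 0] < w) & (px[:, 1] >= 0) & (px[:, 1] < h)
    colors = image[px[inside, 1], px[inside, 0]]  # (M, 3) RGB per kept point
    return idx[inside], colors
```

The colorized points could then feed a fusion-based detector or a per-object classifier; this sketch covers only the geometric alignment step, not the noise filtering or classification stages of the proposed method.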

Acknowledgement

This work was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP) under grants funded by the Korean government (MSIT) (No. 2021-0-00891, Development of AI Service Integrated Framework for Autonomous Driving) and the Korean government (MSIP) (No. 2020-0-00002, Development of Standard SW Platform-Based Autonomous Driving Technology to Solve Social Problems of Mobility and Safety for Marginalized Public Transport Communities).
