ETLi: Efficiently annotated traffic LiDAR dataset using incremental and suggestive annotation

  • Kang, Jungyu (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Han, Seung-Jun (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Kim, Nahyeon (Electronics and Avionics Engineering Department, Korea Aerospace University)
  • Min, Kyoung-Wook (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2021.02.16
  • Accepted : 2021.06.16
  • Published : 2021.08.01

Abstract

Autonomous driving requires computerized perception of the environment, both for safe operation and for evaluating machine-learning models. Recognizing semantic information is difficult because objects in the environment must be identified and distinguished in real time, and training a model with real-time semantic capability and high reliability requires extensive, specialized datasets. However, such datasets are generally unavailable and difficult to construct for specific tasks. Hence, a light detection and ranging (LiDAR) semantic dataset suited to semantic simultaneous localization and mapping (SLAM) and specialized for autonomous driving is proposed. The dataset is provided in a form that users familiar with existing two-dimensional image datasets can use easily, and it covers a variety of weather and lighting conditions collected in a complex, diverse, real-world setting. To improve annotation efficiency, an incremental and suggestive annotation routine is proposed: a model is trained to simultaneously predict segmentation labels and suggest class-representative frames for annotation. Experimental results demonstrate that the proposed algorithm yields a more efficient dataset than uniform frame sampling.
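The annotation routine described in the abstract alternates between training a segmentation model and suggesting which frames a human should label next. Below is a minimal, illustrative sketch of such a loop in Python; the scoring rule (prediction entropy plus coverage of under-represented classes) and all names (`predict`, `frame_scores`, the dictionary model stub) are assumptions for illustration and do not reproduce the paper's actual network or suggestion criterion.

```python
# Minimal sketch of an incremental and suggestive annotation loop.
# All names and the scoring rule are illustrative assumptions, not the
# paper's actual method.
import numpy as np

def predict(model, frame):
    """Stand-in for LiDAR semantic segmentation inference.
    Returns per-point class probabilities of shape (num_points, num_classes)."""
    rng = np.random.default_rng(abs(hash(frame)) % (2**32))
    logits = rng.random((1000, model["num_classes"]))
    return logits / logits.sum(axis=1, keepdims=True)

def frame_scores(model, frames):
    """Score unlabeled frames: high prediction entropy (uncertainty) plus
    coverage of classes that are still rare in the labeled pool."""
    scores = {}
    for frame in frames:
        probs = predict(model, frame)                      # (N, C)
        entropy = -(probs * np.log(probs + 1e-9)).sum(axis=1).mean()
        coverage = probs.mean(axis=0)                      # soft class histogram
        rarity_bonus = coverage @ model["class_deficit"]   # favor rare classes
        scores[frame] = entropy + rarity_bonus
    return scores

def incremental_annotation(frames, budget_per_round, rounds):
    labeled, unlabeled = [], list(frames)
    model = {"num_classes": 8, "class_deficit": np.ones(8)}
    for _ in range(rounds):
        # 1. (Re)train the segmentation model on the labeled pool (omitted).
        # 2. Suggest the most informative, class-representative frames.
        scores = frame_scores(model, unlabeled)
        picks = sorted(unlabeled, key=scores.get, reverse=True)[:budget_per_round]
        # 3. A human annotates the suggestions; grow the labeled pool.
        for frame in picks:
            unlabeled.remove(frame)
            labeled.append(frame)
    return labeled

if __name__ == "__main__":
    suggested = incremental_annotation([f"frame_{i:04d}" for i in range(100)], 5, 3)
    print(suggested)
```

In a real pipeline, the stubbed `predict` would run the trained LiDAR segmentation network, and `class_deficit` would be recomputed each round from the label statistics of the current annotated pool so that under-represented classes keep attracting suggestions.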

Acknowledgement

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIP) (No. 2020-0-00002, Development of standard SW platform-based autonomous driving technology to solve social problems of mobility and safety for public transport-marginalized communities).
