Acknowledgement
This research was conducted with the support of the "2023 Yonsei University Future-Leading Research Initiative (No. 2023-22-0114)" and the "National R&D Project for Smart Construction Technology (No. RS-2020-KA156488)" funded by the Korea Agency for Infrastructure Technology Advancement under the Ministry of Land, Infrastructure and Transport, and managed by the Korea Expressway Corporation. This research also used datasets from 'The Open AI Dataset Project (AI-Hub, S. Korea)'. All data can be accessed through 'AI-Hub (www.aihub.or.kr)'.
References
- O. Maali, C.-H. Ko, and P. H. D. Nguyen, "Applications of existing and emerging construction safety technologies," Automation in Construction, vol. 158, p. 105231, Feb. 2024.
- G. Jocher et al., "ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation." Zenodo, Nov. 22, 2022.
- D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, "YOLACT: Real-time Instance Segmentation." arXiv, Oct. 24, 2019.
- W.-C. Chern, J. Hyeon, T. V. Nguyen, V. K. Asari, and H. Kim, "Context-aware safety assessment system for far-field monitoring," Automation in Construction, vol. 149, p. 104779, May 2023.
- H. Guo, Z. Zhang, R. Yu, Y. Sun, and H. Li, "Action Recognition Based on 3D Skeleton and LSTM for the Monitoring of Construction Workers' Safety Harness Usage," Journal of Construction Engineering and Management, vol. 149, no. 4, p. 04023015, Apr. 2023.
- X. Luo, H. Li, X. Yang, Y. Yu, and D. Cao, "Capturing and Understanding Workers' Activities in Far-Field Surveillance Videos with Deep Action Recognition and Bayesian Nonparametric Learning," Computer-Aided Civil and Infrastructure Engineering, vol. 34, no. 4, pp. 333-351, 2019.
- P. Zhai, J. Wang, and L. Zhang, "Extracting Worker Unsafe Behaviors from Construction Images Using Image Captioning with Deep Learning-Based Attention Mechanism," Journal of Construction Engineering and Management, vol. 149, no. 2, p. 04022164, Feb. 2023.
- A. Radford et al., "Learning Transferable Visual Models From Natural Language Supervision." arXiv, Feb. 26, 2021.
- J. Li, D. Li, C. Xiong, and S. Hoi, "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation." arXiv, Feb. 15, 2022.
- H. Chen et al., "Augmented reality, deep learning and vision-language query system for construction worker safety," Automation in Construction, vol. 157, p. 105158, Jan. 2024.
- M. R. Morris et al., "Levels of AGI: Operationalizing Progress on the Path to AGI." arXiv, Jan. 05, 2024.
- X. Chen et al., "PaLI: A Jointly-Scaled Multilingual Language-Image Model." arXiv, Jun. 05, 2023.
- OpenAI et al., "GPT-4 Technical Report." arXiv, Dec. 18, 2023.
- J. Gu et al., "A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models." arXiv, Jul. 24, 2023.
- H. Strobelt et al., "Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models," IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 1, pp. 1146-1156, Jan. 2023.
- X. Chen et al., "Microsoft COCO Captions: Data Collection and Evaluation Server." arXiv, Apr. 03, 2015.
- S. Changpinyo, P. Sharma, N. Ding, and R. Soricut, "Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts." arXiv, Mar. 30, 2021.
- C. Schuhmann et al., "LAION-5B: An open large-scale dataset for training next generation image-text models." arXiv, Oct. 15, 2022.
- H. Liu, C. Li, Q. Wu, and Y. J. Lee, "Visual Instruction Tuning." arXiv, Dec. 11, 2023.
- F. Gilardi, M. Alizadeh, and M. Kubli, "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks," Proc. Natl. Acad. Sci. U.S.A., vol. 120, no. 30, p. e2305016120, Jul. 2023.
- E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models." arXiv, Oct. 16, 2021.