Acknowledgement
This work was supported by the research grant of Chungbuk National University in 2022.
References
- Ministry of the Interior and Safety, "Comprehensive Measures for the Reorganization of the National Safety System," January 27, 2023.
- Alairaji, R. A., Aljazaery, I. A., and ALRikabi, H. S., "Abnormal behavior detection of students in the examination hall from surveillance videos," Advanced Computational Paradigms and Hybrid Intelligent Computing, 2021, pp. 113-125.
- Chang, C. W., Chang, C. Y., and Lin, Y. Y., "A hybrid CNN and LSTM-based deep learning model for abnormal behavior detection," Multimedia Tools and Applications, Vol. 81, No. 9, 2022, pp.11825-11843. https://doi.org/10.1007/s11042-021-11887-9
- Chen, W., Ma, K. T., Yew, Z. J., Hur, M., and Khoo, D. A., "TEVAD: Improved video anomaly detection with captions," IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5548-5558.
- Devlin, J., Chang, M. W., Lee, K., and Toutanova, K., "BERT: Pre-training of deep bidirectional transformers for language understanding," Proceedings of NAACL-HLT, 2019, pp. 4171-4186.
- Dilawari, A., Khan, M. U. G., Al-Otaibi, Y. D., Rehman, Z. U., Rahman, A. U., and Nam, Y., "Natural language description of videos for smart surveillance," Applied Sciences, Vol. 11, No. 9, 2021, pp. 3730-3741. https://doi.org/10.3390/app11093730
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., and Unterthiner, T., "An image is worth 16x16 words: Transformers for image recognition at scale," International Conference on Learning Representations, 2021, arXiv:2010.11929.
- Duan, J., Yu, S., Tan, N., Yi, L., and Tan, C., "BOSS: A benchmark for human belief prediction in object-context scenarios," 2022, arXiv:2206.10665.
- Graves, A., Fernandez, S., and Schmidhuber, J., "Bidirectional LSTM networks for improved phoneme classification and recognition," International conference on artificial neural networks, 2005, pp. 799-804.
- He, K., Zhang, X., Ren, S., and Sun, J., "Deep residual learning for image recognition," IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
- Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W., "LoRA: Low-rank adaptation of large language models," 2021, arXiv:2106.09685.
- Jha, S., Seo, C., Yang, E., and Joshi, G. P., "Real time object detection and tracking system for video surveillance system," Multimedia Tools and Applications, Vol. 80, 2021, pp.3981-3996. https://doi.org/10.1007/s11042-020-09749-x
- Kingma, D. P., and Ba, J., "Adam: A method for stochastic optimization," International Conference on Learning Representations, 2015, arXiv:1412.6980.
- Li, J., Li, D., Savarese, S., and Hoi, S., "BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models," 2023, arXiv:2301.12597.
- Lin, K., Li, L., Lin, C. C., Ahmed, F., Gan, Z., Liu, Z., Lu, Y., and Wang, L., "SwinBERT: End-to-end transformers with sparse attention for video captioning," IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17949-17958.
- Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., and Dollár, P., "Microsoft COCO: Common objects in context," European Conference on Computer Vision, 2014, pp. 740-755.
- OpenAI, "GPT-4 technical report," 2023, arXiv:2303.08774.
- Perez, M., Kot, A. C., and Rocha, A., "Detection of real-world fights in surveillance videos," IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 2662-2666.
- Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I., "Improving language understanding by generative pre-training," 2018.
- Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., Schramowski, P., Kundurthy, S., Crowson, K., Schmidt, L., Kaczmarczyk, R., and Jitsev, J., "LAION-5B: An open large-scale dataset for training next generation image-text models," Advances in Neural Information Processing Systems, Vol. 35, 2022, pp.25278-25294.
- Simonyan, K., and Zisserman, A., "Very deep convolutional networks for large-scale image recognition," 3rd International Conference on Learning Representations, 2015, pp. 1-14.
- Sultani, W., Chen, C., and Shah, M., "Real-world anomaly detection in surveillance videos," IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6479-6488.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I., "Attention is all you need," Advances in Neural Information Processing Systems, Vol. 30, 2017.
- Wang, P., Yang, A., Men, R., Lin, J., Bai, S., Li, Z., Ma, J., Zhou, C., Zhou, J., and Yang, H., "OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework," International Conference on Machine Learning, 2022, pp. 23318-23340.
- Wu, P., Liu, J., Shi, Y., Sun, Y., Shao, F., Wu, Z., and Yang, Z., "Not only look, but also listen: Learning multimodal violence detection under weak supervision," 16th European Conference on Computer Vision, 2020, pp. 322-339.
- Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Koura, P. S., Sridhar, A., Wang, T., and Zettlemoyer, L., "OPT: Open pre-trained transformer language models," 2022, arXiv:2205.01068.