[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13088/jiis.2022.28.2.101

Anomaly Detection Methodology Based on Multimodal Deep Learning

Lee, DongHoon (Graduate School of Business IT, Kookmin University)
Kim, Namgyu (Graduate School of Business IT, Kookmin University)

Publication Information

Journal of Intelligence and Information Systems / v.28, no.2, 2022, pp. 101-125

Abstract

Advances in computing technology and cloud environments have driven the development of deep learning, and attempts to apply it to various fields are increasing. A typical application is anomaly detection, a technique for identifying values or patterns that deviate from normal data. Among the representative types of anomaly, contextual anomalies are especially difficult to detect because they require an understanding of the overall situation. Anomaly detection in image data is generally performed using a model pre-trained on large-scale data. However, because such pre-trained models were built for object classification, they are of limited use for anomaly detection tasks that require understanding the complex situations created by multiple objects. In this study, we therefore propose a two-step pre-trained model for detecting abnormal situations. The methodology performs additional learning through image captioning so that the model understands not only individual objects but also the situations they create. Specifically, it transfers the knowledge of a model pre-trained for object classification on ImageNet data to an image captioning model, which is then trained using captions that describe the situation depicted in each image. The weights that have learned situational characteristics from the images and captions are then extracted and fine-tuned to produce an anomaly detection model. To evaluate the proposed methodology, we performed an anomaly detection experiment on 400 situational images; the results show that it outperforms a conventional pre-trained model in both anomaly detection accuracy and F1-score.
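The two-step transfer described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain Python dictionaries stand in for model weights; the parameter names (`encoder.conv1`, `decoder.lstm`, `head.anomaly`), the helper `transfer_encoder`, and all numeric values are hypothetical; and the training steps are replaced by a placeholder update. It only shows how encoder weights might be carried from the classification model to the captioning model and then to the anomaly detection model, while task-specific parts are left behind.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's named weights.
Weights = Dict[str, List[float]]


def transfer_encoder(source: Weights, target: Weights,
                     prefix: str = "encoder.") -> Weights:
    """Return a copy of `target` whose encoder entries come from `source`.

    Only parameters whose names start with `prefix` and exist in both
    models are copied; task-specific heads and decoders are untouched.
    """
    merged = {name: list(values) for name, values in target.items()}
    for name, values in source.items():
        if name.startswith(prefix) and name in merged:
            merged[name] = list(values)
    return merged


# Stage 0: encoder pre-trained on object classification
# (placeholder values standing in for ImageNet-trained weights).
classifier: Weights = {"encoder.conv1": [0.5, 1.0], "head.classes": [0.1]}

# Stage 1: the captioning model starts from the classifier's encoder ...
captioner = transfer_encoder(
    classifier, {"encoder.conv1": [0.0, 0.0], "decoder.lstm": [0.0]}
)
# ... and is then trained on image-caption pairs.
# Placeholder update: real training would adjust these weights by gradient descent.
captioner["encoder.conv1"] = [w + 0.25 for w in captioner["encoder.conv1"]]

# Stage 2: the anomaly detector starts from the caption-trained encoder
# and would then be fine-tuned on normal/abnormal situation images.
detector = transfer_encoder(
    captioner, {"encoder.conv1": [0.0, 0.0], "head.anomaly": [0.0]}
)

print(detector)
```

In this sketch the caption decoder is discarded after stage 1, so only the encoder, which has seen both images and captions, reaches the anomaly detection model.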

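With recent advances in computing technology and improvements in cloud environments, deep learning technology has developed, and attempts to apply deep learning to various fields are increasing. A representative example is anomaly detection, a technique for identifying values or patterns that deviate from normal data. Among the representative types of anomaly (point, collective, and contextual), contextual anomalies, which require a grasp of the overall situation, are known to be very difficult to detect. In general, abnormal situations in image data are detected using a pre-trained model trained on large-scale data. However, because such pre-trained models were created with a focus on classifying the object classes in an image, they are limited when applied as-is to abnormal situation detection, which must detect the complex situations created by various objects. This study therefore proposes a methodology for building a two-step pre-trained model suited to abnormal situation detection, which requires understanding not only the objects but also the situations they create, by performing additional image captioning training on top of a pre-trained model that has learned object class classification. Specifically, the proposed methodology transfers a pre-trained model that has learned class classification on ImageNet data to an image captioning model and trains it using captions that describe the situation shown in each image as input data. The weights that have learned situational features from the images and captions are then extracted and fine-tuned to produce an abnormal situation detection model. To evaluate the performance of the proposed methodology, an anomaly detection experiment was performed on a self-constructed dataset of 400 situational images, and the results confirmed that the proposed methodology outperforms a conventional simple pre-trained model in terms of abnormal situation detection accuracy and F1-score.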
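Accuracy and F1-score, the two measures used in the evaluation, can be computed from binary labels as in the sketch below. The labels are made-up illustration data, not the paper's results, and the function name `accuracy_and_f1` and the convention that 1 marks an abnormal image are assumptions.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int],
                    y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 = abnormal (assumed)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal, detected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, passed

    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Illustrative labels only: 3 abnormal and 5 normal images.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Reporting F1 alongside accuracy is useful when abnormal images are a minority, since a detector that labels everything as normal can still reach high accuracy while its F1 is zero.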

Keywords

Object Recognition; Deep Learning; Multi-modal; Image Captioning; Anomaly Detection

Citations & Related Records

Times Cited By KSCI: 1

