http://dx.doi.org/10.9717/kmms.2021.24.7.880

A Study on Visual Emotion Classification using Balanced Data Augmentation  

Jeong, Chi Yoon (Human Enhancement & Assistive Technology Research Section, Artificial Intelligence Research Lab., ETRI)
Kim, Mooseop (Human Enhancement & Assistive Technology Research Section, Artificial Intelligence Research Lab., ETRI)
Abstract
Recognizing people's emotions from images is essential in everyday life and is a popular research topic in computer vision. Visual emotion datasets suffer from severe class imbalance, with most of the data concentrated in a few categories. Existing methods do not account for this imbalance and use accuracy as the performance metric, which is not suitable for evaluating performance on an imbalanced dataset. We therefore propose a method for recognizing visual emotion that uses balanced data augmentation to address the class imbalance. The proposed method generates a balanced dataset by combining random over-sampling with image transformations. It also uses focal loss as the loss function, which mitigates class imbalance by down-weighting well-classified samples. EfficientNet, a state-of-the-art image classification network, is used to recognize visual emotion. We compare the performance of the proposed method with that of conventional methods on a public dataset. The experimental results show that the proposed method increases the F1 score by 40% compared with the same model trained without data augmentation, mitigating class imbalance without loss of classification accuracy.
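The pipeline summarized in the abstract can be illustrated in code. The following is a minimal sketch, assuming a TensorFlow/Keras setup, integer class labels, and an image dataset listed as file paths; the function names, input size, batch size, number of classes, and focal-loss hyperparameters (gamma = 2.0, alpha = 0.25) are illustrative placeholders, not the authors' published implementation.

```python
# Sketch of balanced data augmentation + focal loss + EfficientNet classifier.
# All names and hyperparameters below are illustrative assumptions.
import numpy as np
import tensorflow as tf

def balance_by_oversampling(paths, labels, rng=np.random.default_rng(0)):
    """Random over-sampling: duplicate minority-class samples (with
    replacement) until every class has as many samples as the largest class."""
    paths, labels = np.asarray(paths), np.asarray(labels)
    max_count = np.bincount(labels).max()
    out_paths, out_labels = [], []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        picked = rng.choice(idx, size=max_count, replace=True)
        out_paths.extend(paths[picked])
        out_labels.extend(labels[picked])
    return np.array(out_paths), np.array(out_labels)

def augment(image):
    """Image transformations so that over-sampled duplicates are not
    pixel-identical copies (pixel values are in the 0-255 range here)."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=25.0)
    image = tf.image.random_contrast(image, 0.9, 1.1)
    return image

def load(path, label, size=(224, 224)):
    # Keras EfficientNet applies its own input rescaling internally,
    # so raw 0-255 float pixel values are passed in unchanged.
    raw = tf.io.read_file(path)
    img = tf.image.decode_jpeg(raw, channels=3)
    img = tf.image.resize(img, size)
    return tf.cast(img, tf.float32), label

def make_dataset(paths, labels, batch_size=32):
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    ds = ds.shuffle(len(paths)).map(load, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.map(lambda x, y: (augment(x), y), num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

def focal_loss(gamma=2.0, alpha=0.25):
    """Multi-class focal loss: cross-entropy scaled by (1 - p)^gamma so that
    well-classified samples are down-weighted."""
    def loss(y_true, y_pred):
        y_true = tf.one_hot(tf.reshape(tf.cast(y_true, tf.int32), [-1]),
                            depth=tf.shape(y_pred)[-1])
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        ce = -y_true * tf.math.log(y_pred)
        weight = alpha * tf.pow(1.0 - y_pred, gamma)
        return tf.reduce_sum(weight * ce, axis=-1)
    return loss

def build_model(num_classes=8):
    """EfficientNet-B0 backbone with a softmax emotion-classification head."""
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(224, 224, 3))
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam", loss=focal_loss(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model

# Illustrative usage: `paths` and `labels` come from your own dataset listing.
# bal_paths, bal_labels = balance_by_oversampling(paths, labels)
# train_ds = make_dataset(bal_paths, bal_labels)
# model = build_model(num_classes=len(np.unique(labels)))
# model.fit(train_ds, epochs=10)
```

For evaluation on an imbalanced test set, a class-averaged F1 score (for example, sklearn.metrics.f1_score with average="macro") is more informative than plain accuracy, which is the motivation stated in the abstract.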
Keywords
Visual Emotion; Emotion Recognition; Data Augmentation; Class Imbalance; Deep Learning