
Two-Stage Deep Learning Based Algorithm for Cosmetic Object Recognition

  • Received : 2023.11.24
  • Accepted : 2023.12.13
  • Published : 2023.12.31

Abstract

With the recent surge in YouTube usage, user-generated videos in which individuals evaluate cosmetics have proliferated. Consequently, many companies increasingly use these evaluation videos for product marketing and market research. A notable drawback, however, is that manually classifying these product review videos incurs significant cost and time. This paper therefore proposes a deep learning-based cosmetics search algorithm to automate the task. The algorithm consists of two networks: one detects candidates in images using shape features such as circles and rectangles, and the other filters these candidates and categorizes them as specific products. A Two-Stage architecture was chosen over a One-Stage one because, for videos containing background scenes, it is more robust to first detect cosmetic candidates and then classify them as specific objects. While Two-Stage structures are generally known to outperform One-Stage structures in accuracy, this study adopts the Two-Stage design primarily to address the difficulty of acquiring training and validation data that arises with a One-Stage approach: data for the shape-based candidate detector and for the candidate classifier can each be acquired cost-effectively, which keeps the overall algorithm robust.
