[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.13088/jiis.2022.28.4.329

MF sampler: Sampling method for improving the performance of a video based fashion retrieval model

Baek, Sanghun (Graduate School of Kookmin University)
Park, Jonghyuk (College of Business Administration, Kookmin University)

Publication Information

Journal of Intelligence and Information Systems / v.28, no.4, 2022, pp. 329-346

Abstract

As the market for short-form video on social media (Instagram, TikTok, YouTube) has grown, artificial intelligence research that uses such video has become active. A representative task is Video to Shop, which detects fashion products in a video and retrieves the matching product images. These video-based models extract product features with convolution operations. Because computational resources are limited, however, extracting features from every frame of a video is impractical. Existing studies therefore sample only a subset of the frames, or design a sampling method around the characteristics of the subject, to improve model performance. Existing Video to Shop studies sample frames either at random or at even intervals. Both approaches can select noise frames in which the product does not appear, and this degrades the performance of the fashion retrieval model. This paper therefore proposes the MF (Missing Fashion items on frame) sampler, a sampling method that removes noise frames to improve retrieval performance. The MF sampler addresses the resource limitation with a keyframe mechanism and removes noise frames with a noise detection model. Experiments confirmed that the proposed method improves the model's performance and makes model training more effective.
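The contrast the abstract draws between even-interval or random sampling and noise-aware sampling can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `is_noise` callback (a stand-in for the paper's noise detection model), and the toy frame list are all assumptions made for illustration, and the keyframe mechanism is not reproduced because the abstract does not describe it.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def uniform_sample(num_frames: int, k: int) -> List[int]:
    """Pick up to k frame indices at even intervals (centre of each segment)."""
    if num_frames <= 0 or k <= 0:
        return []
    k = min(k, num_frames)
    # Integer arithmetic avoids floating-point rounding surprises.
    return [((2 * i + 1) * num_frames) // (2 * k) for i in range(k)]


def random_sample(num_frames: int, k: int, seed: int = 0) -> List[int]:
    """Pick up to k distinct frame indices at random, returned in temporal order."""
    if num_frames <= 0 or k <= 0:
        return []
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_frames), min(k, num_frames)))


def noise_filtered_sample(
    frames: Sequence[Frame],
    k: int,
    is_noise: Callable[[Frame], bool],
) -> List[int]:
    """Drop frames flagged as noise, then sample evenly from what remains.

    `is_noise` is a hypothetical placeholder for a noise detection model
    that reports whether the product is absent from a frame.
    """
    clean = [i for i, frame in enumerate(frames) if not is_noise(frame)]
    return [clean[p] for p in uniform_sample(len(clean), k)]


# Toy video: True means the fashion item is visible in that frame.
frames = [False, False, True, True, True, True, False, True, True, False]

baseline = uniform_sample(len(frames), 3)
filtered = noise_filtered_sample(frames, 3, is_noise=lambda visible: not visible)

print("even-interval:", baseline, [frames[i] for i in baseline])
print("noise-filtered:", filtered, [frames[i] for i in filtered])
```

In this toy input the even-interval baseline lands on a frame where the item is not visible, while the filtered variant returns only frames that contain it. In practice the filter is only as good as the detector behind `is_noise`.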

Keywords

Video to shop; Fashion retrieval; Sampling; Noise detection; Artificial intelligence

Citations & Related Records

Times Cited By KSCI: 1

