Virtual Fitting System Using Deep Learning Methodology: HR-VITON Based on Weight Sharing, Mixed Precision & Gradient Accumulation

Lee, Hyun Sang;Oh, Se Hwan;Ha, Sung Ho;

doi:10.5859/KAIS.2022.31.4.145

The Journal of Information Systems (한국정보시스템학회지:정보시스템연구)

Volume 31 Issue 4 / Pages 145-160 / 2022 / 1229-8476 (pISSN) / 2733-8770 (eISSN)

Korea Association of Information Systems (한국정보시스템학회)


A Study on a Deep Learning Virtual Clothing Synthesis Model: Using the HR-VITON Technique Based on Weight Sharing & Training Optimization

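Lee, Hyun Sang (School of Business Administration, Kyungpook National University);
Oh, Se Hwan (School of Business Administration, Kyungpook National University);
Ha, Sung Ho (School of Business Administration, Kyungpook National University)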

Received : 2022.11.04
Accepted : 2022.12.21
Published : 2022.12.31

https://doi.org/10.5859/KAIS.2022.31.4.145


Abstract

Purpose: This study develops a virtual try-on deep learning model that can efficiently learn from both front and back clothing images. The model is expected to help promote virtual try-on clothing services in the fashion and textile industry.

Design/methodology/approach: The study used 232,355 clothing and product images. The image data fed to the model fall into five categories: the original clothing image, the wearer image, the clothing segmentation, a DensePose heatmap of the wearer's body, and a clothing-agnostic representation of the wearer. We extended the HR-VITON model with mixed precision (MP), gradient accumulation (GA), and shared model weights.

Findings: The weight-shared MP-GA HR-VITON model learned front and back fashion images efficiently. Compared with the existing technique, the proposed model quantitatively improves the quality of the generated images and produces natural fits for both front and back views. For CP-VTON and the proposed model respectively, SSIM was 0.8385 and 0.9204, LPIPS 0.2133 and 0.0642, FID 74.5421 and 11.8463, and KID 0.064 and 0.006. The model fits single-color clothes naturally, but for garments with complex pictures and logos, as shown in <Figure 6>, unnatural patterns appeared in the generated image. Extending the model with a transformer-based architecture may alleviate this problem.
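Gradient accumulation, one of the training optimizations named in the abstract, can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption for illustration: a one-parameter least-squares model, made-up toy data, and hypothetical function names. The sketch only shows the general idea that summing suitably scaled micro-batch gradients and updating once reproduces the full-batch update.

```python
# Illustrative sketch only: toy data and all names here are hypothetical,
# not taken from the paper or the HR-VITON code base.

def mse_grad(w, batch):
    """Gradient w.r.t. w of the mean of 0.5 * (w*x - y)**2 over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def full_batch_step(w, data, lr):
    """One gradient-descent step using the whole batch at once."""
    return w - lr * mse_grad(w, data)


def accumulated_step(w, data, lr, micro_batch_size):
    """One gradient-descent step built from several micro-batches.

    Each micro-batch gradient is divided by the number of micro-batches and
    summed; the weight is updated once at the end. Assumes len(data) is a
    multiple of micro_batch_size so every micro-batch has equal weight.
    """
    if len(data) % micro_batch_size != 0:
        raise ValueError("data length must be a multiple of micro_batch_size")
    num_micro = len(data) // micro_batch_size
    accumulated = 0.0
    for i in range(0, len(data), micro_batch_size):
        micro_batch = data[i:i + micro_batch_size]
        accumulated += mse_grad(w, micro_batch) / num_micro
    return w - lr * accumulated


# Made-up (x, y) pairs, roughly following y = 2x.
toy_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
            (5.0, 10.1), (6.0, 12.0), (7.0, 13.8), (8.0, 16.3)]

w_full = full_batch_step(0.5, toy_data, lr=0.01)
w_accum = accumulated_step(0.5, toy_data, lr=0.01, micro_batch_size=2)
print(w_full, w_accum)  # the two updates agree up to floating-point rounding
```

In a deep learning framework the same pattern lets a memory-limited GPU emulate a larger effective batch size, since only one micro-batch needs to be held in memory at a time.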


Acknowledgement

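This research was conducted in 2022 with funding from the Industrial Innovation Infrastructure Building Program of the Ministry of Trade, Industry and Energy (P114000015).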

References

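Maeil Shinmun, "DYETEC Institute to Build a 'Contactless Textile Material Marketing Platform'," contributed by Shin, Jung-Eon, 2020.12.29.
Shin, Eun-Kyung, Kim, Eun-Mi, and Hong, Tae-Ho, "Region-Based Real Estate Price Prediction Using SOM and LSTM," The Journal of Information Systems, Vol. 30, No. 2, 2021, pp. 147-163.
Won, Jong-Kwan, Hong, Tae-Ho, and Bae, Kyung-Il, "Personal Credit Evaluation Using a Convolutional Neural Network with Image Conversion of Credit Data and Explainable Artificial Intelligence (XAI)," The Journal of Information Systems, Vol. 30, No. 4, 2021, pp. 203-226.
Jo, Bo-Geun, Park, Kyung-Bae, and Ha, Sung-Ho, "Comparison of Prediction Models for Regional Apartment Actual Transaction Price Indices Using Machine Learning Algorithms: Verifying LIME Interpretability," The Journal of Information Systems, Vol. 29, No. 3, 2020, pp. 119-144.
AIHub., "Fashion Products and Wearing Images," https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=78, 2020.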
Ayush., C., "Project Clothes Swap," 2020 Adobe Summit Sneaks Session, https://business.adobe.com/summit/2020/clothes-swap-summit-sneak.html, 2020.
Cao, Z., Tomas, S., Shih-En, W., and Yaser, Sheikh., "Realtime Multi-Person 2d Pose Estimation Using Part Affinity Fields," Paper Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
Dosovitskiy, A., Lucas, B., Alexander, K., Dirk, W., Xiaohua, Z., Thomas, U., and Mostafa, D., "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale," ArXiv Preprint ArXiv:2010.11929, 2020.
Gong, K., Xiaodan, L., Dongyu, Z., Xiaohui, S., and Liang, L., "Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing," Paper Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
Guler, R, A., Natalia, N., and Iasonas, K., "Densepose: Dense Human Pose Estimation in the Wild," Paper Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Han, X., Zuxuan, W., Zhe, W., Ruichi, Y., and Larry, S, D., "Viton: An Image-Based Virtual Try-on Network," Paper Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Lee, S., Gyojung G., Sunghyun P., Seunghwan C., and Jaegul C., "High-Resolution Virtual Try-on with Misalignment and Occlusion-Handled Conditions," Paper Presented at the European Conference on Computer Vision, Github: https://github.com/sangyun884/hr-viton, 2022.
Lin, C., Zhao L., Sheng, Z., Shichang, H., Jialun, Z., Linhao, L., Jiarun, Z., Longtao, H., and Yuan, H., "Rmgn: A Regional Mask Guided Network for Parser-Free Virtual Try-On," ArXiv Preprint ArXiv:2204.11258, 2022.
Lyu, Q., Qiu-Feng, W., and Kaizhu, H., "High-Resolution Virtual Try-on Network with Coarse-to-Fine Strategy," Paper Presented at the Journal of Physics: Conference Series, 2021.
Micikevicius, P., Sharan, N., Jonah, Alben., Gregory, D., Erich, E., David, G., and Boris, Ginsburg., "Mixed Precision Training," ArXiv Preprint ArXiv: 1710.03740, 2017.
Ruder, S., "An Overview of Gradient Descent Optimization Algorithms," ArXiv Preprint ArXiv:1609.04747, 2016.
Vaswani, A., Noam, S., Niki, P., Jakob, U., Llion, J., Aidan, N, G., Lukasz, K., and Illia P., "Attention Is All You Need," Paper Presented at the Advances in Neural Information Processing Systems, 2017.
Wang, B., Huabin, Z., Xiaodan, L., Yimin, C., Liang, L., and Meng, Y., "Toward Characteristic-Preserving Image-Based Virtual Try-on Network," Paper Presented at the Proceedings of the European Conference on Computer Vision (ECCV), 2018.
Wang, W., Hangbo, B., Li, D., Johan, B., Zhiliang, P., Qiang, L., and Kriti, A., "Image as a Foreign Language: Beit Pretraining for All Vision and Vision-Language Tasks," ArXiv Preprint ArXiv:2208.10442, 2022.
Wang, Z., Eero, P, S., and Alan, C, B., "Multiscale Structural Similarity for Image Quality Assessment," Paper Presented at the Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.
Yang, H., Ruimao, Z., Xiaobao, G., Wei, L., Wangmeng, Z., and Ping, L., "Towards Photo-Realistic Virtual Try-on by Adaptively Generating-Preserving Image Content," Paper Presented at the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
Zhang, H., Feng, L., Shilong, L., Lei, Z., Hang, S., Jun, Z., Lionel, M, N., and Heung-Yeung, S., "Dino: Detr with Improved Denoising Anchor Boxes for End-to-End Object Detection," ArXiv Preprint ArXiv:2203.03605, 2022.
Zhang, R., Phillip, I., Alexei, A, E., Eli, S., and Oliver, Wang., "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric," Paper Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
