[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.9717/kmms.2020.23.2.186

Keypoints-Based 2D Virtual Try-on Network System

Pham, Duy Lai (Dept. of Information and Telecommunication Eng., Graduate School, Soongsil University)
Nguyen, Nhat Tan (Dept. of Intelligent Systems, Graduate School, Soongsil University)
Chung, Sun-Tae (Dept. of Smart System Software, Soongsil University)

Publication Information

Journal of Korea Multimedia Society / v.23, no.2, 2020, pp. 186-203

Abstract

Image-based virtual try-on systems, which fit a target garment onto an image of a model person, are among the most promising solutions for virtual fitting and have therefore attracted considerable research effort. In many cases, however, current solutions fail to produce a natural-looking fitted image in which the target clothes are transferred onto the body area of a model person of arbitrary shape and pose while preserving clothing details such as texture, text, and logos without distortion or artifacts. In this paper, we propose an improved image-based virtual try-on network system based on keypoints, which we name KP-VTON. The proposed KP-VTON first detects keypoints in the target clothes and reliably predicts the corresponding keypoints in the clothes worn by the model person by utilizing dense human pose estimation. Then, a thin-plate spline (TPS) transformation, computed using the keypoints as control points, warps the target clothes image so that it matches the body area where the target clothes are to be worn. Finally, a new try-on module adopting Attention U-Net handles the more detailed synthesis of the virtual fitted image. Extensive experiments on a well-known dataset show that the proposed KP-VTON outperforms state-of-the-art virtual try-on systems.
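
The TPS step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows, in plain Python, how a thin-plate spline mapping can be fitted from matched control points and then evaluated at an arbitrary point. The function names (`fit_tps`, `tps_map`) and the keypoint coordinates are hypothetical, and the kernel U(r) = r^2 log(r^2) is one common convention assumed here.

```python
import math

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r^2), with U(0) = 0; r2 is the squared distance.
    return 0.0 if r2 == 0.0 else r2 * math.log(r2)

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be collinear or repeated")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_tps(src, dst):
    """Fit TPS parameters so that each src control point maps onto its dst point.

    Returns n + 3 rows of (x, y) coefficients: n kernel weights followed by
    the three affine terms (constant, x, y).
    """
    n = len(src)
    size = n + 3
    A = [[0.0] * size for _ in range(size)]
    B = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            A[i][j] = tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][n:] = [1.0, xi, yi]                      # affine block P
        A[n][i], A[n + 1][i], A[n + 2][i] = 1.0, xi, yi  # transposed block P^T
        B[i] = [float(dst[i][0]), float(dst[i][1])]
    return solve(A, B)

def tps_map(params, src, point):
    """Apply the fitted TPS mapping to a single (x, y) point."""
    x, y = point
    n = len(src)
    out = []
    for d in range(2):
        val = params[n][d] + params[n + 1][d] * x + params[n + 2][d] * y
        for i, (xi, yi) in enumerate(src):
            val += params[i][d] * tps_kernel((x - xi) ** 2 + (y - yi) ** 2)
        out.append(val)
    return tuple(out)

# Hypothetical keypoints: five points on a flat garment and their predicted
# positions on the worn garment (made-up coordinates for illustration only).
src_keypoints = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
dst_keypoints = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.2, 1.1), (0.5, 0.6)]

params = fit_tps(src_keypoints, dst_keypoints)
print(tps_map(params, src_keypoints, (0.75, 0.75)))
```

To warp an actual image, the mapping is usually fitted in the opposite direction (from output pixel coordinates back to clothes-image coordinates) so that every output pixel can be sampled by interpolation; the sketch above covers only the fitting and point evaluation.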

Keywords

Virtual Try-On; Image Synthesis; Image Warping; Human Body Parsing; Keypoints Prediction

Citations & Related Records

Reference

1	M-J. Tak and C-Y. Kim, "A Study on Virtual Fitting Model System for Internet Fashion Shopping Mall," Journal of Korea Multimedia Society, Vol. 9, No.97, pp. 1184-1195, 2006.
2	FX Mirror, http://www.fxmirror.net (accessed January 9, 2020).
3	P. Isola, J.Y. Zhu, T. Zhou, and A.A. Efros, "Image-to-image Translation with Conditional Adversarial Networks," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1125-1131, 2017.
4	J.Y. Zhu, T. Park, P. Isola, and A.A. Efros, "Unpaired Image-to-image Translation Using Cycle-consistent Adversarial Networks," Proceeding of International Conference on Computer Vision, pp. 2223-2230, 2017.
5	Y. Choi, M. Choi, M. Kim, J.W. Ha, S. Kim, J. Choo, et al., "StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-image Translation," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789-8795, 2018.
6	X. Han, Z. Wu, Z. Wu, R. Yu, and L.S. Davis, "VITON: An Image-based Virtual Try-on Network," arXiv Preprint arXiv:1711.08447, 2018.
7	Q. Xiao, G. Li, and Q. Chen, "Deep Inception Generative Network for Cognitive Image Inpainting," arXiv Preprint arXiv:1812.01458, 2018.
8	A. Grigorev, A. Sevastopolsky, A. Vakhitov, and V. Lempitsky, "Coordinate-based Texture Inpainting for Pose-guided Human Image Generation," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12135-12142, 2019.
9	X. Han, Z. Zhang, D. Du, M. Yang, J. Yu, P. Pan, et al., "Deep Reinforcement Learning of Volume-guided Progressive View Inpainting for 3D Point Scene Completion from a Single Depth Image," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 234-241, 2019.
10	B. Wang, H. Zheng, X. Liang, Y. Chen, L. Lin, M. Yang, et al., "Toward Characteristic-preserving Image-based Virtual Try-on Network," arXiv Preprint arXiv:1807.07688, 2018.
11	H. Dong, X. Liang, B. Wang, H. Lai, J. Zhu, J. Yin, et al., "Towards Multi-pose Guided Virtual Try-on Network," arXiv Preprint arXiv:1902.11026, 2019.
12	Z. Cao, T. Simon, S.E. Wei, and Y. Sheikh, "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291-7298, 2017.
13	K. Gong, X. Liang, D. Zhang, X. Shen, and L. Lin, "Look into Person: Self-supervised Structure-sensitive Learning and a New Benchmark for Human Parsing," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 932-939, 2017.
14	S. Schaefer, T. McPhail, and J. Warren, "Image Deformation Using Moving Least Squares," ACM Transactions on Graphics, Vol. 25, No. 3, pp. 533-540, 2006.
15	S. Belongie, J. Malik, and J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 4, pp. 509-522, 2002.
16	I. Rocco, R. Arandjelovic, and J. Sivic, "Convolutional Neural Network Architecture for Geometric Matching," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6148-6155, 2017.
17	O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," Proceeding of International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234-241, 2015.
18	O. Oktay, J. Schlemper, L.L. Folgoc, M. Lee, M. Heinrich, K. Misawa, et al., "Attention U-Net: Learning Where to Look for the Pancreas," arXiv Preprint arXiv:1804.03999, 2018.
19	Y. Ge, R. Zhang, X. Wang, X. Tang, and P. Luo, "DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-identification of Clothing Images," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5337-5344, 2019.
20	Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, J. Sun, et al., "Cascaded Pyramid Network for Multi-person Pose Estimation," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7103-7110, 2018.
21	T.Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal Loss for Dense Object Detection," Proceeding of the IEEE International Conference on Computer Vision, pp. 2980-2987, 2017.
22	K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," arXiv Preprint arXiv:1703.06870, 2017.
23	Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, Vol. 13, No. 4, pp. 600-612, 2004.
24	R.A. Güler, N. Neverova, and I. Kokkinos, "DensePose: Dense Human Pose Estimation in the Wild," arXiv Preprint arXiv:1802.00434, 2018.
25	J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-time Style Transfer and Super-resolution," arXiv Preprint arXiv:1603.08155, 2016.
26	K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-scale Image Recognition," arXiv Preprint arXiv:1409.1556, 2014.
27	J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3438, 2015.
28	T. Salimans, I. Goodfellow, W. Zaremba, and V. Cheung, "Improved Techniques for Training GANs," arXiv Preprint arXiv:1606.03498, 2016.
29	D.P. Kingma and J.L. Ba, "Adam: A Method for Stochastic Optimization," Proceeding of International Conference on Learning Representations, pp. 1-15, 2015.