http://dx.doi.org/10.3745/KTSDE.2022.11.11.465

Style Synthesis of Speech Videos Through Generative Adversarial Neural Networks  

Choi, Hee Jo (Department of IT Media Engineering, Seoul National University of Science and Technology)
Park, Goo Man (Department of Electronic IT Media Engineering, Seoul National University of Science and Technology)
Publication Information
KIPS Transactions on Software and Data Engineering, Vol.11, No.11, pp.465-472, 2022
Abstract
In this paper, a style synthesis network based on StyleGAN and a video synthesis network are trained together to generate style-synthesized video. To address the unstable transfer of gaze and expression, 3D face reconstruction is applied so that key attributes of the head, such as pose, gaze, and expression, can be controlled from 3D facial information. In addition, by training the Head2Head discriminators for dynamics, mouth shape, image, and gaze, the method produces stable style-synthesized video with improved plausibility and temporal consistency. Using the FaceForensics dataset and the MetFaces dataset, it was confirmed that performance improves: one video is converted into another while the consistent motion of the target face is preserved, and natural results are generated by synthesizing video from the 3D facial information of the source face.
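The style-injection mechanism underlying StyleGAN-family generators is Adaptive Instance Normalization (AdaIN, reference 12 below): each content feature channel is normalized and then rescaled to the per-channel statistics of the style features. The following is a minimal NumPy sketch of that operation only, not the paper's implementation:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization (Huang & Belongie, 2017).

    content, style: feature maps of shape (C, H, W). Each content channel
    is normalized to zero mean / unit variance, then shifted and scaled to
    the per-channel mean and standard deviation of the style features.
    """
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mu) / (c_std + eps) + s_mu

# After AdaIN, the output's channel statistics match the style input.
rng = np.random.default_rng(0)
content = rng.normal(2.0, 3.0, size=(4, 8, 8))
style = rng.normal(-1.0, 0.5, size=(4, 8, 8))
out = adain(content, style)
assert np.allclose(out.mean(axis=(1, 2)), style.mean(axis=(1, 2)), atol=1e-4)
```

In a full generator this operation is applied per layer, with the style statistics predicted from a learned latent code rather than taken from a second image.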
Keywords
Generative Adversarial Network; Video Generation; Style Transfer; Style Synthesis Network; Video Synthesis Network;
Citations & Related Records
Times Cited By KSCI: 5
1 M. J. Chong, "GANs N' roses: Stable, controllable, diverse image to image translation," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021.
2 L. Tran and X. Liu, "On learning 3D face morphable model from in-the-wild images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.43, No.1, pp.157-171, 2021. https://doi.org/10.1109/TPAMI.2019.2927975.
3 T. C. Wang et al., "Video-to-video synthesis," Advances in Neural Information Processing Systems, Vol.31, 2018.
4 A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Niessner, "FaceForensics++: Learning to detect manipulated facial images," Proceedings of the IEEE International Conference on Computer Vision, 2019. https://doi.org/10.1109/ICCV.2019.00009.
5 M. R. Koujan, M. C. Doukas, A. Roussos, and S. Zafeiriou, "Head2Head: Video-based neural head synthesis," Proceedings of the 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020. https://doi.org/10.1109/FG47880.2020.00048.
6 M. C. Doukas, M. R. Koujan, V. Sharmanska, A. Roussos, and S. Zafeiriou, "Head2Head++: Deep facial attributes re-targeting," arXiv preprint arXiv:2006.10199, 2020.
7 MetFaces dataset [Internet], https://github.com/NVlabs/metfaces-dataset.
8 D. Bank, N. Koenigstein, and R. Giryes, "Autoencoders," arXiv preprint arXiv:2003.05991, 2020.
9 D. P. Kingma and M. Welling, "Auto-encoding variational bayes," 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings, 2014.
10 I. J. Goodfellow et al., "Generative adversarial nets," Advances in Neural Information Processing Systems, Vol.27, 2014.
11 A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," International Conference on Learning Representations, 2016.
12 X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," Proceedings of the IEEE International Conference on Computer Vision, 2017. https://doi.org/10.1109/ICCV.2017.167.
13 X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, "Least squares generative adversarial networks (LSGAN)," Proceedings of the IEEE International Conference on Computer Vision, 2017.
14 X. Han, L. Zhang, K. Zhou, and X. Wang, "ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework," Computers and Chemical Engineering, Vol.131, 2019. https://doi.org/10.1016/j.compchemeng.2019.106533.
15 E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy, and T. Brox, "FlowNet 2.0: Evolution of optical flow estimation with deep networks," Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017. https://doi.org/10.1109/CVPR.2017.179.
16 L. Yuan, C. Ruan, H. Hu, and D. Chen, "Image inpainting based on Patch-GANs," IEEE Access, Vol.7, pp.46411-46421, 2019. https://doi.org/10.1109/ACCESS.2019.2909553.
17 B. J. B. Rani and L. M. E. Sumathi, "Survey on applying GAN for anomaly detection," 2020 International Conference on Computer Communication and Informatics (ICCCI), 2020. https://doi.org/10.1109/ICCCI48352.2020.9104046.
18 J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," Lecture Notes in Computer Science, Vol.9906, 2016. https://doi.org/10.1007/978-3-319-46475-6_43.
19 H. Y. Lee et al., "DRIT++: Diverse Image-to-Image Translation via Disentangled Representations," arXiv preprint arXiv:1905.01270, 2019.
20 E. Harkonen, A. Hertzmann, J. Lehtinen, and S. Paris, "GANSpace: Discovering interpretable GAN controls," Advances in Neural Information Processing Systems, Vol.33, pp.9841-9850, 2020.
21 X. Zhu, X. Liu, Z. Lei, and S. Z. Li, "Face alignment in full pose range: A 3D total solution," arXiv preprint arXiv:1804.01005, 2018.
22 N.-A. Lahonce, Flickr-Faces-HQ Dataset (FFHQ), Nvidia, 2020.
23 J. H. Lee, M. J. Sung, J. W. Kang, and D. Chen, "Learning dense representations of phrases at scale," arXiv preprint arXiv:2012.12624, 2020.
24 K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, "Joint face detection and alignment using multitask cascaded convolutional networks (MTCNN)," IEEE Signal Processing Letters, Vol.23, No.10, 2016.
25 M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley, "Stochastic variational inference," Journal of Machine Learning Research, Vol.14, 2013.
26 Z. Zhang and M. R. Sabuncu, "Generalized cross entropy loss for training deep neural networks with noisy labels," Advances in Neural Information Processing Systems, Vol.31, 2018.
27 D. A. Pisner and D. M. Schnyer, "Support vector machine," in Machine Learning: Methods and Applications to Brain Disorders, 2019. https://doi.org/10.1016/B978-0-12-815739-8.00006-7.
28 A. Mathiasen and F. Hvilshoj, "Fast Frechet inception distance," arXiv preprint arXiv:2009.14075, 2020.
29 O. Nizan and A. Tal, "Breaking the cycle - colleagues are all you need," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020. https://doi.org/10.1109/CVPR42600.2020.00788.
30 P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks (Pix2Pix)," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
31 J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," arXiv preprint arXiv:1703.10593, 2017.
32 T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of StyleGAN," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020. https://doi.org/10.1109/CVPR42600.2020.00813.
33 J. Han, J. Tao, and C. Wang, "FlowNet: A deep learning framework for clustering and selection of streamlines and stream surfaces," IEEE Transactions on Visualization and Computer Graphics, Vol.26, No.4, pp.1732-1744, 2020. https://doi.org/10.1109/TVCG.2018.2880207.
34 T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019. https://doi.org/10.1109/CVPR.2019.00453.
35 R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018. https://doi.org/10.1109/CVPR.2018.00068.
36 H. Wang, S. Sridhar, J. Huang, J. Valentin, S. Song, and L. J. Guibas, "Normalized object coordinate space for category-level 6D object pose and size estimation," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2019. https://doi.org/10.1109/CVPR.2019.00275.