Search | Korea Science

Kim, Yanghee;Lee, Chanhee;Whang, Taesun;Kim, Gyeongmin;Lim, Heuiseok
- Journal of Internet of Things and Convergence
- /
- v.4 no.2
- /
- pp.1-6
- /
- 2018
Techniques for generating new sample data from higher dimensional data such as images have been utilized variously for speech synthesis, image conversion and image restoration. This paper adopts Progressive Growing of Generative Adversarial Networks(PG-GANs) as an implementation model to generate high-resolution images and to enhance variation of the generated images, and applied it to fashion image data. PG-GANs allows the generator and discriminator to progressively learn at the same time, continuously adding new layers from low-resolution images to result high-resolution images. We also proposed a Mini-batch Discrimination method to increase the diversity of generated data, and proposed a Sliced Wasserstein Distance(SWD) evaluation method instead of the existing MS-SSIM to evaluate the GAN model.
https://doi.org/10.20465/KIOTS.2018.4.2.001 인용 PDF

Zhang, Ruyang;Lee, Eung-Joo
- Journal of Korea Multimedia Society
- /
- v.25 no.11
- /
- pp.1582-1590
- /
- 2022
Due to the situation of the widespread of the coronavirus, which causes the problem of lack of face image data occluded by masks at recent time, in order to solve the related problems, this paper proposes a method to generate face images with masks using a combination of generative adversarial networks and spatial transformation networks based on CNN model. The system we proposed in this paper is based on the GAN, combined with multi-scale convolution kernels to extract features at different details of the human face images, and used Wasserstein divergence as the measure of the distance between real samples and synthetic samples in order to optimize Generator performance. Experiments show that the proposed method can effectively put masks on face images with high efficiency and fast reaction time and the synthesized human face images are pretty natural and real.
https://doi.org/10.9717/kmms.2022.25.11.1582 인용 PDF KSCI

Choi, Yeonju;Kim, Minsik;Kim, Yongwoo;Han, Sanghyuck
- Korean Journal of Remote Sensing
- /
- v.36 no.3
- /
- pp.449-460
- /
- 2020
Super-resolution is a technique used to reconstruct an image with low-resolution into that of high-resolution. Recently, deep-learning based super resolution has become the mainstream, and applications of these methods are widely used in the remote sensing field. In this paper, we propose a super-resolution method based on the deep back-projection network model to improve the satellite image resolution by the factor of four. In the process, we customized the loss function with the edge loss to result in a more detailed feature of the boundary of each object and to improve the stability of the model training using generative adversarial network based on Wasserstein distance loss. Also, we have applied the detail preserving image down-scaling method to enhance the naturalness of the training output. Finally, by including the modified-residual learning with a panchromatic feature in the final step of the training process. Our proposed method is able to reconstruct fine features and high frequency information. Comparing the results of our method with that of the others, we propose that the super-resolution method improves the sharpness and the clarity of WorldView-3 and KOMPSAT-2 images.
https://doi.org/10.7780/kjrs.2020.36.3.5 인용 PDF KSCI HTML

Bogyung Park;Somin Park;Hyunki Hong
- KIPS Transactions on Software and Data Engineering
- /
- v.12 no.7
- /
- pp.303-314
- /
- 2023
Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties(tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model that utilizes one-hot vectors of original and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of Rawnet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.
https://doi.org/10.3745/KTSDE.2023.12.7.303 인용 PDF