• Title/Abstract/Keywords: Image generative model

Search results: 122 (processing time: 0.022 seconds)

Toon Image Generation of Main Characters in a Comic from Object Diagram via Natural Language Based Requirement Specifications

  • Janghwan Kim;Jihoon Kong;Hee-Do Heo;Sam-Hyun Chun;R. Young Chul Kim
    • International journal of advanced smart convergence / Vol.13 No.1 / pp.85-91 / 2024
  • Generative artificial intelligence is currently a hot topic worldwide, creating images, art, video clips, advertisements, and more. The problem is that it is very difficult to verify the internal workings of such models. From a requirements-engineering perspective, we attempt to create toon images by applying linguistic mechanisms to this issue: natural-language requirement specifications are combined with a UML object model through the semantic role analysis techniques of the linguists Chomsky and Fillmore, and the derived properties are then linked to a toon-creation template. The goal is to ensure productivity based on reusability, rather than creativity, in toon engineering. In future work, we plan to further increase toon image productivity by incorporating software development processes and reusability.
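The role-to-template linking described above can be sketched as a simple mapping from case-grammar roles to template slots. This is a hypothetical illustration: the role names follow Fillmore's case grammar, but the slot names and the `roles_to_template` function are invented for this sketch, not taken from the paper.

```python
def roles_to_template(roles):
    """Fill toon-template slots from semantic roles.

    `roles` maps case-grammar roles (Agent, Object, Location, ...)
    to phrases parsed from a natural-language requirement sentence.
    The slot names below are illustrative only.
    """
    return {
        "character": roles.get("Agent", "unnamed character"),
        "prop": roles.get("Object", ""),
        "background": roles.get("Location", "blank panel"),
    }

panel = roles_to_template({"Agent": "a robot", "Object": "a letter", "Location": "a park"})
print(panel["character"], "|", panel["background"])  # a robot | a park
```

Filling fixed slots from extracted roles is what makes the pipeline reusable: the same template serves any requirement sentence whose roles can be parsed.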

3D Object Generation and Renderer System based on VAE ResNet-GAN

  • Min-Su Yu;Tae-Won Jung;GyoungHyun Kim;Soonchul Kwon;Kye-Dong Jung
    • International journal of advanced smart convergence / Vol.12 No.4 / pp.142-146 / 2023
  • We present a method for generating and rendering 3D structures by combining a VAE (Variational Autoencoder) and a GAN (Generative Adversarial Network). The approach improves model quality by applying residual learning to the encoder: the encoder layers are stacked deeply to capture image features accurately, and residual blocks are applied to mitigate the gradient vanishing and exploding problems that arise when constructing deep neural networks. The generated model has more detailed voxels for a more accurate representation; it is rendered with materials and lighting and finally converted into a mesh model. The resulting 3D models have high visual quality and accuracy, making them useful in fields such as virtual reality, game development, and the metaverse.
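The residual trick this encoder relies on can be illustrated independently of any deep-learning framework: a residual block adds its input back to the transformed output, so even a near-zero transformation preserves the signal and, during backpropagation, the gradient. A minimal sketch in plain Python, with the hypothetical `toy_transform` standing in for the encoder's convolutional layers:

```python
def residual_block(x, transform):
    """Apply a residual connection: y = x + F(x).

    Even if `transform` (F) collapses toward zero, the identity
    path keeps the signal flowing, which is why deep stacks of
    these blocks avoid vanishing gradients.
    """
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

# A toy "learned" transformation (hypothetical stand-in for conv layers).
def toy_transform(x):
    return [0.1 * xi for xi in x]

features = [1.0, -2.0, 3.0]
out = residual_block(features, toy_transform)
# `out` stays close to the input: roughly [1.1, -2.2, 3.3]
```

Because the identity path contributes a constant 1 to the derivative, the gradient through a stack of such blocks never multiplies down to zero the way a plain deep stack can.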

Assessment and Analysis of Fidelity and Diversity for GAN-based Medical Image Generative Model

  • 장유진;유재준;홍헬렌
    • Journal of the Korea Computer Graphics Society / Vol.28 No.2 / pp.11-19 / 2022
  • With recent advances in medical imaging, various studies on medical image generation have been proposed, and accurately evaluating the fidelity and diversity of the generated images has become important. Generated medical images are commonly assessed through expert visual Turing tests, feature-distribution visualization, and quantitative metrics such as IS and FID, but few methods quantitatively evaluate medical images in terms of both fidelity and diversity. In this paper, we train DCGAN and PGGAN generative models on a chest CT dataset of non-small cell lung cancer patients to generate images, and evaluate the two models' performance in terms of fidelity and diversity. We quantify performance using the one-dimensional score-based metrics IS and FID and the two-dimensional score-based metrics Precision and Recall and improved Precision and Recall, and we also analyze the characteristics and limitations of each evaluation method for medical images.
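FID, one of the metrics compared above, is the Fréchet distance between two Gaussians fitted to real and generated feature statistics: d² = ||μ₁−μ₂||² + Tr(Σ₁+Σ₂−2(Σ₁Σ₂)^½). A minimal sketch, assuming diagonal covariances so the matrix square root reduces to an elementwise one (real FID fits full covariance matrices to Inception-v3 features):

```python
import math

def frechet_distance_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    d^2 = ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2))

    The diagonal-covariance case shown here keeps the sketch
    dependency-free; production FID uses full covariance matrices.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Identical feature distributions give a distance of zero.
print(frechet_distance_diagonal([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0]))  # 0.0
```

Lower is better: a generator whose feature statistics match the real data drives both the mean term and the covariance term toward zero.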

Few-Shot Content-Level Font Generation

  • Majeed, Saima;Hassan, Ammar Ul;Choi, Jaeyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.16 No.4 / pp.1166-1186 / 2022
  • Artistic font design has become an integral part of visual media. However, without prior knowledge of the font domain, it is difficult to create distinct font styles. When the number of characters is limited, this task becomes easier (e.g., only Latin characters). However, designing CJK (Chinese, Japanese, and Korean) characters presents a challenge due to the large number of character sets and complexity of the glyph components in these languages. Numerous studies have been conducted on automating the font design process using generative adversarial networks (GANs). Existing methods rely heavily on reference fonts and perform font style conversions between different fonts. Additionally, rather than capturing style information for a target font via multiple style images, most methods do so via a single font image. In this paper, we propose a network architecture for generating multilingual font sets that makes use of geometric structures as content. Additionally, to acquire sufficient style information, we employ multiple style images belonging to a single font style simultaneously to extract global font style-specific information. By utilizing the geometric structural information of content and a few stylized images, our model can generate an entire font set while maintaining the style. Extensive experiments were conducted to demonstrate the proposed model's superiority over several baseline methods. Additionally, we conducted ablation studies to validate our proposed network architecture.
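The multi-image style extraction described above can be sketched as pooling per-image style embeddings into one global style vector. Mean pooling is assumed here for illustration (the paper's aggregation function may differ), and `pool_style_vectors` is a name invented for this sketch:

```python
def pool_style_vectors(style_vectors):
    """Aggregate per-image style embeddings into a single global
    style vector by mean pooling.

    Pooling over several stylized glyph images captures font-wide
    style cues that any single glyph may lack.
    """
    n = len(style_vectors)
    dim = len(style_vectors[0])
    return [sum(v[i] for v in style_vectors) / n for i in range(dim)]

# Two (toy) style embeddings from two glyphs of the same font.
global_style = pool_style_vectors([[1.0, 0.0], [3.0, 2.0]])
print(global_style)  # [2.0, 1.0]
```

The pooled vector is then what conditions generation, alongside the geometric content structure of the target character.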

A Novel Cross Channel Self-Attention based Approach for Facial Attribute Editing

  • Xu, Meng;Jin, Rize;Lu, Liangfu;Chung, Tae-Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.15 No.6 / pp.2115-2127 / 2021
  • Although significant progress has been made in synthesizing visually realistic face images with Generative Adversarial Networks (GANs), effective approaches that provide fine-grained control over the generation process for semantic facial attribute editing are still lacking. In this work, we propose a novel cross-channel self-attention based generative adversarial network (CCA-GAN), which weights the importance of multiple feature channels and achieves pixel-level feature alignment and conversion, reducing the impact on irrelevant attributes while editing the target attributes. Evaluation results show that CCA-GAN outperforms state-of-the-art models on the CelebA dataset, reducing Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) by 15~28% and 25~100%, respectively. Furthermore, visualization of generated samples confirms the disentanglement effect of the proposed model.
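The channel-weighting idea can be illustrated by computing a per-channel score and reweighting feature channels with a softmax over those scores. This is only an illustrative reduction: the paper's cross-channel self-attention computes pairwise channel affinities, not just per-channel means, and `channel_attention` is a name invented here.

```python
import math

def channel_attention(channels):
    """Reweight feature channels by a softmax over their mean activations.

    `channels` is a list of feature channels (each a flat list of floats).
    A toy reduction of channel attention: informative channels get larger
    weights, so irrelevant channels contribute less to the edit.
    """
    scores = [sum(c) / len(c) for c in channels]  # per-channel statistic
    m = max(scores)                               # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [[w * v for v in c] for w, c in zip(weights, channels)]

# The channel with the larger mean activation receives the larger weight.
weighted = channel_attention([[1.0, 1.0], [3.0, 3.0]])
```

Down-weighting channels unrelated to the target attribute is what limits collateral changes to other attributes during editing.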

Radar-based Rainfall Prediction Using a Generative Adversarial Network

  • 윤성심;신홍준;허재영
    • Journal of Korea Water Resources Association / Vol.56 No.8 / pp.471-484 / 2023
  • Deep learning models based on generative adversarial networks specialize in generating new information from learned information. The deep generative model of rain (DGMR), developed by Google DeepMind, is a generative adversarial network that learns the complex patterns and relationships in large-scale radar image data to generate predicted radar images. In this study, the DGMR model was trained on radar rainfall observations from the Ministry of Environment, rainfall prediction was performed for heavy-rain events in August 2021, and the accuracy was compared with existing prediction techniques. DGMR generally produced rainfall distributions whose locations were closest to the observed rainfall up to a 60-minute lead time, but in cases where strong rainfall occurred over the entire domain it tended to predict that the rainfall would keep developing. In the statistical evaluation, DGMR achieved a critical success index of 0.57-0.79 and a mean absolute error of 0.57-1.36 mm at a one-hour lead time, showing it to be more effective than the other techniques. However, a lack of diversity in the generated results sometimes degraded prediction accuracy, so research to improve diversity, as well as supplementation with predicted rainfall from physics-based numerical weather prediction models, is needed to improve accuracy at lead times of two hours or more.
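The critical success index (CSI) quoted above is hits / (hits + misses + false alarms) over a rain/no-rain threshold. A minimal sketch (the 0.1 mm/h threshold is illustrative, not the study's setting):

```python
def critical_success_index(observed, predicted, threshold=0.1):
    """CSI = hits / (hits + misses + false_alarms).

    `observed` and `predicted` are rainfall intensities (e.g. mm/h);
    a cell counts as "rain" when it meets the threshold. Correct
    no-rain cells do not enter the score.
    """
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        rain_obs, rain_pred = obs >= threshold, pred >= threshold
        if rain_obs and rain_pred:
            hits += 1
        elif rain_obs:
            misses += 1
        elif rain_pred:
            false_alarms += 1
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")

obs = [0.0, 0.5, 1.2, 0.0, 2.0]
pred = [0.0, 0.4, 0.0, 0.3, 1.5]
print(critical_success_index(obs, pred))  # 2 hits, 1 miss, 1 false alarm -> 0.5
```

A CSI of 1.0 means every observed rain cell was predicted and no false alarms were raised; the 0.57-0.79 range above therefore indicates a substantial but imperfect overlap between predicted and observed rain areas.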

Frontal Face Generation Algorithm from Multi-view Images Based on Generative Adversarial Network

  • Heo, Young-Jin;Kim, Byung-Gyu;Roy, Partha Pratim
    • Journal of Multimedia Information System / Vol.8 No.2 / pp.85-92 / 2021
  • A face carries much information about a person's identity. Because of this property, tasks such as expression recognition, identity recognition, and deepfake generation have been actively studied, and most of them use the exact frontal view of the given face. In real situations, however, faces are observed from various directions rather than exactly frontal, and a profile (side view) carries less information than the frontal view. Therefore, if we can generate the frontal face from other directions, we can obtain more information about the given face. In this paper, we propose a combined style model based on the conditional generative adversarial network (cGAN) for generating the frontal face from multi-view images, capturing not only the style around the face (hair and beard) but also detailed areas (eyes, nose, and mouth).

GAN-based Color Palette Extraction System by Chroma Fine-tuning with Reinforcement Learning

  • Kim, Sanghyuk;Kang, Suk-Ju
    • Journal of Semiconductor Engineering / Vol.2 No.1 / pp.125-129 / 2021
  • As interest in deep learning grows, techniques to control the color of images in the image processing field are evolving with it. However, there is no clear standard for color, and it is not easy to represent color itself in the way a color palette does. In this paper, we propose a novel color-palette extraction system that fine-tunes chroma with reinforcement learning, helping to recognize the color combination that represents an input image. First, we use RGBY images to create feature maps by transferring a backbone network with well-trained weights verified on super-resolution convolutional neural networks. Second, the feature maps are fed to three fully connected layers for color-palette generation with a generative adversarial network (GAN). Third, we use a reinforcement learning method that changes only the chroma information of the GAN output by slightly moving the Y component of each pixel's YCbCr value up and down. The proposed method outperforms existing color-palette extraction methods with an accuracy of 0.9140.
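The fine-tuning step operates in YCbCr space. A minimal sketch of the conversion and a luma nudge; full-range BT.601 (the JPEG convention) is assumed here, since the abstract does not specify which YCbCr variant is used:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 RGB -> YCbCr (the JPEG convention).

    Y is luma; Cb and Cr are chroma offsets centered at 128
    for 8-bit values.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr

def nudge_luma(y, step=1.0):
    """Move Y up or down by `step`, clamped to the 8-bit range."""
    return min(255.0, max(0.0, y + step))

y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(round(y), round(cb), round(cr))  # 255 128 128
```

Adjusting only Y while holding Cb and Cr fixed changes a pixel's lightness without shifting its hue, which is what lets the reinforcement-learning step tune the palette without drifting to different colors.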

Face Recognition Research Based on Multi-Layers Residual Unit CNN Model

  • Zhang, Ruyang;Lee, Eung-Joo
    • Journal of Korea Multimedia Society / Vol.25 No.11 / pp.1582-1590 / 2022
  • The widespread wearing of masks during the coronavirus pandemic has caused a shortage of mask-occluded face image data. To solve the related problems, this paper proposes a method to generate masked face images using a combination of generative adversarial networks and spatial transformer networks based on a CNN model. The proposed system is built on a GAN, combined with multi-scale convolution kernels to extract features at different levels of detail in face images, and uses the Wasserstein divergence as the measure of distance between real and synthetic samples to optimize generator performance. Experiments show that the proposed method can effectively put masks on face images with high efficiency and fast response time, and that the synthesized face images are natural and realistic.
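The distance the loss above is built around can be illustrated in one dimension, where the optimal transport plan matches sorted samples. Note the hedge: the paper's Wasserstein divergence is a GAN training objective estimated through a critic network, not computed directly like this; the sketch only shows the underlying distance.

```python
def wasserstein_1d(samples_a, samples_b):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan pairs sorted samples,
    so the distance is the mean absolute difference after sorting.
    """
    a, b = sorted(samples_a), sorted(samples_b)
    assert len(a) == len(b), "sketch assumes equal sample sizes"
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

Unlike Jensen-Shannon-style divergences, this distance stays finite and informative even when two distributions barely overlap, which is the usual motivation for Wasserstein-style GAN objectives.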

Resolution Conversion of SAR Target Images Using Conditional GAN

  • 박지훈;서승모;최여름;유지희
    • Journal of the Korea Institute of Military Science and Technology / Vol.24 No.1 / pp.12-21 / 2021
  • For successful automatic target recognition (ATR) with synthetic aperture radar (SAR) imagery, SAR target images in the database should have a resolution identical or highly similar to that of images collected from SAR sensors. However, it is time-consuming or infeasible to construct multiple databases at different resolutions for each operating SAR system. In this paper, an approach for resolution conversion of SAR target images is proposed based on a conditional generative adversarial network (cGAN). First, a number of pairs of SAR target images at two different resolutions are obtained via SAR simulation and used to train the cGAN model. The trained model then generates SAR target images whose resolution is converted from the original. A similarity analysis is performed to validate the reliability of the generated images. The cGAN model is further applied to measured MSTAR SAR target images to estimate its potential for real applications.
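The similarity analysis step can be sketched with a simple vector similarity between a generated image and its reference. The abstract does not name its metric; cosine similarity on flattened pixel vectors is used here purely as a stand-in:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened image vectors.

    Returns 1.0 for proportional (identically structured) images
    and values near 0.0 for unrelated ones. A stand-in metric:
    the paper's actual similarity analysis may differ.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

reference = [1.0, 2.0, 3.0]
generated = [2.0, 4.0, 6.0]   # same structure, different scale
print(cosine_similarity(reference, generated))  # 1.0 (within float error)
```

A high similarity between converted and true same-resolution images is what would justify reusing one database across SAR systems with different resolutions.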