• Title/Summary/Keyword: Multi-modal Generative AI

Research on Generative AI for Korean Multi-Modal Montage App (한국형 멀티모달 몽타주 앱을 위한 생성형 AI 연구)

  • Lim, Jeounghyun; Cha, Kyung-Ae; Koh, Jaepil; Hong, Won-Kee
    • Journal of Service Research and Studies / v.14 no.1 / pp.13-26 / 2024
  • Multi-modal generation is the process of producing output from several kinds of input, such as text, images, and audio. With the rapid development of AI technology, a growing number of multi-modal systems synthesize different types of data to produce results. In this paper, we present an AI system that uses speech and text recognition to take a description of a person and generate a montage image. Whereas existing montage generation technology is based on the appearance of Westerners, the system developed in this paper trains its model on Korean facial features. It can therefore create more accurate and effective Korean montage images from multi-modal voice and text input specific to Korean. Because the app's output can serve as a draft montage, it can greatly reduce the manual work of montage production personnel. For this purpose, we used persona-based virtual person montage data provided by AI-Hub of the National Information Society Agency. AI-Hub is an AI integration platform that aims to provide a one-stop service by building the training data needed to develop AI technology and services. The image generation system was implemented using VQGAN, a deep learning model for generating high-resolution images, and KoDALLE, a Korean-language image generation model. The trained model generates a montage image of a face that closely resembles what was described by voice and text. To verify the practicality of the app, 10 testers used it, and more than 70% responded that they were satisfied. The montage generator can be used in fields such as criminal detection, where facial features must be described and turned into an image.

Audio Generative AI Usage Pattern Analysis by the Exploratory Study on the Participatory Assessment Process

  • Hanjin Lee; Yeeun Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.4 / pp.47-54 / 2024
  • Cultural arts education that uses digital tools is becoming more important for improving tech literacy, supporting self-expression, and developing convergent capabilities. Creating with and evaluating innovative multi-modal AI gives users expanded creative audio-visual experiences. In particular, creating music with AI provides new experiences at every stage, from developing musical ideas to improving lyrics, editing, and producing variations. In this study, we empirically analyzed how learners performed tasks using audio and music generative AI platforms and discussed them with fellow learners. Through voluntary participation, 12 services and 10 types of evaluation criteria were collected and classified by usage pattern and purpose. We present the academic, technological, and policy implications for AI-powered liberal arts education from the learners' perspective.

UI/UX for Generative AI (생성형 AI 용도의 UI/UX)

  • Tae-Seok Kim; Anh H. Vo; Marvin John Ignacio; Khuong G. T. Diep; Yong-Guk Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.687-690 / 2023
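  • This paper surveys the various types of UI/UX for generative AI, including text-based, image-based, audio-based, and multi-modal UI/UX, and considers future prospects that make use of the latest technologies. Generative models are currently used in a broad and diverse range of applications across many industries, and they have recently attracted considerable attention from researchers and practitioners. UI/UX designed for generative AI makes everyday life more convenient and saves a great deal of time and money. In particular, we examine research directions for generative AI UI/UX that users can use comfortably.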

A Study on Performance Improvement of GVQA Model Using Transformer (트랜스포머를 이용한 GVQA 모델의 성능 개선에 관한 연구)

  • Park, Sung-Wook; Kim, Jun-Yeong; Park, Jun; Lee, Han-Sung; Jung, Se-Hoon; Sim, Cun-Bo
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.749-752 / 2021
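  • One of the most difficult areas to implement in artificial intelligence (AI) today is reasoning. Recently, in the reasoning field, AI models were introduced for the Visual Question Answering (VQA) task in a multi-modal setting that combines images and language. Shortly afterwards, the GVQA (Grounded Visual Question Answering) model, which improves on the performance of VQA models, was also introduced. However, the GVQA model still does not achieve perfect performance. In this paper, to improve the performance of the GVQA model, we replace the VCC (Visual Concept Classifier) model with ViT-G (Vision Transformer-Giant)/14 and the ACP (Answer Cluster Predictor) model with GPT (Generative Pretrained Transformer)-3. We believe these changes can contribute substantially to improving performance.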