• Title/Summary/Keyword: Text-to-Image

Implementation of Annotation and Thesaurus for Remote Sensing

  • Chae, Gee-Ju;Yun, Young-Bo;Park, Jong-Hyun
    • Proceedings of the KSRS Conference / 2003.11a / pp.222-224 / 2003
  • Many users want to add their own information to data on the web or on a computer without modifying the data itself. In remote sensing, the results of image classification generally consist of an image and a text file. To overcome this inconvenience, we propose an annotation method based on XML. The method is efficient and can be applied both to the web and to viewing image classification results stored as image and text files. Thesaurus construction is needed because search engines such as Empas, Naver, and Google lack information on remote sensing and GIS. A search engine cannot simultaneously retrieve information for a concept that has many different names. We select remote sensing data from different sources and establish relations among the many terms. For this process, we analyze the meanings of different terms that have similar meanings.
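
The idea of annotating data without touching it can be sketched as a standalone XML record that points at the data. This is only an illustration: the element and attribute names (`annotation`, `region`, `note`, `target`, `author`), the helper `build_annotation`, and the file name are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

def build_annotation(target, author, note, region):
    """Build a standalone XML annotation that refers to a file without modifying it.

    All element and attribute names here are hypothetical.
    """
    root = ET.Element("annotation", {"target": target, "author": author})
    x, y, width, height = region
    ET.SubElement(root, "region", {
        "x": str(x), "y": str(y), "width": str(width), "height": str(height),
    })
    ET.SubElement(root, "note").text = note
    return root

# Hypothetical classification result and user comment.
annotation = build_annotation(
    "classified_scene.tif", "user01",
    "Likely paddy field, not grassland", (120, 80, 64, 64),
)
xml_text = ET.tostring(annotation, encoding="unicode")

# The annotation lives in its own document and can be parsed back independently.
parsed = ET.fromstring(xml_text)
print(xml_text)
```

Keeping the annotation in a separate document means the classified image and its text file stay unchanged, which is the property the abstract asks for.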

Injection of Cultural-based Subjects into Stable Diffusion Image Generative Model

  • Amirah Alharbi;Reem Alluhibi;Maryam Saif;Nada Altalhi;Yara Alharthi
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.1-14 / 2024
  • While text-to-image models have made remarkable progress in image synthesis, certain models, particularly generative diffusion models, have exhibited a noticeable bias towards generating images related to the culture of some developing countries. This paper presents an empirical investigation aimed at mitigating the bias of image generative models. We achieve this by incorporating symbols representing Saudi culture into a Stable Diffusion model using the DreamBooth technique. The CLIP score metric is used to assess the outcomes. The paper also explores the impact of varying parameters, for instance the number of training images and the learning rate. The findings reveal a substantial reduction in bias-related concerns, and the paper proposes an innovative metric for evaluating cultural relevance.

Document Layout Analysis Based on Fuzzy Energy Matrix

  • Oh, KangHan;Kim, SooHyung
    • International Journal of Contents / v.11 no.2 / pp.1-8 / 2015
  • In this paper, we describe a novel method for document layout analysis that is based on a Fuzzy Energy Matrix (FEM). A FEM is a two-dimensional matrix that contains the likelihood of text and non-text and is generated through the use of Fuzzy theory. The key idea is to define an Energy map for the document to categorize text and non-text. The proposed mechanism is designed for execution with a low-resolution document image, and hence our method has a fast processing speed. The proposed method has been tested on public ICDAR 2009 datasets to conduct a comparison against other state-of-the-art methods, and it was also tested with Korean documents. The results of the experiment indicate that this scheme achieves superior segmentation accuracy, in terms of both precision and recall, and also requires less time for computation than other state-of-the-art document image analysis methods.

A Feasibility Study on RUNWAY GEN-2 for Generating Realistic Style Images

  • Yifan Cui;Xinyi Shan;Jeanhun Chung
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.99-105 / 2024
  • Runway released an updated version, Gen-2, in March 2023, which introduced features that differ from Gen-1: it can convert text and images into videos, or convert text and images together into video based on text instructions. The update was officially opened to the public in June 2023, so that more people can use it and exercise their creativity. With this new feature, users can easily transform text and images into impressive video creations. However, as with all new technologies, the instability of AI comes with it, and this also affects the results generated by Runway. This article verifies, through personal practice and from several aspects, the feasibility of using Runway to generate a desired video. In practice, I discovered problems in Runway generation and propose methods for improving its accuracy. I also found that, although the instability of AI is a factor that needs attention, users can still make full use of this feature and create stunning video works through careful adjustment and testing. This update marks the beginning of a more innovative and diverse future for the digital creative field.

Using similarity based image caption to aid visual question answering (유사도 기반 이미지 캡션을 이용한 시각질의응답 연구)

  • Kang, Joonseo;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.191-204 / 2021
  • Visual Question Answering (VQA) and image captioning are tasks that require an understanding of both the features of images and the linguistic features of text. Co-attention, which can connect image and text, may therefore be the key to both tasks. In this paper, we propose a model that achieves high VQA performance by using image captions generated with a standard transformer model pretrained on the MSCOCO dataset. Captions unrelated to the question can interfere with answering, so only captions similar to the question were selected, based on their similarity to the question. In addition, stopwords in a caption either do not help or can interfere with answering, so the experiments were conducted after removing stopwords. Experiments were conducted on VQA-v2 data to compare the proposed model with the deep modular co-attention network (MCAN) model, which has shown good performance by using co-attention between images and text. The proposed model outperformed the MCAN model.
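
The caption-selection step described above (remove stopwords, then keep the captions most similar to the question) can be sketched as follows. The abstract does not name the similarity measure, so this sketch assumes Jaccard overlap of word sets; the stopword list, the function names, and the example question and captions are all hypothetical.

```python
# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "of", "on", "in", "what", "are", "with"}

def content_words(text):
    """Lowercase the text, strip question marks, and drop stopwords."""
    tokens = text.lower().replace("?", "").split()
    return {t for t in tokens if t not in STOPWORDS}

def jaccard(a, b):
    """Jaccard overlap of two word sets (assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_captions(question, captions, k=1):
    """Return the k captions whose content words best overlap the question."""
    q_words = content_words(question)
    ranked = sorted(
        captions,
        key=lambda c: jaccard(q_words, content_words(c)),
        reverse=True,
    )
    return ranked[:k]

question = "What color is the dog on the sofa?"
captions = [
    "a brown dog sleeping on a sofa",
    "a kitchen with a white table",
    "two people walking in the rain",
]
best = select_captions(question, captions, k=1)
print(best)
```

A learned sentence embedding with cosine similarity would be a natural replacement for the Jaccard score; the selection logic would stay the same.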

A Novel Text to Image Conversion Method Using Word2Vec and Generative Adversarial Networks

  • LIU, XINRUI;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.401-403 / 2019
  • In this paper, we propose a generative adversarial network (GAN) based text-to-image generation method. In many natural language processing tasks, word representations are determined by their term frequency-inverse document frequency (TF-IDF) scores. Word2Vec is a type of neural network model that, given an unlabeled corpus, produces vectors expressing the semantics of the words in the corpus; an image is then generated by GAN training according to the obtained vector. Because the model captures the meaning of the words, we can generate higher-quality and more realistic images. Our GAN structure is based on deep convolutional neural networks and pixel recurrent neural networks. Comparing the generated images with real images, we obtain about 88% similarity on the Oxford-102 flowers dataset.
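
For reference, the TF-IDF weighting mentioned above can be sketched in a few lines. This assumes one common variant (term frequency normalized by document length, idf = log(N / df)); the function name `tf_idf` and the toy corpus are hypothetical, and the sketch does not reproduce the paper's Word2Vec or GAN pipeline.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus of image descriptions.
corpus = [
    "a red flower with round petals".split(),
    "a yellow flower with thin petals".split(),
    "a red bird".split(),
]
weights = tf_idf(corpus)
print(weights[2])
```

A term that occurs in every document, such as "a" here, gets weight zero, which is why frequency-based weights carry no notion of meaning and embedding models such as Word2Vec are used instead.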

Image Based Text Matching Using Local Crowdedness and Hausdorff Distance (지역 밀집도 및 Hausdorff 거리를 이용한 영상기반 텍스트 매칭)

  • Son, Hwa-Jeong;Kim, Ji-Soo;Park, Mi-Seon;Yoo, Jae-Myeong;Kim, Soo-Hyung
    • The Journal of the Korea Contents Association / v.6 no.10 / pp.134-142 / 2006
  • In this paper, we investigate whether the Hausdorff distance, which is used to measure image similarity, is also effective for document retrieval. The proposed method uses local crowdedness and a Hausdorff distance to locate text images by determining whether a pair of images scanned at different times comes from the same text. To reduce the processing time, which is one of the disadvantages of Hausdorff distance algorithms, we adopt local crowdedness for feature point extraction. We apply the proposed method to 190 pairs of the same class and 190 pairs of different classes collected from postal envelope images. The results show that the modified Hausdorff distance proposed in this paper performs well in locating the text region and calculating the degree of similarity between two images. Accuracy improved by 2.7% and 9.0% compared to a binary correlation method and the original Hausdorff distance method, respectively.
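
As a point of reference, the original (unmodified) Hausdorff distance between two sets of feature points can be sketched as below. This is not the modified distance proposed in the paper, and the local-crowdedness feature extraction is not shown; the function names and the point sets are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical feature points extracted from two scanned text images.
points_a = [(0, 0), (1, 0), (0, 1)]
points_b = [(0, 0), (1, 0), (0, 3)]

print(directed_hausdorff(points_a, points_b))
print(hausdorff(points_a, points_b))
```

The nested loop compares every point in one set with every point in the other, so the cost grows with the product of the two set sizes. That is the processing-time disadvantage the abstract refers to, and reducing the number of feature points lowers it directly.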

A Development design Image DataBase (디자인 이미지데이터베이스 구축사례 연구)

  • 정지홍
    • Archives of design research / v.13 no.3 / pp.313-320 / 2000
  • The new wave of information technology has enormously influenced every field. In the field of design, it is time to make every effort to accumulate design-related knowledge by maintaining, managing, and controlling design information systematically, moving beyond the old stage of merely using the data itself. Owing to remarkable progress in communication media and speed and in file compression technology, text-centric data has been shifting to multimedia data such as images and motion pictures. Methodologies are therefore required to utilize the related information effectively. For the processing of image data, an optimal method should be devised that reflects the unique characteristics and uses of image data, apart from the traditional way of processing and storing legacy text-based data. Through a case study, this paper proposes a system for indexing and implementing design image information: analyzing design image data, abstracting the data elements of the image itself, and finally applying them to build an image-oriented database.

Regional Image Change Analysis using Text Mining and Network Analysis (텍스트 마이닝과 네트워크 분석을 이용한 지역 이미지 변화 분석)

  • Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.2 / pp.79-88 / 2022
  • Social media big data contains a great deal of information that can identify not only consumer consumption patterns but also local images. This paper collected data containing 'Samcheok' annually from 2015 to 2019 from the blogs and cafes of Naver and Daum, two domestic portal sites, and analyzed the change in the regional image after refining the keywords that form the regional image through text mining and network analysis. According to the results, the regional image in 2015 was expressed through image cognitive elements such as nearby place names or places, for example 'Jangho Port', 'Donghae', and 'Beach'. However, the regional image in both 2016 and 2019 changed to image cognitive elements of 'SamcheokSolbich', a special place within the region. Because the keywords related to the local image include 'Jangho Port' and Resort, which are representative attractions of Samcheok, the infrastructure factor can be seen to play a large role in forming the local image. The significance test for the network data used the bootstrap technique; the p-values for 2015, 2016, and 2019 were 0.0002, 0.0006, and 0.0002, respectively, which are statistically significant at the 5% significance level.

A Study on the Retrieval Effectiveness Based on Image Query Types (이미지 인지 유형 및 검색질의 방식에 따른 검색 효율성에 관한 연구)

  • Kim, Seonghee;Yi, Keunyoung
    • Journal of the Korean Society for Library and Information Science / v.47 no.3 / pp.321-342 / 2013
  • The purpose of this study was to compare and evaluate the retrieval effectiveness of different retrieval methods across three types of image perception. The image types were specific, general, and abstract topics. The retrieval methods were text-only search, query by example (QBE) search, and hybrid search. Thirty-two college students were recruited to search for topics using the Google image search system. The search results were compared using one-way and two-way ANOVA. Text search and hybrid search showed an advantage when searching for specific and general topics. On the other hand, QBE search performed better than both text-only and hybrid search for abstract topics. The results have implications for the implementation of image retrieval systems.