• Title/Summary/Keyword: Text-to-Image


Extracting curved text lines using the chain composition and the expanded grouping method (체인 정합과 확장된 그룹핑 방법을 사용한 곡선형 텍스트 라인 추출)

  • Bai, Nguyen Noi;Yoon, Jin-Seon;Song, Young-Jun;Kim, Nam;Kim, Yong-Gi
    • The KIPS Transactions:PartB / v.14B no.6 / pp.453-460 / 2007
  • In this paper, we present a method to extract text lines from poorly structured documents. The text lines may have different orientations, considerably curved shapes, and possibly a few wide inter-word gaps within a line. Such text lines can be found in posters, address blocks, and artistic documents. Our method is based on traditional perceptual grouping, but we develop novel solutions to overcome the problems of insufficient seed points and varied orientations within a single line. We assume that text lines consist of connected components, where each connected component is a set of black pixels belonging to one letter or to several touching letters. In our scheme, connected components closer than an iteratively incremented threshold are joined into a chain. Elongated chains are identified as the seed chains of lines. The seed chains are then extended to the left and the right according to the local orientations, which are reevaluated at each side of a chain as it is extended. Through this process, all text lines are finally constructed. In our experiments, the proposed method performed well at extracting considerably curved text lines from logos and slogans, achieving 98% and 94% accuracy for straight-line and curved-line extraction, respectively.
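The chaining step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it uses a single fixed distance threshold with union-find (the paper increments the threshold iteratively, which yields the same transitive grouping), and all function names and the `min_length` seed criterion are assumptions for illustration.

```python
import math

def chain_components(centroids, threshold):
    """Join connected components whose centroids lie within `threshold`
    of each other into chains, via union-find over component indices."""
    parent = list(range(len(centroids)))

    def find(i):
        # Find the root of i with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= threshold:
                parent[find(i)] = find(j)

    # Collect chains as lists of component indices sharing a root.
    chains = {}
    for i in range(len(centroids)):
        chains.setdefault(find(i), []).append(i)
    return list(chains.values())

def seed_chains(chains, min_length=3):
    """Elongated chains serve as the seed chains from which lines grow."""
    return [c for c in chains if len(c) >= min_length]
```

For example, four collinear components plus one distant outlier produce one elongated chain (a seed) and one singleton; the left/right extension with local orientation reevaluation would then grow each seed into a full line.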

PDA-based Text Extraction System using Client/Server Architecture (Client/Server구조를 이용한 PDA기반의 문자 추출 시스템)

  • Park Anjin;Jung Keechul
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.85-98 / 2005
  • Recently, much research on mobile vision using Personal Digital Assistants (PDAs) has been conducted. Many PDA CPUs are integer-only CPUs with no floating-point unit, so vision and image-processing algorithms, which involve heavy floating-point computation, run slowly. To address this weakness, we propose a Client (PDA)/Server (PC) architecture connected over a wireless LAN, and we construct a system that pipelines the processing of image sequences across the two CPUs of the client (PDA) and the server (PC). The client (PDA) extracts tentative text regions using Edge Density (ED). The server (PC) uses both a Multi-Layer Perceptron (MLP)-based texture classifier and Connected Component (CC)-based filtering for definitive text extraction based on the client's tentatively extracted results. The proposed method achieves not only accurate text extraction by combining the MLP and CC stages, but also fast running time through the pipelined Client (PDA)/Server (PC) architecture.
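The client-side edge-density step could be sketched as below. This is a hedged toy version under assumed parameters: the block size, the density threshold, and the function name are illustrative choices, and a binary edge map (e.g. from a Sobel or Canny pass) is assumed as input rather than computed here.

```python
def edge_density_blocks(edges, block=4, min_density=0.3):
    """Mark blocks whose fraction of edge pixels meets `min_density`
    as tentative text regions; returns top-left (x, y) of each block."""
    h, w = len(edges), len(edges[0])
    regions = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Gather the edge flags inside this block (clipped at borders).
            cells = [edges[yy][xx]
                     for yy in range(y, min(y + block, h))
                     for xx in range(x, min(x + block, w))]
            if sum(cells) / len(cells) >= min_density:
                regions.append((x, y))
    return regions
```

In the pipeline, only these tentative regions would be shipped over the wireless LAN to the server for MLP- and CC-based verification, keeping the integer-only PDA's workload small.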

Influence of TrueView Ad Skip Buttons on Advertising Effect (트루뷰 동영상 광고의 스킵버튼 종류에 따른 광고 효과)

  • Kim, Ju Seok;Chung, Donghun
    • Journal of Information Technology Services / v.18 no.1 / pp.1-12 / 2019
  • The purpose of this study is to find out which type of skip button used in forced-exposure advertising is perceived most positively by users. Four types of skip buttons were produced for the experiment and tested by survey and eye tracker to reveal their effects on perceived intrusiveness, advertising attention, attitude toward advertising, and memory, consisting of recall and recognition. Of 80 participants, 20 were randomly assigned to each skip-button group. The results showed no statistical difference in advertising attention, perceived intrusiveness, or attitude toward advertising. However, recall and recognition rates were highest for the static text type, followed in order by kinetic text, product image, and the default button. This study has implications for treating skip buttons as a major variable in the inventory of TrueView advertising effects, and suggests that the amount of information in the image is critical because it is processed by users within a very short time.

Supervised text data augmentation method for deep neural networks

  • Jaehwan Seol;Jieun Jung;Yeonseok Choi;Yong-Seok Choi
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.343-354 / 2023
  • Recently, there have been many improvements in general language models using architectures such as GPT-3, proposed by Brown et al. (2020). Nevertheless, complex models can hardly be trained when the amount of data is very small. Data augmentation, which addresses this problem, has been notably successful on image data: image augmentation significantly improves model performance without any additional data or architectural changes (Perez and Wang, 2017). However, applying this technique to textual data poses many challenges because the noise to be added is not obvious. We therefore developed a novel method for performing data augmentation on text data. We divide the data into signals, which carry positive or negative meaning, and noise, which does not, and then perform k-doc augmentation, randomly combining signals and noise from all the data to generate new data.

Estimating Media Environments of Fashion Contents through Semantic Network Analysis from Social Network Service of Global SPA Brands (패션콘텐츠 미디어 환경 예측을 위한 해외 SPA 브랜드의 SNS 언어 네트워크 분석)

  • Jun, Yuhsun
    • Journal of the Korean Society of Clothing and Textiles / v.43 no.3 / pp.427-439 / 2019
  • This study investigated the semantic network of the fashion images and SNS text used by global SPA brands over the last seven years, in terms of the quantity and quality of data generated by fast-changing fashion trends and the fashion content-based media environment. The research method measured frequency, density, and repeated keywords, visualized the network with algorithms in the UCINET 6.347 program, and classified the text related to fashion images on the social networks used by global SPA brands. The conclusions of the study are as follows. A common aspect of global SPA brands is that, judging from the text extracted from SNS, exposure through product images is considered important for sales. The brands also differ in several respects. First, ZARA consistently exposes marketing featuring a variety of professions and nationalities on SNS. Second, UNIQLO exposes its collaboration promotions on SNS while steadily exposing basic items. Third, in the case of H&M, some results distinguishing it from the other brands were found in the connectivity among its cluster categories, which showed remarkably independent results.

On the Security of Image-based CAPTCHA using Multi-image Composition (복수의 이미지를 합성하여 사용하는 캡차의 안전성 검증)

  • Byun, Je-Sung;Kang, Jeon-Il;Nyang, Dae-Hun;Lee, Kyung-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.4 / pp.761-770 / 2012
  • CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) have been widely used to prevent automated attacks such as spam mails and DDoS attacks. In the early stages, text-based CAPTCHAs, made by distorting random characters, were mainly used to frustrate automated bots. Much research, however, has shown that text-based CAPTCHAs can be broken via AI or image-processing techniques. For this reason, image-based CAPTCHAs, which employ images instead of text, have been proposed. In many image-based CAPTCHAs, however, a huge number of source images is required to guarantee a fair level of security. In 2008, Kang et al. suggested a new image-based CAPTCHA that uses test images composed from multiple source images, reducing the number of source images required while maintaining the security level. In their paper, the authors showed the convenience of their CAPTCHA through a user study, but they did not verify its security level. In this paper, we verify the security of the image-based CAPTCHA suggested by Kang et al. by performing several attacks in various scenarios, and we consider other possible attacks that could occur in the real world.

Effect of text and image presenting method on Chinese college students' learning flow, learning satisfaction and learning outcome in video learning environment (중국대학생 동영상 학습에서 텍스트 제시방식과 이미지 제시방식이 학습몰입, 학습만족, 학업성취에 미치는 효과)

  • Zhang, Jing;Zhu, Hui-Qin;Kim, Bo-Kyeong
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.1 / pp.633-640 / 2021
  • This study analyzes the effects of text- and image-presenting methods in video lectures on students' learning flow, learning satisfaction, and learning outcomes. The text-presenting methods were short sentences of two or three words versus key words only, while the image-presenting methods were images containing both detailed and related information versus images containing only related information. 167 first-year students from Xingtai University participated and were randomly assigned to one of four video types. The results are as follows. First, the learning flow, learning satisfaction, and learning outcomes of the group shown short-sentence text were statistically significantly higher than those of the key-word group. Second, the learning flow, learning satisfaction, and learning outcomes of the group shown only related information were statistically significantly higher than those of the group shown both detailed and related information. That is, the mean values of the dependent variables were highest for the short-sentence and related-information-only groups, and lowest for the key-word and detailed-plus-related-information groups.

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

  • Kim, Taejin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.79-104 / 2020
  • Recently, as deep learning has attracted attention, it is being considered as a method for solving problems in various fields. In particular, deep learning is known to perform excellently on unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of text and image deep learning technology, interest in image captioning and its applications is rapidly increasing. Image captioning is a technique that automatically generates relevant captions for a given image by handling image comprehension and text generation simultaneously. In spite of the high entry barrier of image captioning, which requires analysts to process both image and text data, it has established itself as one of the key fields in AI research owing to its wide applicability. Many studies have been conducted to improve the performance of image captioning in various aspects, and recent research attempts to create advanced captions that not only describe an image accurately but also convey the information contained in it more sophisticatedly. Despite these efforts, it is difficult to find research that interprets images from the perspective of domain experts in each field rather than from the perspective of the general public. Even for the same image, the parts of interest may differ according to the professional field of the viewer. Moreover, the way of interpreting and expressing the image also differs according to the level of expertise. The public tends to recognize an image from a holistic and general perspective, that is, by identifying the image's constituent objects and their relationships.
On the contrary, domain experts tend to recognize an image by focusing on the specific elements necessary to interpret it based on their expertise. This implies that the meaningful parts of an image differ depending on the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. Therefore, in this study, we propose a method to generate domain-specialized captions for an image by utilizing the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, the domain expertise is transplanted through transfer learning with a small amount of expertise data. However, a simple application of transfer learning with expertise data may invoke another problem: simultaneous learning with captions of various characteristics may cause so-called 'inter-observation interference', which makes it difficult to learn each characteristic point of view purely. When learning from a vast amount of data, most of this interference is self-purified and has little impact on the results; in fine-tuning on a small amount of data, however, its impact can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each characteristic. To confirm the feasibility of the proposed methodology, we performed experiments using the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, with the advice of an art therapist, about 300 'image / expertise caption' pairs were created and used for the expertise-transplantation experiments.
As a result, we confirmed that the captions generated by the proposed methodology reflect the perspective of the implanted expertise, whereas captions generated through learning on general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation, presenting a method that uses transfer learning to generate captions specialized for a specific domain. In the future, by applying the proposed methodology to expertise transplantation in various fields, we expect active research on solving the lack of expertise data and improving image-captioning performance.

An Embedded Text Index System for Mass Flash Memory (대용량 플래시 메모리를 위한 임베디드 텍스트 인덱스 시스템)

  • Yun, Sang-Hun;Cho, Haeng-Rae
    • Journal of the Korea Society of Computer and Information / v.14 no.6 / pp.1-10 / 2009
  • Flash memory has the advantages of non-volatility, low power consumption, light weight, and high endurance. This enables flash memory to be used as storage for mobile computing devices such as PMPs (Portable Multimedia Players). A portable device with a large flash memory can store various multimedia data such as video, audio, or images. Typical index systems for mobile computers, however, are inefficient at searching text such as lyrics or titles. In this paper, we propose a new text index system named EMTEX (Embedded Text Index). EMTEX has the following salient features. First, it uses a compression algorithm suited to embedded systems. Second, whenever an insert or delete operation is executed on the base table, EMTEX updates the text index immediately. Third, EMTEX takes the characteristics of flash memory into account in the design of its insert, delete, and rebuild operations on the text index. Finally, EMTEX executes as a layer above the DBMS, so it is independent of the underlying DBMS. We evaluate the performance of EMTEX; the experimental results show that it outperforms conventional index systems such as Oracle Text and FT3.
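The immediate-update behavior of such a text index can be illustrated with a toy in-memory inverted index. This sketch is only an analogue of the described behavior: the class and method names are invented for illustration, and EMTEX's compression and flash-aware insert/delete/rebuild design are omitted entirely.

```python
class TextIndex:
    """Toy inverted index updated immediately on base-table changes."""

    def __init__(self):
        self.postings = {}  # term -> set of row ids containing it

    def insert(self, row_id, text):
        # Index every whitespace-separated term of the new row at once.
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(row_id)

    def delete(self, row_id, text):
        # Remove the row from each term's posting list immediately.
        for term in text.lower().split():
            ids = self.postings.get(term)
            if ids:
                ids.discard(row_id)
                if not ids:
                    del self.postings[term]

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))
```

A real flash-resident index would batch or log these posting-list changes to respect the erase-before-write constraint of flash memory, which is precisely the concern the abstract says EMTEX's operation design addresses.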

An Image-Based CAPTCHA Scheme Exploiting Human Appearance Characteristics

  • Kalsoom, Sajida;Ziauddin, Sheikh;Abbasi, Abdul Rehman
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.2 / pp.734-750 / 2012
  • CAPTCHAs are automated tests designed to prevent the misuse of computing and information resources by bots. Typical text-based CAPTCHAs have proven vulnerable to malicious automated programs. In this paper, we present an image-based CAPTCHA scheme using easily identifiable human appearance characteristics that overcomes the weaknesses of current text-based schemes. We propose and evaluate two applications of our scheme involving 25 participants. Both applications use the same characteristics but different classes for those characteristics. Application 1 is optimized for security, while Application 2 is optimized for usability. Experimental evaluation shows promising results, with an 83% human success rate for Application 2 compared to 62% for Application 1.