• Title/Summary/Keyword: document image analysis


Jointly Image Topic and Emotion Detection using Multi-Modal Hierarchical Latent Dirichlet Allocation

  • Ding, Wanying;Zhu, Junhuan;Guo, Lifan;Hu, Xiaohua;Luo, Jiebo;Wang, Haohong
    • Journal of Multimedia Information System / v.1 no.1 / pp.55-67 / 2014
  • Image topic and emotion analysis is an important component of online image retrieval, which has become very popular in the fast-growing social media community. However, because of the gap between images and text, very little work in the literature detects an image's topics and emotions in a unified framework, even though topics and emotions are two levels of semantics that often work together to describe an image comprehensively. This work proposes a unified model, the Joint Topic/Emotion Multi-Modal Hierarchical Latent Dirichlet Allocation (JTE-MMHLDA) model, which extends the earlier LDA, mmLDA, and JST models to capture topic and emotion information simultaneously from heterogeneous data. Specifically, a two-level graphical model is built so that topics and emotions are shared across the whole document collection. Experimental results on a Flickr dataset indicate that the proposed model efficiently discovers images' topics and emotions. It significantly outperforms the text-only system by 4.4% and the vision-only system by 18.1% in topic detection, and outperforms the text-only system by 7.1% and the vision-only system by 39.7% in emotion detection.


Variational Expectation-Maximization Algorithm in Posterior Distribution of a Latent Dirichlet Allocation Model for Research Topic Analysis

  • Kim, Jong Nam
    • Journal of Korea Multimedia Society / v.23 no.7 / pp.883-890 / 2020
  • In this paper, we propose a variational expectation-maximization algorithm that computes posterior probabilities from a Latent Dirichlet Allocation (LDA) model. The algorithm approximates the intractable posterior distribution of a document-term matrix generated from a corpus of 50 papers. It approximates the posterior by searching for local optima using a lower bound of the true posterior distribution. Moreover, it maximizes the lower bound of the log-likelihood of the true posterior by minimizing the relative entropy between the prior and the posterior distribution, known as the KL divergence. The experimental results indicate that documents clustered under image classification and segmentation are correlated at 0.79, while those clustered under object detection and image segmentation are highly correlated at 0.96. The proposed variational inference algorithm runs efficiently and faster than Gibbs sampling, with a computation time of 0.029 s.
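
The relationship between the lower bound and the KL divergence mentioned in this abstract can be written out as follows. This is the textbook decomposition for variational inference in LDA, not a formula taken from the paper; the symbols are assumptions: \mathbf{w} are the observed words, \theta the topic proportions, \mathbf{z} the topic assignments, \alpha and \beta the model parameters, and \gamma and \phi the variational parameters of the approximating distribution q.

```latex
\log p(\mathbf{w} \mid \alpha, \beta)
  = \mathcal{L}(\gamma, \phi; \alpha, \beta)
  + \mathrm{KL}\bigl( q(\theta, \mathbf{z} \mid \gamma, \phi) \,\|\, p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta) \bigr),
\qquad
\mathcal{L}(\gamma, \phi; \alpha, \beta)
  = \mathbb{E}_q\bigl[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\bigr]
  - \mathbb{E}_q\bigl[\log q(\theta, \mathbf{z} \mid \gamma, \phi)\bigr]
```

Because the KL term is non-negative, \mathcal{L} is a lower bound on the log-likelihood. In this formulation, raising the bound with respect to \gamma and \phi (the E-step) is the same as shrinking the KL divergence between q and the true posterior, and the M-step then raises the bound with respect to \alpha and \beta.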

Text Line Segmentation of Handwritten Documents by Area Mapping

  • Boragule, Abhijeet;Lee, GueeSang
    • Smart Media Journal / v.4 no.3 / pp.44-49 / 2015
  • Text line segmentation is a preprocessing step in OCR that can significantly influence the accuracy of document analysis applications. This paper proposes a novel methodology for the text line segmentation of handwritten documents. First, the average width of the connected components is used to form a 1-D Gaussian kernel, and a smoothing operation is applied to the input binary image. Adaptive binarization of the smoothed image then forms the final text lines. The segmentation itself involves two stages. In the first stage, the large connected components are labelled as unique text lines using text line area mapping. In the second stage, the segmentation is refined using the Euclidean distance between the text lines and the small connected components. The group of uniquely labelled text candidates achieves promising segmentation results. The proposed approach works well on Korean and English handwritten documents captured with a camera.
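
The smoothing step described in this abstract might look roughly like the sketch below. It is a simplified illustration, not the authors' implementation: `avg_width` is assumed to be precomputed from a connected-component pass, the kernel width rule (sigma equal to the average width) and the fixed `threshold` are assumptions, and a plain global threshold stands in for the adaptive binarization the paper uses.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def smear_text_lines(binary, avg_width, threshold=0.2):
    """Smooth a binary image (1 = ink) along each row and re-binarize.

    Horizontal smoothing merges neighbouring characters and words into
    line-shaped blobs. `avg_width` is a hypothetical input: the mean
    width of the connected components, computed elsewhere.
    """
    kernel = gaussian_kernel_1d(sigma=avg_width)
    radius = len(kernel) // 2
    image = np.asarray(binary, dtype=np.float64)
    smoothed = np.empty_like(image)
    for y, row in enumerate(image):
        # "full" convolution, then crop so the output aligns with the input row.
        full = np.convolve(row, kernel, mode="full")
        smoothed[y] = full[radius:radius + row.size]
    return (smoothed > threshold).astype(np.uint8)
```

Calling `smear_text_lines(page, avg_width=3)` on a binary page returns a mask in which ink separated by small horizontal gaps is joined, while blank rows stay empty.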

Analyzing the Research Fronts of Women's Studies in Korea Using Citation Image Makers Profiling (인용 이미지 구축자 프로파일링을 이용한 국내 여성학 분야 연구 전선 분석)

  • Kim, Jo-Ah;Lee, Jae Yun
    • Journal of the Korean Society for Information Management / v.33 no.2 / pp.201-225 / 2016
  • A new technique for revealing the research fronts of an interdisciplinary discipline has been developed. Citation image makers profiling (CIMP) determines the relationships between research papers from the title words of the documents that cite them. We adapted this new technique to analyze the research fronts and hot topics in women's studies in Korea. Using 2015 Korean Citation Index (KCI) data, we selected 148 papers cited more than 9 times as the core documents of women's studies. The intellectual structure was then analyzed by applying citation image makers profiling to the 148 core documents and their citing papers. Document co-citation analysis was hindered by the sparsity of the citation data, while the CIMP method successfully revealed the structure of the research fronts of Korean women's studies, comprising 2 divisions and 6 subdivisions. The CIMP method suggested in this study has good potential for discovering the characteristics of research fronts in interdisciplinary research domains.
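
One way to picture the idea of relating papers through the title words of their citing documents is the sketch below. It is only an illustration under stated assumptions: the whitespace tokenization, the raw word counts, and the cosine measure are choices made here, and the abstract does not say which profile weighting or similarity measure CIMP actually uses.

```python
from collections import Counter
from math import sqrt

def title_word_profile(citing_titles):
    """Count the title words of all documents that cite one core paper."""
    profile = Counter()
    for title in citing_titles:
        profile.update(title.lower().split())
    return profile

def cosine(p, q):
    """Cosine similarity between two word-count profiles."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Hypothetical citing titles for three core papers.
paper_a = title_word_profile(["gender labor market", "labor policy"])
paper_b = title_word_profile(["labor market gender"])
paper_c = title_word_profile(["family care"])
print(cosine(paper_a, paper_b) > cosine(paper_a, paper_c))
```

Because the profile is built from citing titles rather than from shared citations, two core papers can be compared even when they are rarely cited together, which is the situation the abstract describes for sparse citation data.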

Developing Standard Transmission System for Radiology Reporting Including Key Images (Key Image를 포함한 방사선과 판독결과지 표준전송시스템 개발)

  • Kim, Seon-Chil
    • Journal of Radiological Science and Technology / v.30 no.1 / pp.47-51 / 2007
  • The development of hospital information systems and Picture Archiving and Communication Systems (PACS) is not new in the medical field, and internet and information technology are now ubiquitous. Even so, it is hard to share medical information without a well-defined standard format. In radiology departments in particular, PACS has become very important for exchanging information with other, disparate hospital information systems. A dedicated system is needed that archives radiological reports in a database efficiently, including the sharing of medical images. This study proposes a model in which an internal system lets radiologists store the necessary images, transmit them in the international standard clinical format, Clinical Document Architecture (CDA), and share the information with hospitals. A CDA document generator was built to produce the new file format and to separate the existing storage system from the new system, which ensures access to the required data in XML documents. The model adds a step in which images that are crucial to the reading are inserted by the CDA radiological report generator. The study therefore proposes a storage and transmission model for CDA documents that differs from the existing DICOM SR. Radiological reports could be shared more effectively once the application function for inserting images and the analysis of standard clinical terms are completed.


Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
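
The abstract does not say how the ensemble retrievers are combined, so the sketch below shows one commonly used option, reciprocal rank fusion, purely as an illustration. The function name, the constant `k=60`, and the example hit lists are assumptions and are not taken from the paper.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank 1 is that list's best hit; documents are returned in
    descending order of their summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical output of a keyword retriever and a vector retriever.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "a"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

A fused list like this would typically be handed to a re-ranking step before answer generation; rank-based fusion is convenient because it needs no score calibration between retrievers.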

Locating Text in Web Images Using Image Based Approaches (웹 이미지로부터 이미지기반 문자추출)

  • Chin, Seongah;Choo, Moonwon
    • Journal of Intelligence and Information Systems / v.8 no.1 / pp.27-39 / 2002
  • A text-locating technique capable of locating and extracting text blocks in various Web images is presented. Until now this area has been largely ignored by researchers, even though such text may be meaningful to internet users. The algorithms associated with the technique work without prior knowledge of the text orientation, size, or font. Our text extraction algorithm applies edge detection followed by histogram analysis of the characteristics of letters within clustered text regions, so that extraction of the text region does not depend on font style or size. A number of experiments showed acceptable results.
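
A very reduced version of "edge detection followed by histogram analysis" is sketched below, assuming a grayscale input. The gradient-magnitude edge detector, the row-wise projection histogram, and both thresholds are assumptions chosen for illustration; the abstract does not specify the paper's actual detector or histogram features.

```python
import numpy as np

def edge_map(gray, edge_thresh=50.0):
    """Boolean edge map from the gradient magnitude of a grayscale image."""
    gy, gx = np.gradient(np.asarray(gray, dtype=np.float64))
    return np.hypot(gx, gy) > edge_thresh

def text_row_bands(edges, min_count=3):
    """Return (start, end) row ranges whose edge-pixel count reaches min_count.

    The per-row count of edge pixels acts as a projection histogram:
    rows crossing text contain many character strokes and so many edges.
    """
    profile = edges.sum(axis=1)
    bands, start = [], None
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y
        elif count < min_count and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands
```

Running the same histogram over columns inside each band would narrow the bands down to text blocks; nothing in this sketch depends on a particular font, which is in the spirit of the font-independent approach the abstract describes.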


Adaptive thresholding for two-dimensional barcode images using two thresholds and the integral image (이중 문턱 값과 적분영상을 이용한 2차원 바코드 영상의 적응적 이진화)

  • Lee, Yeon-Kyung;Yoo, Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.11 / pp.2453-2458 / 2012
  • In this paper, we propose an adaptive thresholding method to binarize two-dimensional barcode images. Adaptive thresholding methods convert an original image into a binary image while minimizing the effects of lighting, and they are commonly applied to document image binarization. These methods, however, have difficulty determining the box size used for thresholding, so they are not well suited to recognizing two-dimensional barcode images. To overcome this, we analyze the problem and propose a new adaptive thresholding method that uses two thresholds and the integral image. To show the effectiveness of our method, we compared it with well-known existing methods in terms of visual quality and processing time. The experimental results indicate that the proposed method is superior to the existing methods.
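
For readers unfamiliar with the integral image, the sketch below shows the generic way it is used for adaptive thresholding: each pixel is compared with the mean of a surrounding box, and the box sum costs four lookups regardless of the box size. This is the well-known single-threshold local-mean scheme, not the two-threshold method proposed in the paper; `window` and `ratio` are assumed parameters.

```python
import numpy as np

def adaptive_threshold(gray, window=15, ratio=0.85):
    """Binarize a 2-D grayscale array against its local box mean.

    A pixel becomes 1 (bright) if it exceeds `ratio` times the mean of
    the window x window box centred on it, otherwise 0 (dark).
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    # Integral image with a zero row and column prepended, so that
    # integral[y, x] is the sum of gray[:y, :x].
    integral = np.zeros((h + 1, w + 1), dtype=np.float64)
    integral[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)

    half = window // 2
    y0 = np.clip(np.arange(h) - half, 0, h)
    y1 = np.clip(np.arange(h) + half + 1, 0, h)
    x0 = np.clip(np.arange(w) - half, 0, w)
    x1 = np.clip(np.arange(w) + half + 1, 0, w)

    # Box sum from four corner lookups; boxes are clipped at the borders.
    box_sum = (integral[y1][:, x1] - integral[y0][:, x1]
               - integral[y1][:, x0] + integral[y0][:, x0])
    area = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    return (gray > ratio * (box_sum / area)).astype(np.uint8)
```

The result still depends on `window`, which is the box-size sensitivity the abstract identifies as the weakness of existing methods on barcode images.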

Adaptive Thresholding Technique for Binarization of License Plate Images

  • Kim, Min-Ki
    • Journal of the Optical Society of Korea / v.14 no.4 / pp.368-375 / 2010
  • Unlike document images, license plate images are mostly captured under uneven lighting conditions. In particular, a shadowed region shows sharp intensity variation, and sometimes the region has very high intensity because of reflected light. This paper presents a new technique for thresholding license plate images. The approach consists of three parts. First, it performs a rough thresholding and classifies the type of license plate in order to adjust some parameters optimally. Next, it identifies the shadow type and binarizes the license plate image by adjusting the window size and location according to that shadow type. Finally, post-processing based on cluster analysis is performed. Experimental results show that the proposed method outperformed five well-known methods.

A Study on Visual Factors Affecting Purchase of Convenient Store's Packed-meal on Mobile Application (모바일 애플리케이션을 통한 편의점 도시락 구매 과정에 영향을 미치는 시각 요소에 관한 연구)

  • Lee, Da-Hyun;Kim, Seung-In
    • Journal of Digital Convergence / v.18 no.11 / pp.443-448 / 2020
  • The purpose of this research is to identify the visual factors that affect the purchase of convenience store packed meals through mobile applications. Previous studies have not covered mobile applications for purchasing convenience store packed meals, which motivated the research topic. Document analysis and an online survey were the main methods, and four visual factors (typography, product image, main color, and brand logo) were set as the research variables. The results show that consumers notice the product image before the other factors and that their purchase intentions are most significantly affected by the product image. In conclusion, the product image should raise consumers' expectations of the packed meal and convey credibility at the same time. The application should therefore be designed to spotlight the product image and draw the consumer's attention to it. The research can be extended by including non-visual factors as variables or by increasing the size of the survey sample.