• Title/Summary/Keyword: Text line information

Search Result 147, Processing Time 0.023 seconds

Design and Implementation of Learning System for Generating Multimedia Contents at On-Line$\cdot$Mobile Environment (온라인$\cdot$모바일 환경에서 멀티미디어 컨텐츠 생성을 위한 학습 시스템의 설계 및 구현에 관한 연구)

  • Lee Hyun Chang;Choi Kwang Don
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.1 s.33
    • /
    • pp.217-222
    • /
    • 2005
  • The on-line and mobile communication technologies provide an environment to make users share information on the rove. However learning on a file received from on-line or mobile internet environment is able to read only, According to this, users cannot use various learning methods to make multimedia contents for learning like coloring and underlining considerable parts. Also, in case of storing, it cannot be stored in a standard file format HTML. Therefore, in this paper, we suggest a new learning platform to be able to change text contents in a web documents and implement a prototype system to process learning system in on-line environment

  • PDF

A methodology for XML documentation of the structural calculation document to build database supporting safety management of infrastructures (사회기반시설물 안전관리 지원 데이터베이스 구축을 위한 구조계산서의 XML 문서화 방법론)

  • Kim, Bong-Geun;Park, Sang-Il;Lee, Jin-Hoon;Lee, Sang-Ho
    • 한국방재학회:학술대회논문집
    • /
    • 2007.02a
    • /
    • pp.414-417
    • /
    • 2007
  • A methodology for XML documentation of the structural calculation document is presented to support manipulation of the design information on the internet. The text file format is chosen as a neutral format because it can be easily translated from office documents generated from engineering practice. The first word of each line is compared with the reserved numbering groups, and relative levels among the lines are defined to generate the hierarchically structured XML document of the text file. The demonstration subjected to sample general documents and structural calculation documents shows that the prototype application module based on the developed methodology can be adopted to build the database of design information which supports the safety management of infrastructures.

  • PDF

A Study of Incremental and Multiple Entry Support Parser for Multi View Editing Environment (다중 뷰 편집환경을 위한 점진적 다중진입 지원 파서에 대한 연구)

  • Yeom, Saehun;Bang, Hyeja
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.14 no.3
    • /
    • pp.21-28
    • /
    • 2018
  • As computer performance and needs of user convenience increase, computer user interface are also changing. This changes had great effects on software development environment. In past, text editors like vi or emacs on UNIX OS were the main development environment. These editors are very strong to edit source code, but difficult and not intuitive compared to GUI(Graphical User Interface) based environment and were used by only some experts. Moreover, the trends of software development environment was changed from command line to GUI environment and GUI Editor provides usability and efficiency. As a result, the usage of text based editor had decreased. However, because GUI based editor use a lot of computer resources, computer performance and efficiency are decreasing. The more contents are, the more time to verify and display the contents it takes. In this paper, we provide a new parser that provide multi view editing, incremental parsing and multiple entry of abstract syntax tree.

Purchase Information Extraction Model From Scanned Invoice Document Image By Classification Of Invoice Table Header Texts (인보이스 서류 영상의 테이블 헤더 문자 분류를 통한 구매 정보 추출 모델)

  • Shin, Hyunkyung
    • Journal of Digital Convergence
    • /
    • v.10 no.11
    • /
    • pp.383-387
    • /
    • 2012
  • Development of automated document management system specified for scanned invoice images suffers from rigorous accuracy requirements for extraction of monetary data, which necessiate automatic validation on the extracted values for a generative invoice table model. Use of certain internal constraints such as "amount = unit price times quantity" is typical implementation. In this paper, we propose a noble invoice information extraction model with improved auto-validation method by utilizing table header detection and column classification.

A Novel Technique of Topic Detection for On-line Text Documents: A Topic Tree-based Approach (온라인 텍스트문서의 계층적 트리 기반 주제탐색 기법)

  • Xuan, Man;Kim, Han-Joon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.396-399
    • /
    • 2012
  • Topic detection is a problem of discovering the topics of online publishing documents. For topic detection, it is important to extract correct topic words and to show the topical words easily to understand. We consider a topic tree-based approach to more effectively and more briefly show the result of topic detection for online text documents. In this paper, to achieve the topic tree-based topic detection, we propose a new term weighting method, called CTF-CDF-IDF, which is simple yet effective. Moreover, we have modified a conventional clustering method, which we call incremental k-medoids algorithm. Our experimental results with Reuters-21578 and Google news collections show that the proposed method is very useful for topic detection.

Automatic Word-Segmentation at Line-Breaks for Korean Text Processing (한국어 텍스트 처리를 위한 줄 경계 띄어쓰기 복원)

  • 정영미;이재윤
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 1999.08a
    • /
    • pp.21-24
    • /
    • 1999
  • 한국어 텍스트의 줄 경계에서의 띄어쓰기 복원을 위해 음절쌍 통계를 이용한 복원 기법을 설계하고 신문기사를 대상으로 통계 정보원과 음절쌍 위치에 따른 가중치를 달리하는 실험을 수행하였다. 실험 결과 처리 대상 기사를 포함하는 1개월 분 기사를 통계 정보원으로 하고 가중치는 균등하게 할 때 가장 높은 성공률을 얻었다. 이 결과는 디지털 원문을 텍스트 방식으로 소급하여 구축하는 경우에 적용될 수 있을 것이다.

  • PDF

A Study on Development of VUI(Voice User Interface) using VoiceXML (VoiceXML을 이용한 VUI 개발에 관한 연구)

  • 장민석;양운모
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04a
    • /
    • pp.349-351
    • /
    • 2002
  • 한국현재의 컴퓨팅환경은 Text위주의 Command Line상에서의 입출력에서 GUI(Graphic User Interface)환경으로 전환되었다. 이는 사용자에게 좀더 친근한 방법으로의 컴퓨팅환경을 제공하고 있는 것이다. 하지만 아직까지 그러한 환경에 익숙해지기 위해서는 많은 습득시간이 필요하며 또한, 응용프로그램 간의 인터페이싱 기능 등을 익히기 위해서는 추가적인 학습을 통해야 원활한 작업을 수행할 수 있다. 이를 해결하고자 본 연구는 음성인식/ 합성과, 현재 음성마크업 언어인 VoiceXML 등을 통해서 모색해보고자 한다.

  • PDF

Text Detection and Recognition in Outdoor Korean Signboards for Mobile System Applications (모바일 시스템 응용을 위한 실외 한국어 간판 영상에서 텍스트 검출 및 인식)

  • Park, J.H.;Lee, G.S.;Kim, S.H.;Lee, M.H.;Toan, N.D.
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.2
    • /
    • pp.44-51
    • /
    • 2009
  • Text understand in natural images has become an active research field in the past few decades. In this paper, we present an automatic recognition system in Korean signboards with a complex background. The proposed algorithm includes detection, binarization and extraction of text for the recognition of shop names. First, we utilize an elaborate detection algorithm to detect possible text region based on edge histogram of vertical and horizontal direction. And detected text region is segmented by clustering method. Second, the text is divided into individual characters based on connected components whose center of mass lie below the center line, which are recognized by using a minimum distance classifier. A shape-based statistical feature is adopted, which is adequate for Korean character recognition. The system has been implemented in a mobile phone and is demonstrated to show acceptable performance.

Slab Region Localization for Text Extraction using SIFT Features (문자열 검출을 위한 슬라브 영역 추정)

  • Choi, Jong-Hyun;Choi, Sung-Hoo;Yun, Jong-Pil;Koo, Keun-Hwi;Kim, Sang-Woo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.5
    • /
    • pp.1025-1034
    • /
    • 2009
  • In steel making production line, steel slabs are given a unique identification number. This identification number, Slab management number(SMN), gives information about the use of the slab. Identification of SMN has been done by humans for several years, but this is expensive and not accurate and it has been a heavy burden on the workers. Consequently, to improve efficiency, automatic recognition system is desirable. Generally, a recognition system consists of text localization, text extraction, character segmentation, and character recognition. For exact SMN identification, all the stage of the recognition system must be successful. In particular, the text localization is great important stage and difficult to process. However, because of many text-like patterns in a complex background and high fuzziness between the slab and background, directly extracting text region is difficult to process. If the slab region including SMN can be detected precisely, text localization algorithm will be able to be developed on the more simple method and the processing time of the overall recognition system will be reduced. This paper describes about the slab region localization using SIFT(Scale Invariant Feature Transform) features in the image. First, SIFT algorithm is applied the captured background and slab image, then features of two images are matched by Nearest Neighbor(NN) algorithm. However, correct matching rate can be low when two images are matched. Thus, to remove incorrect match between the features of two images, geometric locations of the matched two feature points are used. Finally, search rectangle method is performed in correct matching features, and then the top boundary and side boundaries of the slab region are determined. For this processes, we can reduce search region for extraction of SMN from the slab image. Most cases, to extract text region, search region is heuristically fixed [1][2]. However, the proposed algorithm is more analytic than other algorithms, because the search region is not fixed and the slab region is searched in the whole image. Experimental results show that the proposed algorithm has a good performance.

Application of ChatGPT text extraction model in analyzing rhetorical principles of COVID-19 pandemic information on a question-and-answer community

  • Hyunwoo Moon;Beom Jun Bae;Sangwon Bae
    • International journal of advanced smart convergence
    • /
    • v.13 no.2
    • /
    • pp.205-213
    • /
    • 2024
  • This study uses a large language model (LLM) to identify Aristotle's rhetorical principles (ethos, pathos, and logos) in COVID-19 information on Naver Knowledge-iN, South Korea's leading question-and-answer community. The research analyzed the differences of these rhetorical elements in the most upvoted answers with random answers. A total of 193 answer pairs were randomly selected, with 135 pairs for training and 58 for testing. These answers were then coded in line with the rhetorical principles to refine GPT 3.5-based models. The models achieved F1 scores of .88 (ethos), .81 (pathos), and .69 (logos). Subsequent analysis of 128 new answer pairs revealed that logos, particularly factual information and logical reasoning, was more frequently used in the most upvoted answers than the random answers, whereas there were no differences in ethos and pathos between the answer groups. The results suggest that health information consumers value information including logos while ethos and pathos were not associated with consumers' preference for health information. By utilizing an LLM for the analysis of persuasive content, which has been typically conducted manually with much labor and time, this study not only demonstrates the feasibility of using an LLM for latent content but also contributes to expanding the horizon in the field of AI text extraction.