• Title/Summary/Keyword: Document Reading

Stroke Width-Based Contrast Feature for Document Image Binarization

  • Van, Le Thi Khue;Lee, Gueesang
    • Journal of Information Processing Systems / v.10 no.1 / pp.55-68 / 2014
  • Automatic segmentation of foreground text from the background in degraded document images is essential for smooth reading of the document content and for machine recognition tasks. In this paper, we present a novel approach to the binarization of degraded document images. The proposed method uses a new local contrast feature extracted based on the stroke width of the text. First, a pre-processing step removes noise. Text boundary detection is then performed on the image constructed from the contrast feature, followed by local estimation to extract the text from the background. Finally, a refinement procedure is applied to the binarized image as a post-processing step to improve the quality of the final result. Experiments on extracting text from degraded handwritten and machine-printed document images, with comparisons against several well-known binarization algorithms, demonstrate the effectiveness of the proposed method.
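
The contrast-then-threshold idea in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the contrast formula `(max - min) / (max + min)`, the `radius` parameter (standing in for an estimated stroke width), and the `contrast_threshold` value are all assumptions chosen for illustration, and the noise-removal and refinement steps are omitted.

```python
def window_values(img, y, x, radius):
    """Collect the gray values in the square window centered on (y, x)."""
    h, w = len(img), len(img[0])
    return [
        img[j][i]
        for j in range(max(0, y - radius), min(h, y + radius + 1))
        for i in range(max(0, x - radius), min(w, x + radius + 1))
    ]


def binarize(img, radius=1, contrast_threshold=0.3):
    """Label a pixel as text (1) when its window has high local contrast
    and the pixel is darker than the midpoint of the window's range.

    `radius` is a hypothetical stand-in for the stroke width estimate.
    """
    out = [[0] * len(img[0]) for _ in img]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            values = window_values(img, y, x, radius)
            lo, hi = min(values), max(values)
            contrast = (hi - lo) / (hi + lo + 1e-6)  # assumed contrast feature
            if contrast >= contrast_threshold and value < (lo + hi) / 2:
                out[y][x] = 1
    return out


# A bright page (200) with a one-pixel-wide dark stroke (30) in the middle column.
page = [[200, 200, 30, 200, 200] for _ in range(5)]
result = binarize(page)
```

In this toy input only the middle column is marked as text, because the neighboring background pixels have high contrast but are brighter than the window midpoint.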

The Influences of Reading Type, Line Length, and Interlinear Spacing on the Legibility of Korean Web Documents (읽기 형태, 줄 길이, 줄 간격이 한글 웹 문서의 가독성에 미치는 영향)

  • Shin, Jong-Hyun;Park, Min-Yong
    • Journal of Korean Institute of Industrial Engineers / v.29 no.3 / pp.197-205 / 2003
  • Many people obtain a great deal of information from the World Wide Web, so the factors that affect reading tasks in a web browser have become an important research issue. However, domestic studies on the legibility of Korean text on the web have been relatively scarce, and no study has addressed text layouts suitable for skimming. This study therefore investigated the effects of two types of reading, three levels of line length, and three levels of interlinear spacing on comprehension and reading rate when subjects read materials in a web browser. Reading speed, error rate, subjective preference, and SACL (Stress and Arousal Checklist) scores were measured to evaluate the effects. Eighteen volunteer subjects participated in eighteen web document sessions covering the two reading types, three line lengths, and three interlinear spacings. Statistical results from the objective and subjective evaluations indicate that a line length of 50 characters per line and an interlinear spacing of 100 percent improved reading rate, that overall error rates were lower during normal reading, and that SACL measures were higher during fast reading. Consequently, simply applying guidelines for traditional printed material is not appropriate when designing text layouts for effective information retrieval on the web; reading type, line length, and interlinear spacing should be considered. Implications of these results and suggestions for further study are also addressed.
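
The count of eighteen sessions is consistent with a full factorial design, presumably one session per combination of factor levels. The short sketch below enumerates that design; the level labels for line length and spacing are placeholders, since the abstract names only some of the actual levels.

```python
from itertools import product

reading_types = ["normal", "fast"]
line_lengths = ["length-1", "length-2", "length-3"]    # placeholder labels
spacings = ["spacing-1", "spacing-2", "spacing-3"]     # placeholder labels

# Every combination of the three factors: 2 x 3 x 3 conditions.
conditions = list(product(reading_types, line_lengths, spacings))
print(len(conditions))  # 18
```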

XML-Based EDI Document Processing System (XML 기반 EDI 문서 처리 시스템)

  • Cho, Hui-Kyoung;Chin, Sung-Geun;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.4 / pp.829-834 / 2012
  • This paper describes a system for processing XML-based EDI e-documents. The system does not use a script file when translating an EDI e-document into an XML-based EDI document. We design and implement a scanner and a mapper, the e-document processors, which use a binary format when reading and writing documents. We also design and implement mapping tools that graphically define the translation rules between e-documents. The proposed system therefore combines the characteristic advantages of XML with benefits over previous EDI e-document processing systems, such as faster speed, convenience, and better adaptability. Owing to these advantages, the system is expected to be widely used as a B2B gateway system.
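
The scanner-and-mapper split can be sketched roughly as follows. Everything here is an assumption for illustration: the segment syntax (`'` between segments, `+` between fields), the `RULES` table, and the element names are hypothetical, and the paper's binary format and graphical mapping tool are not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical translation rules: segment tag -> (XML element name, field names).
RULES = {
    "BGM": ("Message", ["type", "number"]),
    "DTM": ("Date", ["qualifier", "value"]),
}


def scan(edi_text):
    """Scanner: split raw EDI text into (tag, fields) tuples."""
    segments = []
    for raw in edi_text.split("'"):
        raw = raw.strip()
        if raw:
            tag, *fields = raw.split("+")
            segments.append((tag, fields))
    return segments


def map_to_xml(segments, rules=RULES):
    """Mapper: apply the translation rules and build an XML tree."""
    root = ET.Element("EDIDocument")
    for tag, fields in segments:
        if tag not in rules:
            continue  # segments without a rule are skipped in this sketch
        name, field_names = rules[tag]
        elem = ET.SubElement(root, name)
        for field_name, value in zip(field_names, fields):
            ET.SubElement(elem, field_name).text = value
    return root


xml_text = ET.tostring(
    map_to_xml(scan("BGM+220+PO123'DTM+137+20120401'")), encoding="unicode"
)
```

Keeping the rules in a data table rather than in code is what lets a separate tool define them, which is the role the abstract assigns to its mapping tools.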

Deep Learning Document Analysis System Based on Keyword Frequency and Section Centrality Analysis

  • Lee, Jongwon;Wu, Guanchen;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.19 no.1 / pp.48-53 / 2021
  • Herein, we propose a document analysis system that analyzes papers or reports transformed into XML (Extensible Markup Language) format. It reads the document specified by the user, extracts keywords from the document, and compares the frequency of keywords to extract the top-three keywords. It maintains the order of the paragraphs containing the keywords and removes duplicated paragraphs. The frequency of the top-three keywords in the extracted paragraphs is re-verified, and the paragraphs are partitioned into 10 sections. Subsequently, the importance of the relevant areas is calculated and compared. The system notifies the user of the areas with the highest frequency and the areas with higher importance than the average frequency, so the user can read only the main content without reading the entire document. In addition, the number of paragraphs extracted through the deep learning model and the number of paragraphs in a section of high importance are predicted.
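
The first stages described above (top-three keywords by frequency, order-preserving paragraph extraction with duplicates removed, and partitioning into 10 sections) might look like the following sketch. The tokenizer, the `STOPWORDS` set, and the equal-size partitioning rule are assumptions; the importance calculation and the deep learning model are not covered.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed (assumed tokenizer)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(paragraphs, n=3):
    """Return the n most frequent tokens across all paragraphs."""
    counts = Counter(w for p in paragraphs for w in tokenize(p))
    return [word for word, _ in counts.most_common(n)]


def extract_paragraphs(paragraphs, keywords):
    """Keep paragraphs containing any keyword, in order, without duplicates."""
    seen, kept = set(), []
    for p in paragraphs:
        if p not in seen and set(keywords) & set(tokenize(p)):
            seen.add(p)
            kept.append(p)
    return kept


def partition(items, sections=10):
    """Split items into `sections` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), sections)
    chunks, start = [], 0
    for i in range(sections):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


docs = [
    "xml parser reads xml",
    "weather is nice",
    "parser handles xml tags",
    "xml parser reads xml",
    "tags and parser tags",
]
keywords = top_keywords(docs)
selected = extract_paragraphs(docs, keywords)
```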

Implementation & Usability Evaluation of Math Expression Reader for Domestic Reading Disables (국내 독서장애인을 위한 Math Expression Reader의 구현 및 사용성 평가)

  • Lee, Jae-Hwa;Lee, Jong-Woo;Lim, Soon-Bum
    • Journal of Korea Multimedia Society / v.15 no.7 / pp.951-961 / 2012
  • E-books produced in Korea provide only limited audio service for people with reading disabilities, because they cannot translate the mathematical expressions and symbols that appear in the text. In this paper, we implemented the 'Math Expression Reader', which translates the expressions and symbols in a document into Korean speech for people with reading disabilities. The speech generated by the program was tested with both the general public and people with reading disabilities; the results were compared to determine whether listeners could understand the speech exactly, and the reading rules were evaluated.

A Study on Ambiguity Resolving for Pen-based Proofreading of Web Documents (펜 기반 웹 문서 교정을 위한 모호성 문제 해결에 관한 연구)

  • Sohn, Won-Sung
    • Journal of The Korean Association of Information Education / v.11 no.1 / pp.107-116 / 2007
  • To produce accurate editing results, the ambiguity of the editing scopes associated with marked correction signs must be resolved. Proofreading a web document modifies its structure, and the modified structure must remain valid against the defined DTD. This paper presents a pen-based proofreading interface for XML documents. In the proposed interface, correction signs are drawn freehand, and the editing scopes are recognized and revised based on the context of the document to minimize their ambiguity. The proposed interface provides both implicit and explicit methods for modifying document structures. As a result, the editing scopes processed in the proposed interface are more accurate, and the document structure remains valid against the DTD after editing.

Security Elevation of XML Document Using DTD Digital Signature (DTD 전자서명을 이용한 XML문서의 보안성 향상)

  • 김형균;오무송
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.11a / pp.592-596 / 2002
  • A DTD can be regarded as metadata that defines the meaning of the data expressed in an XML document. Therefore, if the DTD information is damaged, the security of the XML documents based on it is endangered. Rather than attaching a digital signature to the XML document during transmission and reception, this research proposes attaching the digital signature to the DTD. The DTD file is first read to the end and parsed, and the extracted elements and attribute entities are stored in a hash table. Once parsing is finished, the hash table is read and a message digest is computed. A digital signature is then created from the digest with a private key. When signing, a problem arises: because changes in order are not taken into account during the message digest process, an entirely different digest value can be produced. This was solved by creating the DTD's digital signature using DOM, which can represent the tree structure of a standard structured document.
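
The ordering problem mentioned in this abstract can be illustrated with a small sketch that makes the digest independent of declaration order. Note the substitution: the paper reports solving this with DOM, whereas this sketch simply sorts the parsed declarations before hashing. The regular expression, the choice of SHA-256, and the function name `dtd_digest` are assumptions, and the private-key signing step is left out.

```python
import hashlib
import re

# Matches simple <!ELEMENT ...> and <!ATTLIST ...> declarations (assumed subset of DTD syntax).
DECLARATION = re.compile(r"<!(ELEMENT|ATTLIST)\s+(\S+)\s+([^>]*)>")


def dtd_digest(dtd_text):
    """Hash DTD declarations in sorted order so reordering does not change the digest."""
    table = {}  # plays the role of the hash table of parsed declarations
    for kind, name, body in DECLARATION.findall(dtd_text):
        # Collapse whitespace so formatting differences do not affect the digest.
        table.setdefault((kind, name), []).append(" ".join(body.split()))
    digest = hashlib.sha256()
    for kind, name in sorted(table):
        for body in sorted(table[(kind, name)]):
            digest.update(f"{kind} {name} {body}\n".encode("utf-8"))
    return digest.hexdigest()


original = "<!ELEMENT note (to, body)>\n<!ELEMENT to (#PCDATA)>\n<!ELEMENT body (#PCDATA)>"
reordered = "<!ELEMENT body (#PCDATA)>\n<!ELEMENT note (to, body)>\n<!ELEMENT to (#PCDATA)>"
tampered = "<!ELEMENT note (to)>\n<!ELEMENT to (#PCDATA)>\n<!ELEMENT body (#PCDATA)>"
```

Here `original` and `reordered` produce the same digest, while `tampered` does not, which is the property a signature over the DTD needs.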

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon;Kang, Inshik;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2133-2138 / 2017
  • Existing analysis of XML and other documents has centered on words. Such analysis can be implemented with a morpheme analyzer, but it only classifies the many words in a document and cannot capture the document's core content. For a user to understand a document efficiently, the paragraphs containing the main words must be extracted and presented. The proposed system searches for keywords in a normalized XML document, extracts the paragraphs containing the keyword that the user entered for the search, and displays them to the user. In addition, the system reports the frequency and weight of the search keyword, and it preserves the order of the extracted paragraphs and eliminates redundant ones so that the user can follow the document. The proposed system thus minimizes the time and effort needed to understand a document by allowing the user to grasp it without reading it in full.
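
A rough sketch of keyword-based paragraph extraction from an XML document follows. The `<p>` tag name, the whitespace tokenizer, and in particular the definition of weight as the keyword's share of all words are assumptions made for illustration; the abstract does not define how the weight is computed.

```python
import xml.etree.ElementTree as ET


def extract(xml_text, keyword, tag="p"):
    """Return (matching paragraphs, keyword frequency, keyword weight).

    Paragraph order is preserved and repeated paragraphs are reported once.
    Weight is assumed here to be frequency divided by the total word count.
    """
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    hits, seen = [], set()
    total_words = frequency = 0
    for elem in root.iter(tag):
        text = " ".join((elem.text or "").split())
        words = text.lower().split()
        total_words += len(words)
        count = words.count(keyword)
        frequency += count
        if count and text not in seen:
            seen.add(text)
            hits.append(text)
    weight = frequency / total_words if total_words else 0.0
    return hits, frequency, weight


sample = (
    "<doc><p>XML stores data</p><p>Nothing here</p>"
    "<p>XML stores data</p><p>Parsing XML is easy</p></doc>"
)
paragraphs, frequency, weight = extract(sample, "xml")
```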

Conceptual Extraction of Compound Korean Keywords

  • Lee, Samuel Sangkon
    • Journal of Information Processing Systems / v.16 no.2 / pp.447-459 / 2020
  • After reading a document, people form a concept of the information they have consumed and combine multiple words into keywords that represent the material. With that in mind, this study proposes a more efficient keyword extraction method in which scholarly journals serve as the basis for production rules built on the concept information of words appearing in a document, so that author-provided keywords remain usable even when they do not appear in the body of the document. The study also presents a new way to determine the importance of each keyword while excluding irrelevant keywords. To validate the extracted keywords, titles and abstracts of journal articles on natural language and auditory language were collected for analysis. A comparison of author-provided keywords with the keywords produced by the developed system showed that the system was highly useful, with an accuracy of up to 96%.

Active Documents: Programs by Form Designers (능동문서: 서식설계자의 프로그램)

  • Nam, Chul-Ki;Bae, Jae-Hak;Yoo, Hae-Young
    • The KIPS Transactions:PartB / v.10B no.6 / pp.599-610 / 2003
  • The Web plays an important role as an information source, and most Web applications are document-centric. A document embodies the intention of its designer, which can be actively exploited in automating business processes. By understanding the intrinsic nature of a document's function, we can in special cases view a document as an executable computer program. To this end, we propose an active document model composed of a form, a knowledge base, rules, and queries. For reusability and interoperability, each component of the proposed model is uniformly represented in XML. The proposed active document not only plays a passive role in providing user interfaces, but is also a document that a machine can reason over and process by reading the document-processing procedures and business rules intended by its designers. Through this approach, a document can interact with machines and cooperate with other applications. To demonstrate the applicability of the active document, we present a case study on the processing of purchase orders in a B2B e-commerce system. This paper is expected to provide a framework for accelerating the development of intelligent applications by regarding a form document as a computer program. In short, the proposed active document contains both knowledge representation and a processing method, and it will therefore play an important role in providing the concept of a document pursued by the Semantic Web.