http://dx.doi.org/10.6109/jkiice.2019.23.4.401

Document Analysis based Main Requisite Extraction System  

Lee, Jongwon (Department of Computer Engineering, Paichai University)
Yeo, Ilyeon (Department of Computer Engineering, Paichai University)
Jung, Hoekyung (Department of Computer Engineering, Paichai University)
Abstract
In this paper, we propose a system for analyzing XML-format documents such as papers and reports. The system extracts keywords from the paper or report and shows them to the user; the user then enters the keywords to search for within the document, and the system extracts the paragraphs that contain them. The system counts the frequency of the keywords entered by the user, calculates a weight for each, and removes paragraphs that contain only the lowest-weight keywords. It then divides the refined paragraphs into 10 regions, calculates the importance of the paragraphs in each region, compares the regions, and informs the user of the main region, that is, the one with the highest importance. With these features, the proposed system can provide the main paragraphs at a higher compression ratio than analysis of the same papers or reports with existing document analysis systems, which should reduce the time required to understand a document.
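The pipeline described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the abstract does not give the weight or importance formulas, so the sketch assumes that a keyword's weight is its relative frequency, that a paragraph's importance is the sum of the weights of the keywords it contains, and that keywords are single words. The function names `keyword_weights`, `refine`, and `main_region` are hypothetical.

```python
import re
from collections import Counter


def _tokens(text):
    # Lowercased word tokens; keywords are assumed to be single words.
    return re.findall(r"\w+", text.lower())


def keyword_weights(paragraphs, keywords):
    """Assumed weighting: each keyword's share of all keyword occurrences."""
    counts = Counter()
    for text in paragraphs:
        tokens = _tokens(text)
        for kw in keywords:
            counts[kw] += tokens.count(kw.lower())
    total = sum(counts.values())
    return {kw: (counts[kw] / total if total else 0.0) for kw in keywords}


def refine(paragraphs, weights):
    """Keep paragraphs with a keyword, dropping those that hold only the lowest-weight one."""
    # With fewer than two keywords there is no meaningful "lowest", so nothing extra is dropped.
    lowest = min(weights, key=weights.get) if len(weights) > 1 else None
    kept = []
    for text in paragraphs:
        tokens = set(_tokens(text))
        present = {kw for kw in weights if kw.lower() in tokens}
        if present and present != {lowest}:
            kept.append(text)
    return kept  # original paragraph order is preserved


def main_region(paragraphs, weights, regions=10):
    """Split paragraphs into up to `regions` contiguous regions; return (index, region) of the best."""
    n = len(paragraphs)
    if n == 0:
        return None
    regions = min(regions, n)  # never create empty regions
    best_index, best_score, best_chunk = 0, -1.0, []
    for r in range(regions):
        chunk = paragraphs[r * n // regions:(r + 1) * n // regions]
        # Assumed importance: sum of weights of keywords present in each paragraph.
        score = sum(
            weights[kw]
            for text in chunk
            for kw in weights
            if kw.lower() in set(_tokens(text))
        )
        if score > best_score:
            best_index, best_score, best_chunk = r, score, chunk
    return best_index, best_chunk
```

A caller would compute the weights over all paragraphs first, pass the result to `refine`, and then pass the refined list to `main_region`; ties between regions resolve to the earliest region in this sketch.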
Keywords
Paragraph Extraction; Document Analysis; Sequence Maintenance; Deduplication; Keyword
Citations & Related Records
Times Cited By KSCI: 4