Deep Learning Document Analysis System Based on Keyword Frequency and Section Centrality Analysis

  • Received : 2021.02.22
  • Accepted : 2021.03.02
  • Published : 2021.03.31

Abstract

We propose a document analysis system for papers and reports that have been converted to XML (Extensible Markup Language) format. The system reads a user-specified document, extracts its keywords, and compares their frequencies to select the three most frequent keywords. It then collects the paragraphs that contain these keywords, preserving their original order and removing duplicates. The frequency of the top three keywords is recounted within the extracted paragraphs, and the paragraphs are partitioned into 10 sections. The importance of each section is then calculated and compared across sections. The system notifies the user of the sections with the highest frequency and of the sections whose importance exceeds the average frequency, so that the user can read the main content without reading the entire document. In addition, a deep learning model is used to predict the number of extracted paragraphs and the number of paragraphs in sections of high importance.
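The frequency-based part of the pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The paragraph tag `p`, the stop-word list, the tokenizer, and the function names are all assumptions made for illustration, and section importance is approximated here by the raw count of the top three keywords. The deep learning prediction step is omitted.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a tiny illustrative stop-word list; the abstract does not specify one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was", "from"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) > 2 and w not in STOPWORDS]


def top_keywords(paragraphs, k=3):
    """Return the k most frequent tokens over all paragraphs."""
    counts = Counter(w for p in paragraphs for w in tokenize(p))
    return [w for w, _ in counts.most_common(k)]


def keyword_paragraphs(paragraphs, keywords):
    """Keep paragraphs containing any keyword, in order, without duplicates."""
    wanted, seen, kept = set(keywords), set(), []
    for p in paragraphs:
        if p not in seen and wanted & set(tokenize(p)):
            seen.add(p)
            kept.append(p)
    return kept


def section_scores(paragraphs, keywords, n_sections=10):
    """Split paragraphs into n_sections contiguous chunks and count keyword hits."""
    n = len(paragraphs)
    scores = []
    for i in range(n_sections):
        chunk = paragraphs[i * n // n_sections:(i + 1) * n // n_sections]
        scores.append(sum(tokenize(p).count(kw) for p in chunk for kw in keywords))
    return scores


def analyze(xml_text, tag="p", n_sections=10):
    """Run the sketch end to end on an XML string (tag name is an assumption)."""
    root = ET.fromstring(xml_text)
    paragraphs = ["".join(e.itertext()).strip() for e in root.iter(tag)]
    paragraphs = [p for p in paragraphs if p]
    keywords = top_keywords(paragraphs)
    kept = keyword_paragraphs(paragraphs, keywords)
    scores = section_scores(kept, keywords, n_sections)
    mean = sum(scores) / len(scores)
    return {
        "keywords": keywords,
        "scores": scores,
        "top_section": scores.index(max(scores)),
        "above_average": [i for i, s in enumerate(scores) if s > mean],
    }
```

Calling `analyze(xml_text)` on a converted document returns the top three keywords, the per-section keyword counts, the index of the highest-scoring section, and the indices of sections scoring above the mean. A real system would likely need a proper keyword extractor and a schema-aware XML reader in place of these simplifications.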
