Fig. 1. Example of a problematic PDF file when extracting text
Fig. 2. Topic Modeling based on Security Intelligence Report
Fig. 3. Result of feeding a test document into the topic model
Table 1. Result of simply extracting a PDF document as text
Table 2. Example of extracting the same PDF document with the method developed in this work
Table 3. Topics by bag-of-words
Table 4. Security Intelligence Report Topic Automatic Extraction Model Satisfaction Evaluation Question
Table 5. Security Intelligence Report Topic Automatic Extraction Model satisfaction
References
- S. Y. Lee. (2018. 06. 18). Microsoft Announces Cyber Security Threat Report. News of SecuN, p. 1.
- T. K. Kim, H. R. Choi & H. C. Lee. (2016). A Study on the Research Trends in Fintech using Topic Modeling. Journal of the Korea Academia-Industrial cooperation Society, 17(11), 670-681. DOI: 10.5762/KAIS.2016.17.11.670
- L. Hong & B. D. Davison. (2010, July). Empirical study of topic modeling in twitter. In Proceedings of the first workshop on social media analytics (ACM), 80-88.
- N. C. Ho. (2016). An Illustrative Application of Topic Modeling Method to a Farmer's Diary. INSTITUTE OF CROSS-CULTURAL STUDIES, 22(1), 89-135.
- R. Krestel, P. Fankhauser & W. Nejdl. (2009, October). Latent dirichlet allocation for tag recommendation. In Proceedings of the third ACM conference on Recommender systems, 61-68.
- Y. A. Hur, D. Y. Lee, K. K. Kim, W. H. Yu & H. S. Lim. (2017). A System for Automatic Classification of Traditional Culture Texts. Journal of the Korea Convergence Society, 8(12), 39-47. https://doi.org/10.15207/JKCS.2017.8.12.039
- B. I. Kang, M. Song & W. Jho. (2013). A Study on Opinion Mining of Newspaper Texts based on Topic Modeling. Journal of The Korean Society For Library And Information Science, 47(4), 315-334. https://doi.org/10.4275/KSLIS.2013.47.4.315
- J. H. Bae, N. G. Han & M. Song. (2014). Twitter Issue Tracking System by Topic Modeling Techniques. Journal of Intelligence and Information System, 20(20), 109-122.
- H. G. Kim, S. U. Kim & S. T. Kim. (2018). Topic Modeling of Media Reports on Smartphone Addiction - A Study on the Comparison of Government Policies between 2010 and 2018. Korean Association for Broadcasting & Telecommunication Studies, 104, 38-62.
- N. Potha & E. Stamatatos. (2019). Improving author verification based on topic modeling. Journal of the Association for Information Science and Technology, 0(0), 1-15. DOI: 10.1002/asi.24183
- H. H. Gill. (2018). The Study of Korean Stopwords list for Textmining. URIMALGEUL: The Korean Language and Literature, 78, 1-25. https://doi.org/10.18628/urimal.78..201809.1
- H. M. Wallach. (2006). Topic modeling: beyond bag-of-words. In Proceedings of the 23rd international conference on Machine learning (ACM), 977-984.
- J. Yang, Y. G. Jiang, A. G. Hauptmann & C. W. Ngo. (2007). Evaluating bag-of-visual-words representations in scene classification. In Proceedings of the international workshop on Workshop on multimedia information retrieval (ACM), 197-206.
- D. M. Blei, A. Y. Ng & M. I. Jordan. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. DOI: 10.1162/jmlr.2003.3.4.-5.993
- Y. Guo, S. J. Barnes & Q. Jia. (2017). Mining meaning from online ratings and reviews: Tourist satisfaction analysis using latent dirichlet allocation. Tourism Management, 59, 467-483. https://doi.org/10.1016/j.tourman.2016.09.009