Figure 2.1. Text mining procedure for National Police Agency business analysis.
Figure 2.2. Example of bag-of-words vector representation of text.
Figure 2.3. Histogram of the top 300 words in the National Police Agency’s business report texts.
Figure 3.1. Word clouds for each topic from the National Police Agency White Paper.
Figure 3.2. Time series plot of words with (a) upward and (b) downward trend in Topic 2: “Background and result of Police business”.
Figure 3.3. Time series plot of words with (a) upward and (b) downward trend in Topic 3: “Traffic safety and Police business”.
Figure 3.4. Time series plot of words with upward trend in Topic 4: “Public safety and Police business”.
Figure 3.5. Time series plot of words with (a) upward and (b) downward trend in Topic 6: “Social security and Police business”.
Figure 3.6. Time series plot of words with (a) upward and (b) downward trend in Topic 7: “Police business in globalization”.
Figure 3.7. Word clouds for each Metropolitan Police Agency.
Table 2.1. Configuration of text data used for National Police Agency business analysis
Table 3.1. Seven topics from the National Police Agency White Paper and the top keywords in each topic
Table 3.2. Top 10 words by frequency for each Metropolitan Police Agency
Table 3.3. Top 10 keywords by TF-IDF for each Metropolitan Police Agency
Table 3.4. Top 10 business-related keywords by TF-IDF for each Metropolitan Police Agency
References
- Bae, J. H., Son, J. E., and Song, M. (2013). Analysis of Twitter for 2012 South Korea Presidential Election by text mining techniques, Journal of Intelligence and Information Systems, 19, 141-156.
- Berry, M. W. (2004). Survey of text mining, Computing Reviews, 45, 548.
- Cho, S. G. and Kim, S. B. (2011). Finding meaningful pattern of key words in IIE transactions using text mining. In 2011 Fall Conference Proceedings of Korean Institute of Industrial Engineers, 443-452.
- Grimes, S. (2008). Unstructured data and the 80 percent rule, Clarabridge Bridgepoints, 10.
- Köthe, G. (1983). Topological vector spaces. In Topological Vector Spaces I, Springer, Berlin, Heidelberg, 123-201.
- Leopold, E. and Kindermann, J. (2002). Text categorization with support vector machines. How to represent texts in input space?, Machine Learning, 46, 423-444. https://doi.org/10.1023/A:1012491419635
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111-3119.
- Nahm, U. Y. and Mooney, R. J. (2002). Text mining with information extraction. In Proceedings of the AAAI 2002 Spring Symposium on Mining Answers from Texts and Knowledge Bases, 60-67.
- Park, E. L. and Cho, S. (2014). KoNLPy: Korean natural language processing in Python. In Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, 133-136.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., and Vanderplas, J. (2011). Scikit-learn: machine learning in Python, Journal of Machine Learning Research, 12, 2825-2830.
- Pennington, J., Socher, R., and Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1532-1543.
- Python Software Foundation (2017). Python Language Reference, version 3.6. Available from: http://www.python.org
- Ramos, J. (2003). Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, 242, 133-142.
- Song, H. J., Park, K. S., Jung, H. E., and Song, M. (2013). Trend analysis of Korean economy in the economic literature by text mining techniques. In Proceedings of the 20th Conference on Korea Society for Information Management, 47-50.
- Sulova, S., Todoranova, L., Penchev, B., and Nacheva, R. (2017). Using text mining to classify research papers. https://doi.org/10.5593/SGEM2017/21/S07.083
- Talib, R., Hanif, M. K., Ayesha, S., and Fatima, F. (2016). Text mining: techniques, applications and issues, International Journal of Advanced Computer Science & Applications, 1, 414-418.