http://dx.doi.org/10.5351/KJAS.2019.32.2.301

Analysis of the National Police Agency business trends using text mining  

Sun, Hyunseok (Department of Applied Statistics, Chung-Ang University)
Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.32, no.2, 2019, pp. 301-317
Abstract
There has been significant research on how to discover insights from text data using statistical techniques. In this study we analyzed text data produced by the Korean National Police Agency to identify trends in its work by year and to compare work characteristics among local authorities by identifying distinctive keywords in the documents produced by each local authority. Preprocessing was tailored to the characteristics of each data set, and the frequency of words in each document was calculated in order to draw meaningful conclusions. Simple term frequency within a document does not describe the characteristics of keywords well; therefore, the frequency of each term was recalculated using term frequency-inverse document frequency (TF-IDF) weights. L2 norm normalization was used to make word frequencies comparable. The analysis can serve as basic data for future police work improvement policies and as a method to improve the efficiency of the police service, and it can also help identify demand for improvements in indoor work.
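The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which IDF variant was used, so the smoothed form log((1 + N) / (1 + df)) + 1 is an assumption here, and the function name `tfidf_l2` and the toy token lists are hypothetical. Real input would be tokens produced by the preprocessing step.

```python
import math
from collections import Counter


def tfidf_l2(docs):
    """Return one {term: weight} dict per document, L2-normalized.

    docs is a list of token lists. The smoothed IDF used below is an
    assumption; the abstract does not specify the exact formula.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        # L2 normalization: scale the vector to unit Euclidean length so
        # documents of different lengths can be compared.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append(
            {term: w / norm for term, w in weights.items()} if norm else {}
        )
    return vectors


# Hypothetical toy documents standing in for three local authorities.
docs = [
    ["patrol", "traffic", "report", "traffic"],
    ["patrol", "cyber", "report"],
    ["patrol", "report", "festival", "festival"],
]
vectors = tfidf_l2(docs)
# The highest-weighted term in each document is its most distinctive keyword.
top_terms = [max(vec, key=vec.get) for vec in vectors]
```

In this toy input, "patrol" and "report" occur in every document and so receive the lowest IDF, while a term concentrated in one document rises to the top of that document's ranking. That is the sense in which TF-IDF surfaces distinctive keywords where raw counts would not.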
Keywords
text-mining; unstructured format; the Korean National Police Agency; keyword extraction