Analysis of the National Police Agency business trends using text mining

  • Sun, Hyunseok (Department of Applied Statistics, Chung-Ang University) ;
  • Lim, Changwon (Department of Applied Statistics, Chung-Ang University)
  • Received : 2018.10.15
  • Accepted : 2019.01.31
  • Published : 2019.04.30

Abstract

Research on extracting insights from large volumes of text data using statistical techniques has been active in recent years. In this study, we analyzed text data produced by the Korean National Police Agency to identify yearly trends in its work and to compare work characteristics among local police agencies by identifying the distinctive keywords in the documents each agency produces. To draw meaningful conclusions, we applied preprocessing suited to the characteristics of each data set and calculated word frequencies for each document. Because the simple frequency of a term in a document does not adequately describe the importance of that term, we recalculated the weight of each term using term frequency-inverse document frequency (TF-IDF) weights, and we used L2 normalization to compare word frequencies across documents and years. The results can serve as basic data for future policies to improve police work, and as a method for improving the efficiency of police work and for identifying demand for improvements in work within the agency.
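
The weighting steps described in the abstract (per-document word counts, TF-IDF reweighting, then L2 normalization) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `tfidf_l2`, the toy documents, and the choice of the textbook weight tf × log(N / df) are all assumptions, since the exact TF-IDF variant used in the study is not stated here.

```python
import math
from collections import Counter


def tfidf_l2(docs):
    """Return one {term: weight} dict per tokenized document.

    Weights are tf * log(N / df), and each document vector is then
    scaled to unit L2 norm. Textbook variant, chosen for illustration.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]            # bag-of-words counts
    df = Counter(term for c in counts for term in c)   # document frequency
    vectors = []
    for c in counts:
        weights = {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Guard against an all-zero vector (every term occurs in every document).
        vectors.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return vectors


# Hypothetical, already-tokenized toy documents (not data from the study).
docs = [
    ["police", "traffic", "safety", "traffic"],
    ["police", "cyber", "crime"],
    ["police", "traffic", "crime", "crime"],
]
vectors = tfidf_l2(docs)
```

In this toy input, "police" appears in every document, so its weight is zero in each vector. That is the behavior that lets TF-IDF surface agency-specific keywords rather than words common to all documents. Scaling each vector to unit length makes weights comparable across documents of different lengths.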

Figure 2.1. Text mining procedure for National Police Agency business analysis.

Figure 2.2. Example of Bag-of-words vector representation on text.

Figure 2.3. Histogram of the top 300 words in the National Police Agency’s business report texts.

Figure 3.1. Word clouds about each topic from National Police Agency White Paper.

Figure 3.2. Time series plot of words with (a) upward and (b) downward trend in Topic2-“Background and result of Police business”.

Figure 3.3. Time series plot of words with (a) upward and (b) downward trend in Topic3-“Traffic safety and Police business”.

Figure 3.4. Time series plot of words with upward trend in Topic4-“Public safety and Police business”.

Figure 3.5. Time series plot of words with (a) upward and (b) downward trend in Topic6-“Social security and Police business”.

Figure 3.6. Time series plot of words with (a) upward and (b) downward trend in Topic7-“Police business in globalization”.

Figure 3.7. Word clouds for each Metropolitan Police Agency.

Table 2.1. Configuration of text data used for National Police Agency business analysis

Table 3.1. Seven topics from the National Police Agency White Paper and the top keywords in each topic

Table 3.2. Top 10 words by frequency for each Metropolitan Police Agency

Table 3.3. Top 10 keywords by TF-IDF for each Metropolitan Police Agency

Table 3.4. Top 10 business-related keywords by TF-IDF for each Metropolitan Police Agency
