http://dx.doi.org/10.15207/JKCS.2022.13.02.013

A Named Entity Recognition Model in Criminal Investigation Domain using Pretrained Language Model  

Kim, Hee-Dou (Department of Bigdata Convergence, Korea University)
Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
Publication Information
Journal of the Korea Convergence Society / v.13, no.2, 2022, pp. 13-20
Abstract
This study develops a named entity recognition (NER) model specialized for the criminal investigation domain using deep learning techniques. By automatically extracting and categorizing crime-related information from text-based data such as criminal judgments and investigation documents, the proposed system is intended to support future data-driven crime analysis for both prevention and investigation. For this study, criminal investigation domain text was collected, and the required named entity categories were newly defined from the perspective of criminal analysis. The proposed model applies KoELECTRA, a pretrained language model that has recently shown high performance in natural language processing. On the crime domain NER experimental data, it achieves a micro average (micro avg) F1-score of 98% and a macro average (macro avg) F1-score of 95% over 9 main categories, and a micro avg F1-score of 98% and a macro avg F1-score of 62% over 56 sub categories. The proposed model is also analyzed from the perspective of future improvement and utilization.
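One plausible reading of the gap between micro avg F1 (98%) and macro avg F1 (62%) over the 56 sub categories is class imbalance: micro avg pools the counts of all categories before computing a single F1, so it is dominated by frequent categories, while macro avg averages per-category F1 with equal weight, so a few poorly recognized rare categories pull it down. The minimal Python sketch below illustrates this arithmetic. It assumes entity-level true positive / false positive / false negative counts; the category names and numbers are made up for illustration and do not come from the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical entity-level counts for one category."""
    tp: int  # predicted entities that match a gold entity
    fp: int  # predicted entities with no matching gold entity
    fn: int  # gold entities the model missed


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def micro_f1(per_class: dict[str, Counts]) -> float:
    # Micro avg: pool the counts of every category first, then compute one F1.
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: dict[str, Counts]) -> float:
    # Macro avg: compute F1 per category, then take the unweighted mean.
    scores = [f1(c.tp, c.fp, c.fn) for c in per_class.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up counts (hypothetical): one frequent category and two rare ones.
PER_CLASS = {
    "FREQUENT_CAT": Counts(tp=980, fp=10, fn=10),
    "RARE_CAT_1": Counts(tp=1, fp=2, fn=4),
    "RARE_CAT_2": Counts(tp=0, fp=1, fn=3),
}

print(f"micro F1 = {micro_f1(PER_CLASS):.3f}")  # micro F1 = 0.985
print(f"macro F1 = {macro_f1(PER_CLASS):.3f}")  # macro F1 = 0.413
```

Because macro avg gives each category the same weight regardless of how many entities it contains, adding more rare sub categories (as in the 56-category setting) tends to widen this gap even when overall extraction quality, as measured by micro avg, stays high.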
Keywords
Crime Prevention; Criminal Investigation; Pretrained Language Model; Crime Domain Text; Named Entity Recognition; KoELECTRA