http://dx.doi.org/10.11108/kagis.2018.21.4.064

Predicting Crime Risky Area Using Machine Learning  

HEO, Sun-Young (Engineering Research Institute(ERI), Gyeongsang National University)
KIM, Ju-Young (Dong Myeong Engineering Consultants & Architecture, Urban Development, Urban Planning)
MOON, Tae-Heon (Dept. of Urban Engineering, Gyeongsang National University)
Publication Information
Journal of the Korean Association of Geographic Information Studies / v.21, no.4, 2018, pp. 64-80
Abstract
In Korea, citizens have access only to general information about crime, so it is difficult for them to know how exposed they are to it. If the police could predict crime-risk areas, they could respond to crime efficiently even with limited personnel and enforcement resources. However, Korea has no such prediction system, and related research is scarce. Against this background, the final goal of this study is to develop an automated crime prediction system. As a first step, we built a big data set consisting of real local crime information and physical and non-physical urban data. We then developed a crime prediction model using machine learning methods. Finally, we assumed several possible scenarios, calculated the probability of crime, and visualized the results on a map to make them easier for people to understand. From the factors affecting crime occurrence identified in previous studies and case studies, the following data were processed into a big data set for machine learning: real crime information, weather information (temperature, rainfall, wind speed, humidity, sunshine, insolation, snowfall, cloud cover), and local information (average building coverage, average floor area ratio, average building height, number of buildings, average appraised land value, average area of residential buildings, average number of above-ground floors). Among supervised machine learning algorithms, the decision tree, random forest, and SVM models, which are known to be powerful and accurate in various fields, were used to construct the crime prediction model. The decision tree model, which had the lowest RMSE, was selected as the optimal prediction model. Based on this model, several scenarios were set for theft and violence, the most frequent crimes in case city J, and the probability of crime was estimated on a $250 \times 250$ m grid.
As a result, we found that high crime-risk areas occur in three patterns in case city J. The probability of crime was divided into three classes and visualized on a map using the $250 \times 250$ m grid. In conclusion, we developed a crime prediction model using a machine learning algorithm and visualized the crime-risk areas on a map; the model can be recalculated and the results visualized simultaneously as time and urban conditions change.
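The selection and mapping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the candidate labels, the toy numbers, and the cut points for the three risk classes are all hypothetical, and only the RMSE criterion, the three-class split, and the 250 m cell size come from the abstract.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 250  # grid resolution stated in the abstract


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def select_best_model(actual, predictions_by_model):
    """Return (name, rmse) of the candidate with the lowest RMSE."""
    scores = {name: rmse(actual, preds)
              for name, preds in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates in metres to a (column, row) cell index."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def mean_probability_by_cell(points):
    """Average the predicted probabilities of (x, y, p) points per cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, p in points:
        cell = grid_cell(x, y)
        sums[cell][0] += p
        sums[cell][1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


def risk_class(probability, low_cut=1 / 3, high_cut=2 / 3):
    """Bin a probability into three classes (cut points are hypothetical)."""
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "medium"
    return "high"


# Toy data for illustration only; these are not results from the study.
actual = [0, 1, 1, 0]
candidates = {
    "candidate_a": [0.1, 0.9, 0.8, 0.2],
    "candidate_b": [0.3, 0.7, 0.6, 0.4],
    "candidate_c": [0.5, 0.5, 0.5, 0.5],
}
best, score = select_best_model(actual, candidates)

points = [(10, 10, 0.2), (100, 200, 0.4), (300, 10, 0.9)]
cells = mean_probability_by_cell(points)
classes = {cell: risk_class(p) for cell, p in cells.items()}
```

In this sketch, each candidate's held-out predictions are scored with the same RMSE function, so the comparison does not depend on how the candidates were trained. Averaging point probabilities per cell is one simple way to obtain a cell-level value; the abstract does not state how the study aggregated probabilities within a cell.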
Keywords
Crime Prediction; Machine Learning; Decision Tree; Random Forest; SVM