A Safety Score Prediction Model in Urban Environment Using Convolutional Neural Network

  • Hyeon-Woo Kang (Dept. of Digital Media, The Catholic University of Korea)
  • Hang-Bong Kang (School of Digital Media, The Catholic University of Korea)
  • Received : 2016.03.04
  • Accepted : 2016.04.26
  • Published : 2016.08.31

Abstract

Recently, there has been growing research on efficient, automatic analysis of urban environments using computer vision and machine learning. Among these analyses, urban safety analysis has received major attention. To predict safety scores more accurately and to reflect human visual perception, it is necessary to consider both the generic and the local information that matter most to human perception. In this paper, we use a Double-column Convolutional Neural Network, consisting of a generic column and a local column, to predict urban safety. The generic column takes resized versions of the original images as input, while the local column takes random crops of them. In addition, we propose a new learning method to solve the problem of over-fitting to a particular column during training. To evaluate our Double-column Convolutional Neural Network, we compare it against two Support Vector Regression models and three Convolutional Neural Network models using Root Mean Square Error and correlation analysis. Our experimental results demonstrate that the Double-column model shows the best performance, with a Root Mean Square Error of 0.7432 and Pearson/Spearman correlation coefficients of 0.853/0.840.
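The two-column design described above can be sketched in PyTorch. This is an illustrative toy, not the authors' exact architecture: the layer sizes, 64×64 input resolution, and the simple concatenate-then-regress head are all assumptions made for the sketch; only the overall idea (a generic column fed resized whole images, a local column fed random crops, joint regression to a scalar safety score) comes from the abstract.

```python
import torch
import torch.nn as nn

class DoubleColumnCNN(nn.Module):
    """Toy double-column CNN: one column sees the resized whole image
    (generic context), the other sees a random crop (local detail);
    their features are concatenated and regressed to one safety score."""
    def __init__(self):
        super().__init__()
        def column():
            # Small conv stack; real models would be much deeper.
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),  # -> 32*4*4 = 512
            )
        self.generic = column()  # input: whole image resized to 64x64
        self.local = column()    # input: 64x64 random crop of the original
        self.regressor = nn.Sequential(
            nn.Linear(2 * 512, 128), nn.ReLU(),
            nn.Linear(128, 1),   # scalar safety score
        )

    def forward(self, x_generic, x_local):
        feats = torch.cat([self.generic(x_generic),
                           self.local(x_local)], dim=1)
        return self.regressor(feats)

model = DoubleColumnCNN()
g = torch.randn(2, 3, 64, 64)  # batch of resized whole images
l = torch.randn(2, 3, 64, 64)  # batch of random crops
scores = model(g, l)           # one predicted score per image
```

Training such a model against crowd-sourced safety ratings would use an ordinary regression loss (e.g. MSE); the paper's anti-over-fitting column-training scheme is not reproduced here since the abstract does not describe its details.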

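The evaluation metrics named in the abstract (Root Mean Square Error plus Pearson and Spearman correlation between predicted and ground-truth safety scores) can be computed with a few lines of plain Python. The sample score lists below are made-up illustrations, and the rank function ignores ties for simplicity.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error between true and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson on the ranks (ties ignored)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(x), ranks(y))

true_scores = [3.1, 4.5, 2.2, 5.0, 3.8]  # hypothetical ground-truth ratings
pred_scores = [3.0, 4.2, 2.5, 4.8, 3.6]  # hypothetical model outputs
error = rmse(true_scores, pred_scores)
r_p = pearson(true_scores, pred_scores)
r_s = spearman(true_scores, pred_scores)
```

In practice one would use `scipy.stats.pearsonr` and `scipy.stats.spearmanr`, which also handle ties and report p-values.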
