A Study on the Characteristics of a Series of Autoencoders for Recognizing Numbers Used in CAPTCHA

  • Jeon, Jae-seung (Graduate School of Information Security, Korea University) ;
  • Moon, Jong-sub (Graduate School of Information Security, Korea University)
  • Received: 2017.09.01
  • Reviewed: 2017.10.02
  • Published: 2017.12.31

Abstract

An autoencoder is a type of deep learning model whose input layer and output layer are the same; it uses constraints on the hidden layer to extract the features of an input vector effectively and to reconstruct that vector. In this paper, we apply a series of autoencoder models to regions of CAPTCHA images in which a single digit is mixed with a natural background, and we propose methods that remove the natural background as noise and reconstruct only the digit image. The suitability of the reconstructed images is verified by feeding the output of the autoencoder into a layer that uses the softmax function as its activation function. We also compare the proposed methods with other methods for automatically extracting CAPTCHA information and show that the proposed methods perform better.
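The denoising idea summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy 8x8 "glyphs" stand in for digit regions, uniform random values stand in for the natural background, and the single hidden layer, sigmoid units, squared-error cost, and the names `make_toy_data`, `train_denoising_autoencoder`, and `reconstruct` are all hypothetical. The softmax verification step is not included.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_toy_data(n=200, side=8, seed=0):
    """Build toy (noisy, clean) pairs: a bar-shaped glyph plus a random background."""
    rng = np.random.default_rng(seed)
    vertical = np.zeros((side, side))
    vertical[:, side // 2] = 1.0
    horizontal = np.zeros((side, side))
    horizontal[side // 2, :] = 1.0
    glyphs = np.stack([vertical.ravel(), horizontal.ravel()])
    labels = rng.integers(0, 2, size=n)
    clean = glyphs[labels]
    # Assumption: uniform noise is a crude stand-in for a natural background.
    background = 0.5 * rng.random(clean.shape)
    noisy = np.clip(clean + background, 0.0, 1.0)
    return noisy, clean

def train_denoising_autoencoder(noisy, clean, n_hidden=16, lr=0.5,
                                epochs=200, seed=0):
    """Train a one-hidden-layer autoencoder that maps noisy inputs to clean targets.

    Returns the learned parameters and the mean squared error at each epoch.
    """
    rng = np.random.default_rng(seed)
    n_in = noisy.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        h = sigmoid(noisy @ W1 + b1)   # encoder: compress to the hidden layer
        out = sigmoid(h @ W2 + b2)     # decoder: reconstruct the image
        err = out - clean
        losses.append(float(np.mean(err ** 2)))
        # Gradient of 0.5 * squared error (summed over pixels, averaged over
        # samples), backpropagated through both sigmoid layers.
        d_out = err * out * (1.0 - out) / len(noisy)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (noisy.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return (W1, b1, W2, b2), losses

def reconstruct(params, x):
    """Run the trained autoencoder on a batch of flattened images."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2)

noisy, clean = make_toy_data()
params, losses = train_denoising_autoencoder(noisy, clean)
print(f"MSE: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

The key design point is that the network receives the noisy image as input but is penalized against the clean image, so the hidden layer is pushed to keep the glyph and discard the background. A classifier with a softmax output could then be trained on `reconstruct(params, noisy)` to check whether the recovered glyphs remain recognizable.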
