Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data

  • Received : 2019.11.15
  • Accepted : 2020.03.03
  • Published : 2020.06.30

Abstract

This paper proposes a website fingerprinting method based on ensemble learning for the Tor network, which is designed to protect client anonymity and personal information. We formulate website fingerprinting as a learning problem over traffic packets collected in the Tor network and compare the performance of fingerprinting systems built on tree-based ensemble models. The training feature vector is built from general information, bursts, cell sequence length, and cell order extracted from each traffic sequence, so that the features of every website are represented with a fixed length. For the experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to how website fingerprinting is used, and compare the results against a support vector machine model that uses CUMUL feature vectors. In the experiments, the proposed statistical feature representation outperforms the CUMUL feature representation in every case except BW.
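As a rough illustration of the fixed-length feature idea, the sketch below turns one traffic trace into a vector of general counts, burst statistics, and cell-order positions. It is a minimal sketch under stated assumptions, not the paper's feature set: the trace encoding (+1 for an outgoing cell, -1 for an incoming cell), the function name `extract_features`, the parameter `n_order`, and the particular statistics chosen are all hypothetical.

```python
from itertools import groupby
from statistics import mean


def extract_features(cells, n_order=5):
    """Map a Tor cell direction sequence to a fixed-length feature list.

    Assumption: `cells` holds +1 for each outgoing cell and -1 for each
    incoming cell. The chosen statistics are illustrative only.
    """
    total = len(cells)
    n_out = sum(1 for c in cells if c > 0)
    n_in = total - n_out

    # A burst is a maximal run of consecutive cells in the same direction.
    bursts = [len(list(run)) for _, run in groupby(cells, key=lambda c: c > 0)]

    # Cell order: indices of the first n_order outgoing cells,
    # padded with -1 so the vector length never depends on the trace.
    out_pos = [i for i, c in enumerate(cells) if c > 0][:n_order]
    out_pos += [-1] * (n_order - len(out_pos))

    general = [total, n_out, n_in, n_out / total if total else 0.0]
    burst_stats = [
        len(bursts),
        max(bursts, default=0),
        mean(bursts) if bursts else 0.0,
    ]
    return general + burst_stats + out_pos


if __name__ == "__main__":
    trace = [1, -1, -1, -1, 1, 1, -1]
    print(extract_features(trace))
```

Because every trace maps to the same number of features (7 + `n_order` here), vectors of this kind can be stacked into a matrix and passed to a tree-based ensemble classifier such as a random forest or gradient-boosted trees.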
