http://dx.doi.org/10.3745/KTSDE.2020.9.6.187

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data  

Kim, Junho (Department of Computer Science, Dankook University)
Kim, Wongyum (AIdeep Co., Ltd.)
Hwang, Doosung (Department of Software Science, Dankook University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.9, no.6, 2020, pp. 187-194
Abstract
This paper proposes a website fingerprinting method based on ensemble learning for the Tor network, which is designed to protect client anonymity and personal information. We formulate website fingerprinting as a learning problem over traffic packets collected in the Tor network and compare the performance of tree-based ensemble models on it. Each training feature vector is built from general information, burst, cell sequence length, and cell order features extracted from the traffic sequence, so that every website is represented by a fixed-length vector. For the experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to how website fingerprinting is used, and compare the results against a support vector machine model trained on CUMUL feature vectors. In the experiments, the proposed statistics-based feature representation outperforms the CUMUL feature representation in every case except BW.
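As a rough illustration of how a variable-length traffic trace can be reduced to a fixed-length feature vector, the sketch below computes a few statistics from a sequence of Tor cells encoded as +1 (outgoing) and -1 (incoming). The function names, the `head` parameter, and the specific statistics are assumptions chosen for illustration; they loosely follow the four feature groups named in the abstract and are not the paper's exact definitions.

```python
from itertools import groupby
from statistics import mean


def burst_lengths(trace, direction):
    """Lengths of maximal runs of consecutive cells in one direction."""
    return [len(list(run)) for d, run in groupby(trace) if d == direction]


def extract_features(trace, head=5):
    """Map a variable-length trace of +1/-1 cells to a fixed-length vector.

    Hypothetical feature set (an assumption, not the authors' definition).
    """
    n_out = sum(1 for c in trace if c > 0)
    n_in = len(trace) - n_out
    features = [
        len(trace),                            # cell sequence length
        n_out,                                 # general: outgoing cells
        n_in,                                  # general: incoming cells
        n_out / len(trace) if trace else 0.0,  # general: outgoing fraction
    ]
    for direction in (1, -1):
        bursts = burst_lengths(trace, direction)
        features += [
            len(bursts),                       # burst: number of bursts
            max(bursts, default=0),            # burst: longest burst
            mean(bursts) if bursts else 0.0,   # burst: mean burst length
        ]
    # cell order: directions of the first `head` cells, zero-padded
    features += list(trace[:head]) + [0] * max(0, head - len(trace))
    return features


trace = [1, 1, -1, -1, -1, 1, -1, -1]
vec = extract_features(trace)
print(len(vec), vec)
```

Because the output length depends only on `head`, vectors from traces of any length can be stacked into one matrix and passed to a tree-based ensemble classifier such as a random forest.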
Keywords
Anonymous Network; Traffic Collection; Website Fingerprinting; Ensemble Algorithm; Machine Learning;
References
1 T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.785-794, 2016.
2 P. Geurts, D. Ernst, and L. Wehenkel, "Extremely randomized trees," Machine Learning, pp.3-42, 2006.
3 Tor Project Metrics [Internet], https://metrics.torproject.org.
4 Onion Service Protocol [Internet], https://www.torproject.org.
5 R. Dingledine, N. Mathewson, and P. Syverson, "Tor: The second-generation onion router," USENIX Security, pp.303-320, 2004.
6 T. Wang, X. Cai, R. Nithyanand, R. Johnson, and I. Goldberg, "Effective attacks and provable defenses for website fingerprinting," Proceedings of 23rd USENIX Security Symposium, pp.143-156, 2014.
7 M. S. I. Mamun, A. A. Ghorbani, and N. Stakhanova, "An entropy based encrypted traffic classifier," International Conference on Information and Communications Security, pp.282-294, 2015.
8 T. Wang and I. Goldberg, "Improved website fingerprinting on Tor," Proceedings of 12th ACM Workshop on Workshop on Privacy in the Electronic Society, pp.201-212, 2013.
9 K. Abe and S. Goto, "Fingerprinting attack on Tor anonymity using deep learning," Proceedings of the Asia-Pacific Advanced Network, pp.15-20, 2016.
10 A. Panchenko, F. Lanze, J. Pennekamp, T. Engel, A. Zinnen, M. Henze, and K. Wehrle, "Website Fingerprinting at Internet Scale," NDSS, 2016.
11 A. Pescape, A. Montieri, G. Aceto, and D. Ciuonzo, "Anonymity services Tor, I2P, JonDonym: Classifying in the dark (web)," IEEE Transactions on Dependable and Secure Computing, 2018.
12 X. Cai, X. C. Zhang, B. Joshi, and R. Johnson, "Touching from a distance: Website fingerprinting attacks and defenses," Proceedings of the 2012 ACM Conference on Computer and Communications Security, pp.605-616, 2012.
13 V. Rimmer, D. Preuveneers, M. Juarez, T. V. Goethem, and W. Joosen, "Automated website fingerprinting through deep learning," arXiv preprint arXiv, 2017.
14 A. H. Lashkari, G. Draper-Gil, M. S. I. Mamun, and A. A. Ghorbani, "Characterization of Tor Traffic using Time based Features," 3rd International Conference on Information Systems Security and Privacy, pp.253-262, 2017.
15 L. Lu, E. C. Chang, and M. C. Chan, "Website fingerprinting and identification using ordered feature sequences," European Symposium on Research in Computer Security, pp.199-214, 2010.
16 L. Breiman, "Random forests," Machine Learning, pp.5-32, 2001.