P2P Traffic Classification using Advanced Heuristic Rules and Analysis of Decision Tree Algorithms

P2P Traffic Classification Technique Using Improved Heuristic Rules and Decision Tree Analysis

  • Wujian Ye (Department of Computer Science, Graduate School, Dankook University)
  • Kyungsan Cho (Department of Software, Dankook University)
  • Received : 2013.12.17
  • Accepted : 2014.02.18
  • Published : 2014.03.31

Abstract

This paper proposes an improved two-step P2P traffic classification scheme that overcomes the limitations of existing methods. The first step is a signature-based classifier at the packet level. The second step consists of pattern heuristic rules and a statistics-based classifier at the flow level. The pattern heuristic rules improve accuracy and reduce the amount of traffic that the statistics-based classifier must handle. Based on an analysis of different decision tree algorithms, the statistics-based classifier is implemented with REPTree. In addition, an ensemble algorithm is used to improve the performance of the statistics-based classifier. Verification with real datasets shows that the hybrid scheme provides higher accuracy and lower overhead than existing schemes.

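This paper proposes a two-step P2P traffic classification scheme that uses heuristic rules and the results of a machine-learning analysis to overcome the limitations of existing schemes. The first step is a signature-based classifier at the packet level; the second step consists of pattern heuristic rules and a statistics-based classifier applied at the flow level. The proposed pattern heuristic rules raise classification accuracy and reduce the amount of traffic the statistics-based classifier must process. Based on an analysis of various decision tree algorithms, the statistics-based classifier is implemented with REPTree, the most efficient of them, and an ensemble algorithm is used to improve its performance. Verification with datasets from a real environment shows that the proposed scheme provides higher accuracy and lower overhead than existing schemes.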
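
The two-step flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Flow` fields, the single BitTorrent handshake signature, the peer/port heuristic and its threshold of 10, and the `stub_statistical` stand-in (which takes the place of a trained REPTree ensemble) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical signature table; the paper's actual signature set is not given here.
# The BitTorrent handshake starts with byte 0x13 followed by "BitTorrent protocol".
SIGNATURES = {b"\x13BitTorrent protocol": "P2P"}


@dataclass
class Flow:
    """Assumed per-flow summary; field names are illustrative only."""
    first_payload: bytes     # payload of the first data packet
    distinct_peers: int      # distinct remote IPs contacted by the host
    distinct_ports: int      # distinct remote ports contacted by the host
    mean_packet_size: float  # mean packet size in bytes


def signature_step(flow: Flow) -> Optional[str]:
    """Step 1 (packet level): match the payload against known signatures."""
    for signature, label in SIGNATURES.items():
        if flow.first_payload.startswith(signature):
            return label
    return None


def heuristic_step(flow: Flow) -> Optional[str]:
    """Step 2a (flow level): an assumed connection-pattern rule.

    Illustrative rule: a host talking to many peers, each on its own port,
    is treated as P2P. The threshold of 10 is an arbitrary assumption.
    """
    if flow.distinct_peers >= 10 and flow.distinct_peers == flow.distinct_ports:
        return "P2P"
    return None


def stub_statistical(flow: Flow) -> str:
    """Step 2b placeholder: stands in for a trained statistics-based classifier."""
    return "P2P" if flow.mean_packet_size < 200 else "non-P2P"


def classify(flow: Flow, statistical_step: Callable[[Flow], str]) -> Tuple[str, str]:
    """Return (label, name of the step that decided).

    Only flows left unresolved by the signature and heuristic steps
    reach the statistics-based classifier.
    """
    for step in (signature_step, heuristic_step):
        label = step(flow)
        if label is not None:
            return label, step.__name__
    return statistical_step(flow), "statistical_step"


if __name__ == "__main__":
    flows = [
        Flow(b"\x13BitTorrent protocol" + b"\x00" * 8, 1, 1, 120.0),
        Flow(b"\x8f\x02", 25, 25, 300.0),
        Flow(b"GET / HTTP/1.1", 2, 1, 900.0),
    ]
    for flow in flows:
        print(classify(flow, stub_statistical))
```

Ordering the cheaper steps first reflects the abstract's point that the heuristic rules reduce the traffic the statistics-based classifier has to process; in a real deployment the stub would be replaced by a model trained on labeled flow statistics.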

Cited by

  1. P2P and P2P botnet traffic classification in two stages vol.21, pp.5, 2017, https://doi.org/10.1007/s00500-015-1863-6