Comparison of Sampling Techniques for Passive Internet Measurement: An Inspection using An Empirical Study

수동적 인터넷 측정을 위한 샘플링 기법 비교: 사례 연구를 통한 검증

  • Kim, Jung-Hyun (Dept. of Electronics and Computer Engineering, Hanyang University) ;
  • Won, You-Jip (Dept. of Electronics and Computer Engineering, Hanyang University) ;
  • Ahn, Soo-Han (Dept. of Statistics, University of Seoul)
  • 김정현 (한양대학교 전자컴퓨터통신공학과) ;
  • 원유집 (한양대학교 전자컴퓨터통신공학과) ;
  • 안수한 (서울시립대학교 통계학과)
  • Published : 2008.06.25

Abstract

The Internet has become part of everyday life, so characterizing Internet traffic is an important research theme. Internet traffic, however, is hard to handle because of its sheer volume, and this is a serious obstacle to traffic analysis. Many researchers therefore use sampling techniques to reduce the amount of traffic to be analyzed. In this paper, we compare several well-known sampling techniques and propose an efficient sampling scheme. We evaluate systematic sampling, simple random sampling, and stratified sampling at sampling intensities of 1/10, 1/100, and 1/1000. Our observations focus on traffic volume, entropy analysis, and packet size analysis. Both simple random sampling and count-based systematic sampling are suitable for the general case, whereas time-based systematic sampling gives relatively poor results. Stratified sampling on transport-layer protocols (e.g., TCP and UDP) gives superior results. Our analysis suggests that an efficient sampling technique is one that preserves the variation of the traffic stream over time. Entropy analysis is robust to the various sampling techniques and is well suited to detecting anomalous traffic. We found, however, that a drop in traffic volume caused by a bottleneck can lead to wrong results in entropy analysis. Finally, we found that the packet size distribution is preserved under every packet sampling technique and intensity we tested.

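As the Internet has come to occupy an important place in everyday life, characterizing the traffic generated on the Internet has drawn attention as a very important research topic. However, Internet traffic is too voluminous to handle easily, and this is the greatest obstacle to Internet traffic measurement research. Many researchers use various sampling techniques to reduce the traffic to a manageable amount before analyzing it. In this study, we compare and analyze the sampling techniques used in previous Internet measurement research and propose the most effective sampling scheme. The sampling techniques compared are systematic sampling, simple random sampling, and stratified sampling, with sampling rates of 1/10, 1/100, and 1/1000. The items analyzed are traffic volume, entropy, and packet size. Simple random sampling gave acceptable results, and systematic sampling with the interval set as a packet count gave consistent results regardless of the target and the sampling intensity. In contrast, systematic sampling with the interval set as time gave very poor results. Stratified sampling based on the transport-layer protocol gave even better results. The results show that sampling performance depends on how well the sampling technique preserves the flow of traffic over time. Entropy analysis also proved robust to sampling and very well suited to detecting anomalous traffic. However, we found that a decrease in traffic volume caused by a bottleneck can produce incorrect entropy analysis results. Finally, we found that the packet size distribution is unaffected by the packet sampling method or intensity.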
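The sampling schemes and the entropy measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the packet representation `(timestamp, protocol, size)`, the function names, and the choice to keep the first packet of each time window are all assumptions made for illustration, and `n` stands for the inverse sampling intensity (10 for 1/10, and so on).

```python
import math
import random
from collections import Counter, defaultdict

# Assumed packet record (hypothetical): (timestamp, protocol, size_bytes),
# with the packet list already sorted by timestamp.

def systematic_count(packets, n):
    """Count-based systematic sampling: keep every n-th packet."""
    return packets[::n]

def systematic_time(packets, interval):
    """Time-based systematic sampling: keep the first packet that falls
    in each window of `interval` time units (an illustrative choice)."""
    sampled, last_window = [], None
    t0 = packets[0][0] if packets else 0
    for p in packets:
        window = int((p[0] - t0) // interval)
        if window != last_window:
            sampled.append(p)
            last_window = window
    return sampled

def simple_random(packets, n, seed=None):
    """Simple random sampling: draw len(packets)//n packets uniformly
    without replacement, returned in their original order."""
    rng = random.Random(seed)
    k = len(packets) // n
    idx = sorted(rng.sample(range(len(packets)), k))
    return [packets[i] for i in idx]

def stratified_by_protocol(packets, n, seed=None):
    """Stratified sampling: split packets by transport protocol and draw
    a simple random sample of about 1/n from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in packets:
        strata[p[1]].append(p)
    sampled = []
    for group in strata.values():
        k = max(1, len(group) // n)  # keep at least one packet per stratum
        sampled.extend(rng.sample(group, k))
    return sorted(sampled, key=lambda p: p[0])

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values,
    e.g. packet sizes or addresses observed in one time bin."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Comparing `shannon_entropy` of a feature (for example `[p[2] for p in packets]`) on the full trace and on each sampled trace, per time bin, gives a rough way to see how well a sampling scheme preserves the behavior of the traffic over time.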
