Privacy Preserving Sequential Patterns Mining for Network Traffic Data

Sequential Pattern Mining on Network Traffic Data without Leaking Site Access Information

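  • Seung-Woo Kim (Department of Computer Science, Yonsei University) ;
  • Sanghyun Park (Department of Computer Science, Yonsei University) ;
  • Jung-Im Won (College of Information and Communications, Hanyang University)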
  • Published : 2006.12.15

Abstract

As the volume of network traffic data grows at an alarming rate, many studies are now mining this data to extract useful information. However, network users' privacy can be compromised during the mining process. In this paper, we propose an efficient and practical privacy-preserving sequential pattern mining method for network traffic data. To discover frequent sequential patterns without violating privacy, our method uses the N-repository server model and the retention replacement technique. In addition, our method accelerates the overall mining process by maintaining meta tables that quickly determine whether a candidate pattern has ever occurred. Experiments with real network traffic data demonstrated the efficiency of the proposed method.

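With the rapid development of networks, research on applying mining techniques to the traffic data generated on networks is being actively pursued. However, mining network traffic data may violate the privacy of network users. In this paper, we propose an efficient sequential pattern mining technique for large-scale network traffic data that protects the privacy of each site while ensuring the accuracy and practicality of the mining results. By using the N-repository server model and the retention replacement technique, the proposed technique performs sequential pattern mining without disclosing the network data stored at each site. It also maintains meta tables that can quickly determine whether a candidate pattern has occurred, so that the overall mining process proceeds efficiently. Various experiments on real traffic data generated on a network confirmed the efficiency and accuracy of the proposed technique.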
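The abstract names the retention replacement technique without giving details. As a rough illustration only, the sketch below shows the general idea in Python: each value is kept with probability p and otherwise replaced by a uniform random draw from the domain, and a count over the perturbed data is then corrected to estimate the true count. The function names, the uniform replacement distribution, and the synthetic site labels are assumptions made for this example; they are not taken from the paper and do not reproduce its N-repository server model or meta tables.

```python
import random


def perturb(values, domain, p, rng):
    """Keep each value with probability p; otherwise replace it
    with a uniform random draw from the domain (assumed distribution)."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def estimate_count(perturbed, value, domain_size, p):
    """Estimate how often `value` occurred in the original data.

    With uniform replacement, the expected observed count is
        p * true_count + (1 - p) * n / domain_size,
    so the estimate inverts that relation.
    """
    n = len(perturbed)
    observed = sum(1 for v in perturbed if v == value)
    return (observed - (1 - p) * n / domain_size) / p


# Hypothetical data: 20 site labels, with "site3" over-represented.
rng = random.Random(0)
domain = [f"site{i}" for i in range(20)]
original = ["site3"] * 3000 + [rng.choice(domain) for _ in range(7000)]

p = 0.5
perturbed = perturb(original, domain, p, rng)

true_count = original.count("site3")
estimated = estimate_count(perturbed, "site3", len(domain), p)
print(f"true count: {true_count}, estimated from perturbed data: {estimated:.0f}")
```

A lower p hides more of each individual record but widens the error of the estimate, so the retention probability trades privacy against accuracy.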
