An Efficient Algorithm for Mining Interactive Communication Sequence Patterns

  • Published: 2009.06.15

Abstract

Communication log data consist of communication events such as sending and receiving e-mail or instant messages and visiting websites. Many jurisdictions, including the USA and the EU, require communication service providers to retain these data so that crimes committed through the Internet can be investigated and detected. Because the retained data are very large, Law Enforcement Authorities need an efficient method for extracting the valuable information they contain. This paper defines Interactive Communication Sequence Patterns (ICSPs), an important kind of information in communication log data in which each communication event consists only of its sender, its receiver, and its timestamp. We also define the problem of mining such patterns and propose an algorithm called Fast Discovering Interactive Communication Sequence Patterns (FDICSP) to solve it. FDICSP generates longer interactive communication sequences (ICSs) by combining shorter ones, and it exploits the characteristics of ICSs to reduce the search space during this process. As a result, FDICSP finds ICSPs efficiently.
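The abstract does not give the formal definition of an ICS or the pruning rules FDICSP uses, so the following is only a minimal Python sketch of the general idea it describes: growing longer sequences from shorter ones while restricting the search space. It assumes a hypothetical definition in which an interactive sequence is a reply chain (the receiver of one message sends the next one, strictly later and within `max_gap` time units) and in which support counts distinct starting events. The names `Event`, `mine_chain_patterns`, `max_gap`, and `min_support` are illustrative and do not come from the paper. This is a plain level-wise (Apriori-style) search, not FDICSP itself.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

Pattern = Tuple[str, ...]          # participants p0, p1, ..., pk
Occurrences = List[Tuple[int, ...]]  # each occurrence = tuple of event indices


class Event(NamedTuple):
    sender: str
    receiver: str
    timestamp: int


def mine_chain_patterns(events: List[Event], max_gap: int,
                        min_support: int) -> Dict[Pattern, int]:
    """Level-wise mining of HYPOTHETICAL 'reply-chain' patterns.

    A pattern (p0, p1, ..., pk) means p0 -> p1, then p1 -> p2, and so on,
    where each message is sent strictly after the previous one and at most
    max_gap time units later. Support is the number of distinct first
    events at which the pattern occurs; it cannot grow when a pattern is
    extended, so infrequent patterns are safely discarded.
    """
    events = sorted(events, key=lambda e: e.timestamp)

    # Index event positions (and their timestamps) by sender, in time order.
    by_sender: Dict[str, List[int]] = defaultdict(list)
    times_by_sender: Dict[str, List[int]] = defaultdict(list)
    for i, e in enumerate(events):
        by_sender[e.sender].append(i)
        times_by_sender[e.sender].append(e.timestamp)

    def support(occurrences: Occurrences) -> int:
        return len({occ[0] for occ in occurrences})

    # Length-1 patterns: one occurrence per event.
    level: Dict[Pattern, Occurrences] = defaultdict(list)
    for i, e in enumerate(events):
        level[(e.sender, e.receiver)].append((i,))

    frequent: Dict[Pattern, int] = {}
    while level:
        # Keep only frequent patterns; only these are extended further.
        level = {p: o for p, o in level.items() if support(o) >= min_support}
        frequent.update({p: support(o) for p, o in level.items()})
        next_level: Dict[Pattern, Occurrences] = defaultdict(list)
        for pattern, occurrences in level.items():
            for occ in occurrences:
                last = events[occ[-1]]
                # Only messages sent by the last receiver, strictly after
                # `last` and within max_gap, can extend this occurrence.
                idxs = by_sender.get(last.receiver, [])
                times = times_by_sender.get(last.receiver, [])
                pos = bisect_right(times, last.timestamp)
                while pos < len(idxs) and times[pos] <= last.timestamp + max_gap:
                    nxt = events[idxs[pos]]
                    next_level[pattern + (nxt.receiver,)].append(occ + (idxs[pos],))
                    pos += 1
        level = next_level
    return frequent


log = [
    Event("A", "B", 1), Event("B", "C", 2), Event("C", "A", 3),
    Event("A", "B", 10), Event("B", "C", 11), Event("C", "A", 12),
    Event("B", "D", 30),
]
patterns = mine_chain_patterns(log, max_gap=5, min_support=2)
for p, s in sorted(patterns.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" -> ".join(p), s)
```

On this toy log, the longest pattern reported is A -> B -> C -> A with support 2, while the single B -> D message is dropped because it occurs only once. Indexing events by sender and binary-searching their timestamps means each extension step only inspects messages that could continue the chain, which is one simple way (under the assumptions above) to use the structure of the sequences to shrink the search space.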
