
Index-based Searching on Timestamped Event Sequences  

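박상현 (Department of Computer Science, Yonsei University)
원정임 (Department of Computer Science, Yonsei University)
윤지희 (Division of Information and Communication Engineering, Hallym University)
김상욱 (Division of Information and Communications, Hanyang University)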
Abstract
In many data mining and bioinformatics applications, it is essential to retrieve occurrences of interesting patterns from sequence databases efficiently. Consider, for example, a network event management system that records the types and timestamps of events occurring in a specific network component (e.g., a router). A typical query for finding temporal causal relationships among network events is as follows: 'Find all occurrences of CiscoDCDLinkUp that are followed by MLMStatusUP and subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds and the interval between the first and third events is not larger than 40 seconds.' This paper proposes an indexing method that answers such queries efficiently. Unlike previous methods, which rely on inefficient sequential scans or on data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, which has proven efficient in both storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to the multi-dimensional spatial index is an n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of event type Ei in W. Here, n is the number of event types that can occur in the system of interest. The 'curse of dimensionality' problem may arise when n is large, so we use dimension selection or event type grouping to avoid it. Experimental results show that the proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.
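The window-to-vector mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `window_vector` and `candidate_windows`, the time-based window length, the use of `math.inf` for event types absent from a window, and the sample log are all assumptions made for illustration. A linear scan stands in for the range query that a multi-dimensional spatial index would answer.

```python
import math


def window_vector(events, start, window_len, event_types):
    """Build the vector for the window that begins at events[start].

    events: list of (event_type, timestamp) pairs sorted by timestamp.
    Entry i is the interval between the window's first event and the first
    occurrence of event_types[i] inside the window. math.inf is an assumed
    sentinel for a type that does not occur in the window.
    """
    t0 = events[start][1]
    first_seen = {}
    for etype, ts in events[start:]:
        if ts - t0 > window_len:
            break  # event lies outside the window
        first_seen.setdefault(etype, ts - t0)  # keep the first occurrence only
    return [first_seen.get(e, math.inf) for e in event_types]


def candidate_windows(events, window_len, event_types, first_type, max_intervals):
    """Return start positions whose vectors satisfy the interval bounds.

    max_intervals maps an event type to the largest allowed interval from
    the first event. Hits are candidates only: the relative order of the
    later events still has to be verified against the raw sequence.
    """
    pos = {e: i for i, e in enumerate(event_types)}
    hits = []
    for start, (etype, _) in enumerate(events):
        if etype != first_type:
            continue
        vec = window_vector(events, start, window_len, event_types)
        if all(vec[pos[e]] <= bound for e, bound in max_intervals.items()):
            hits.append(start)
    return hits


# Hypothetical event log: (event type, timestamp in seconds).
types = ["CiscoDCDLinkUp", "MLMStatusUP", "TCPConnectionClose"]
log = [
    ("CiscoDCDLinkUp", 0),
    ("MLMStatusUP", 12),
    ("TCPConnectionClose", 35),
    ("CiscoDCDLinkUp", 100),
    ("TCPConnectionClose", 130),
    ("MLMStatusUP", 150),
]
vec = window_vector(log, 0, 40, types)
hits = candidate_windows(
    log, 40, types, "CiscoDCDLinkUp",
    {"MLMStatusUP": 20, "TCPConnectionClose": 40},
)
```

Because each entry stores the first occurrence of a type, any true match has vector entries no larger than its own intervals, so bounding the entries yields a superset of the answers. This is consistent with the abstract's claim of no false dismissals, provided the candidates are checked afterwards.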
Keywords
Sequence database; Event sequence; Multi-dimensional index