http://dx.doi.org/10.3745/KTSDE.2017.6.11.507

An Index-Based Search Method for Performance Improvement of Set-Based Similar Sequence Matching  

Lee, Juwon (Department of Computer Science, Yonsei University)
Lim, Hyo-Sang (Division of Computer and Telecommunications Engineering, Yonsei University)
Publication Information
KIPS Transactions on Software and Data Engineering / v.6, no.11, 2017, pp. 507-520
Abstract
Set-based similar sequence matching measures similarity not between individual data items but between sets that group multiple data items. In this method, the similarity of two sets is expressed as the size of their intersection. The method has two critical performance problems: 1) computing the intersection size is time-consuming, and 2) the number of set pairs whose intersection size must be computed is very large. To solve these problems, this paper proposes an index-based search method that improves the performance of set-based similar sequence matching. The method consists of two parts. First, we convert the set similarity problem into an intersection size comparison problem and provide an index structure that accelerates the intersection size calculation. Second, we propose an efficient set-based similar sequence matching method that exploits the proposed index structure. Experiments show that the proposed method reduces execution time by a factor of 30 to 50 compared with existing methods. They also show that the method is scalable, since the performance gap widens as the number of data sequences increases.
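The abstract does not describe the index structure itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Jaccard-style similarity, for which the test J(A, B) >= t is equivalent to the intersection size test |A ∩ B| >= t(|A| + |B|) / (1 + t), and it uses a plain inverted index (item to set IDs) so that intersection sizes are counted only for sets sharing at least one item with the query. The names `min_overlap`, `build_inverted_index`, and `similar_sets` are hypothetical.

```python
from collections import defaultdict


def min_overlap(size_a, size_b, threshold):
    """Smallest intersection size for which Jaccard(A, B) >= threshold.

    From J = i / (a + b - i) >= t it follows that i >= t * (a + b) / (1 + t).
    """
    return threshold * (size_a + size_b) / (1 + threshold)


def build_inverted_index(sets):
    """Map each item to the IDs of the sets that contain it (assumed index layout)."""
    index = defaultdict(list)
    for set_id, items in enumerate(sets):
        for item in items:
            index[item].append(set_id)
    return index


def similar_sets(query, sets, index, threshold):
    """Return IDs of stored sets whose Jaccard similarity to `query` is >= threshold."""
    # Count shared items per candidate set by walking the posting lists.
    # Sets with no item in common with the query are never visited.
    counts = defaultdict(int)
    for item in query:
        for set_id in index.get(item, ()):
            counts[set_id] += 1
    # Compare intersection sizes against the size-dependent bound.
    return sorted(
        set_id
        for set_id, overlap in counts.items()
        if overlap >= min_overlap(len(query), len(sets[set_id]), threshold)
    )


# Hypothetical data for illustration only.
stored = [{1, 2, 3}, {2, 3, 4}, {7, 8, 9}]
idx = build_inverted_index(stored)
matches = similar_sets({1, 2, 3}, stored, idx, threshold=0.6)
```

In this sketch the cost of a query depends on the lengths of the posting lists for the query's items rather than on the total number of stored sets, which is one plausible way an index could reduce the number of pairwise intersection computations the abstract mentions.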
Keywords
Set Similarity; Similar Sequence Matching; Set Index