http://dx.doi.org/10.3745/KIPSTD.2011.18D.6.415

Efficient Inverted List Search Technique using Bitmap Filters  

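Kwon, In-Teak (Division of Electronics and Information Engineering, Chonbuk National University)
Kim, Jong-Ik (Division of Computer Science and Engineering, Chonbuk National University)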
Abstract
Finding similar strings is an important operation because textual data can contain errors, duplications, and inconsistencies by nature. Many algorithms have been developed for approximate string search, and most of them use inverted lists to find similar strings. These algorithms essentially perform merge operations on inverted lists. In this paper, we develop a bitmap representation of an inverted list and propose an efficient search algorithm that uses bitmap filters to skip unnecessary inverted lists without scanning them. Experimental results show that the proposed technique consistently improves search performance.
Keywords
String Similarity Search; Similarity Search; Bitmap Filter
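
The abstract does not give the details of the bitmap representation, so the following Python sketch only illustrates the general idea of skipping an inverted list through a bitmap test. Everything in it is an assumption made for illustration and not the authors' algorithm: the 2-gram tokenization, the hypothetical width `NBITS`, the bit mapping `sid % NBITS`, and the helper names `qgrams`, `build_index`, and `search`. It answers a count-threshold query (ids that occur in at least `threshold` of the query's gram lists) by collecting candidates from the shortest lists and then skipping any remaining list whose bitmap shares no bit with the candidates.

```python
from bisect import bisect_left
from collections import defaultdict

NBITS = 64  # hypothetical bitmap width, chosen only for this sketch


def qgrams(s, q=2):
    """Return the set of distinct q-grams of s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_index(strings, q=2):
    """Build gram -> sorted id list, plus gram -> bitmap of (id % NBITS)."""
    lists = defaultdict(list)
    bitmaps = defaultdict(int)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            lists[g].append(sid)              # ids arrive in increasing order
            bitmaps[g] |= 1 << (sid % NBITS)
    return dict(lists), dict(bitmaps)


def search(query, threshold, lists, bitmaps, q=2):
    """Return (ids found in >= threshold gram lists, number of skipped lists)."""
    # Shortest lists first; the gram itself breaks ties deterministically.
    grams = sorted(qgrams(query, q), key=lambda g: (len(lists.get(g, ())), g))
    n = len(grams)
    if threshold < 1 or threshold > n:
        return [], 0
    # An id in >= threshold of n lists can miss at most n - threshold lists,
    # so it must appear in at least one of the n - threshold + 1 shortest.
    n_short = n - threshold + 1
    counts = {}
    for g in grams[:n_short]:
        for sid in lists.get(g, ()):
            counts[sid] = counts.get(sid, 0) + 1
    # Summarize all candidates in one bitmap.
    cand_bitmap = 0
    for sid in counts:
        cand_bitmap |= 1 << (sid % NBITS)
    skipped = 0
    for g in grams[n_short:]:
        # No shared bit means no candidate is in this list: skip it unread.
        if bitmaps.get(g, 0) & cand_bitmap == 0:
            skipped += 1
            continue
        postings = lists[g]
        for sid in counts:
            i = bisect_left(postings, sid)    # binary-search probe
            if i < len(postings) and postings[i] == sid:
                counts[sid] += 1
    return sorted(s for s, c in counts.items() if c >= threshold), skipped


strings = ["bitmap", "bitmask", "filter", "inverted", "merge"]
lists, bitmaps = build_index(strings)
result, skipped = search("bitter", 3, lists, bitmaps)
```

The bitmap test can produce false positives (two ids may share a bit), in which case the list is probed as usual, but it never discards a list that contains a candidate, so the result is the same as a full merge.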