http://dx.doi.org/10.3745/KIPSTD.2009.16D.6.845

k-Bitmap Clustering Method for XML Data based on Relational DBMS  

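Lee, Bum-Suk (Department of Computer Engineering, The Catholic University of Korea)
Hwang, Byung-Yeon (School of Computer and Information Engineering, The Catholic University of Korea)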
Abstract
The use of XML data has increased with the growth of the Web 2.0 environment. XML's advantages are widely recognized because it serves as the base technology of RSS and ATOM, which transfer information from blogs and news feeds. Bitmap clustering is a relational-DBMS-based method that keeps its index in main memory, and in evaluation it performed better than other XML indexing methods. However, the existing method generates too many clusters, which degrades search quality. To solve this problem, this paper proposes the k-Bitmap clustering method, which generates a user-defined number k of clusters. The proposed method also keeps an additional inverted index for searching terms that are excluded from the representative bits of the k-Bitmap. Our evaluation shows that users can control the number of clusters. The method also achieves high recall in single-term search, and by keeping two indices it guarantees that the search result includes all documents related to the query.
Keywords
XML; Bitmap; Clustering; Indexing; Relational DBMS; Performance Evaluation
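
The two-index idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual algorithm, so everything below is an assumption made for illustration: documents are reduced to term sets, similarity is Jaccard, the k clusters come from a simple farthest-first seeding, and a cluster's representative bits are the terms present in a strict majority of its members. All function names (`build_bitmaps`, `cluster_into_k`, `representative_bits`, `build_indices`, `search`) and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_bitmaps(docs):
    """Encode each document's term set as an integer bitmap, one bit per term."""
    vocab = sorted({t for terms in docs.values() for t in terms})
    bit_of = {t: i for i, t in enumerate(vocab)}
    bitmaps = {d: sum(1 << bit_of[t] for t in set(terms))
               for d, terms in docs.items()}
    return bit_of, bitmaps


def popcount(x):
    return bin(x).count("1")


def similarity(a, b):
    """Jaccard similarity of two bitmaps (an assumed measure, not the paper's)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def cluster_into_k(bitmaps, k):
    """Assumed scheme: farthest-first seeds, then assign each doc to its closest seed.

    Expects at least one document; yields at most k clusters.
    """
    ids = sorted(bitmaps)
    seeds = [ids[0]]
    while len(seeds) < min(k, len(ids)):
        rest = [d for d in ids if d not in seeds]
        seeds.append(min(rest, key=lambda d: max(
            similarity(bitmaps[d], bitmaps[s]) for s in seeds)))
    clusters = {s: [] for s in seeds}
    for d in ids:
        best = max(seeds, key=lambda s: similarity(bitmaps[d], bitmaps[s]))
        clusters[best].append(d)
    return list(clusters.values())


def representative_bits(members, bitmaps, n_bits):
    """Keep the bits that are set in a strict majority of the cluster's members."""
    rep = 0
    for i in range(n_bits):
        count = sum((bitmaps[d] >> i) & 1 for d in members)
        if count * 2 > len(members):
            rep |= 1 << i
    return rep


def build_indices(docs, k):
    """Build the cluster index plus an inverted index for excluded terms."""
    bit_of, bitmaps = build_bitmaps(docs)
    clusters, inverted = [], defaultdict(set)
    for members in cluster_into_k(bitmaps, k):
        rep = representative_bits(members, bitmaps, len(bit_of))
        clusters.append((rep, members))
        for d in members:
            for term, i in bit_of.items():
                # A term the document has but the representative lacks
                # would be missed by a cluster lookup, so record it here.
                if (bitmaps[d] >> i) & 1 and not (rep >> i) & 1:
                    inverted[term].add(d)
    return bit_of, bitmaps, clusters, inverted


def search(term, bit_of, bitmaps, clusters, inverted):
    """Single-term search that consults both indices."""
    if term not in bit_of:
        return set()
    mask = 1 << bit_of[term]
    hits = set(inverted.get(term, ()))
    for rep, members in clusters:
        if rep & mask:
            hits.update(d for d in members if bitmaps[d] & mask)
    return hits


# Hypothetical feed documents reduced to the element names they contain.
docs = {
    "d1": ["rss", "channel", "item"],
    "d2": ["rss", "channel", "title"],
    "d3": ["feed", "entry", "title"],
    "d4": ["feed", "entry", "author"],
}
index = build_indices(docs, k=2)
print(search("rss", *index))
print(search("title", *index))
```

In this sketch, a term that appears in a cluster's representative bits is answered from the cluster, while a term dropped from the representative is answered from the inverted index. Consulting both is what lets a single-term query return every document that contains the term, regardless of how small k is.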