http://dx.doi.org/10.3745/KIPSTD.2008.15-D.2.171

An Indexing System for Retrieving Similar Paths in XML Documents  

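Lee, Bum-Suk (Department of Computer Engineering, The Catholic University of Korea)
Hwang, Byung-Yeon (School of Computer and Information Engineering, The Catholic University of Korea)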
Abstract
Since the W3C introduced the XML standard in 1998, the number of documents written in XML has grown steadily. Accordingly, several systems have been developed to efficiently manage and retrieve large collections of XML documents. BitCube, a bitmap indexing system, is a representative system in this field of research. Building on the bitmap indexing technique, the path bitmap indexing system (LH06), which clusters similar paths, addressed a problem that the existing BitCube system could not solve, namely identifying similar paths. The path bitmap indexing system offers faster retrieval not only for exact path matching but also for similar path searching. However, its similarity calculation algorithm has several specific problems. As a result, it sometimes fails to calculate the similarity between two paths even when they are extremely similar, and it produces a growing number of meaningless clusters. In this paper, we propose a novel method for calculating the similarity between paths and clustering them in order to solve these problems. The proposed system yields stable clustering results, and it achieves high clustering precision in a performance evaluation against LH06.
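The abstract does not give the similarity formula used by LH06 or by the proposed method, so the following is only a minimal sketch, under our own assumptions, of the general idea it describes: a bitmap index whose rows are clusters of similar paths and whose bits mark the documents containing those paths, plus an exact-path bitmap for exact matching. The similarity measure (longest common subsequence of element names divided by the longer path length), the 0.6 threshold, and the names `PathBitmapIndex`, `path_similarity`, `search_exact`, and `search_similar` are hypothetical and are not taken from the paper.

```python
def split_path(path: str) -> list[str]:
    """Split an XML path such as '/news/head/title' into its element names."""
    return [step for step in path.split("/") if step]


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two element-name lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def path_similarity(p: str, q: str) -> float:
    """Hypothetical measure in [0, 1]: LCS length divided by the longer path length."""
    a, b = split_path(p), split_path(q)
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def bits_to_ids(bitmap: int) -> list[int]:
    """Decode an integer bitmap into the list of set bit positions (document ids)."""
    return [d for d in range(bitmap.bit_length()) if (bitmap >> d) & 1]


class PathBitmapIndex:
    """Simplified path-by-document bitmap index with threshold-based path clustering."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.exact: dict[str, int] = {}      # path -> bitmap of documents containing it
        self.clusters: list[list[str]] = []  # member paths; members[0] is the representative
        self.cluster_bits: list[int] = []    # one document bitmap per cluster

    def add(self, doc_id: int, path: str) -> None:
        """Record that document `doc_id` contains `path`."""
        self.exact[path] = self.exact.get(path, 0) | (1 << doc_id)
        # Find the most similar existing cluster by comparing with each representative.
        best, best_sim = -1, 0.0
        for k, members in enumerate(self.clusters):
            sim = path_similarity(path, members[0])
            if sim > best_sim:
                best, best_sim = k, sim
        if best < 0 or best_sim < self.threshold:
            # No sufficiently similar cluster exists: start a new one.
            self.clusters.append([path])
            self.cluster_bits.append(0)
            best = len(self.clusters) - 1
        elif path not in self.clusters[best]:
            self.clusters[best].append(path)
        self.cluster_bits[best] |= 1 << doc_id

    def search_exact(self, path: str) -> list[int]:
        """Documents containing exactly this path."""
        return bits_to_ids(self.exact.get(path, 0))

    def search_similar(self, path: str) -> list[int]:
        """OR together the bitmaps of every cluster whose representative is similar enough."""
        result = 0
        for members, bits in zip(self.clusters, self.cluster_bits):
            if path_similarity(path, members[0]) >= self.threshold:
                result |= bits
        return bits_to_ids(result)


idx = PathBitmapIndex(threshold=0.6)
idx.add(0, "/news/head/title")
idx.add(1, "/news/header/title")   # similarity 2/3 >= 0.6, joins the first cluster
idx.add(2, "/book/chapter/section")  # dissimilar, forms a second cluster
print(idx.search_exact("/news/header/title"))  # [1]
print(idx.search_similar("/news/head/title"))  # [0, 1]
```

Storing each bitmap as a Python integer keeps the union of similar clusters a single bitwise OR, which is the property that makes bitmap indexes fast for both exact and similar path queries; a production index would use fixed-width bit vectors instead.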
Keywords
XML; Indexing System; Similar Path; Clustering
References
1 J. Yoon, V. Raghavan, and V. Chakilam, “BitCube: Clustering and Statistical Analysis for XML Documents,” In Proc. of the 13th Int'l Conf. on Scientific and Statistical Database Management, Virginia, Jul., 2001
2 XQEngine, http://www.fatdog.com
3 Jae-Min Lee and Byung-Yeon Hwang, “Path Bitmap Indexing for Retrieval of XML Documents,” Lecture Notes in Computer Science, Vol.3885, Springer-Verlag, Apr., 2006
4 NewsML, http://www.newsml.org
5 C. J. van Rijsbergen, “Information Retrieval,” Butterworths, London, 1979
6 W3C, “Extensible Markup Language (XML) 1.0 (Second Edition),” http://www.w3.org/TR/2000/REC-xml-20001006
7 T. Dalamagas, T. Cheng, K. J. Winkel, and T. Sellis, “Clustering XML Documents Using Structural Summaries,” In Proc. of the EDBT Workshop on Clustering Information over the Web (ClustWeb04), Heraklion, Greece, 2004
8 J. H. Hwang and K. H. Ryu, “Clustering and Retrieval of XML Documents by Structure,” Lecture Notes in Computer Science, Vol.3481, Springer, Berlin, 2005
9 D. Egnor and R. Lord, “XYZFind: Structured Searching in Context with XML,” In Proc. of ACM SIGIR Workshop, Athens, Greece, 2000
10 T. Dalamagas, T. Cheng, K. J. Winkel, and T. Sellis, “A Methodology for Clustering XML Documents by Structure,” Information Systems, Vol.31, Issue 3, Elsevier Science Ltd., pp.187-228, May, 2006
11 U. Park and Y. Seo, “An Implementation of XML Documents Search System based on Similarity in Structure and Semantics,” In Proc. of the Web Information Retrieval and Integration, 2005 (WIRI '05), pp.97-103, Apr., 2005
12 J. P. Yoon, V. Raghavan, V. Chakilam, and L. Kerschberg, “BitCube: A Three-Dimensional Bitmap Indexing for XML Documents,” Journal of Intelligent Information Systems, Vol.17, pp.241-254, 2001