http://dx.doi.org/10.3745/KIPSTD.2004.11D.7.1357

Structure-based Clustering for XML Document Retrieval  

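Hwang Jeong Hee (Department of Computer Science, Graduate School, Chungbuk National University)
Ryu Keun Ho (School of Electrical and Computer Engineering, Chungbuk National University)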
Abstract
As XML becomes increasingly important for managing information and exchanging data efficiently on the web, work on structural integration and retrieval is ongoing. For XML documents with a defined structure, the structure can be retrieved through the DTD or XML schema, but existing methods cannot be applied to XML documents that lack structure information. In this paper, as basic research, we therefore propose a new clustering technique that enables fast structure retrieval over XML documents without structure information. We first extract the frequent structures of each XML document as its features. We then cluster documents by structural similarity, treating the frequent structure as the representative structure of each XML document, which makes retrieval faster than processing the whole collection of documents with differing structures. We also perform structure retrieval over XML documents based on the clusters, each of which is a group of similarly structured documents. Finally, we show the efficiency of the proposed method by describing how to apply it to structure retrieval and presenting an example of application results.
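The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that a document's "frequent structure" can be approximated by the root-to-element tag paths that repeat within it, that similarity is the Jaccard overlap of those path sets, and that clustering is a greedy single pass. The function names, the `min_count` and `threshold` parameters, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count every root-to-element tag path in one document."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def representative_structure(xml_text, min_count=2):
    """Assumption: paths seen at least min_count times represent the document."""
    counts = tag_paths(xml_text)
    frequent = {p for p, c in counts.items() if c >= min_count}
    # Hypothetical fallback: use all paths when no path repeats.
    return frequent or set(counts)


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cluster(docs, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []  # each entry: {"seed": set of paths, "members": [doc ids]}
    for doc_id, text in docs.items():
        rep = representative_structure(text)
        for c in clusters:
            if jaccard(rep, c["seed"]) >= threshold:
                c["members"].append(doc_id)
                break
        else:
            clusters.append({"seed": rep, "members": [doc_id]})
    return clusters


def retrieve(clusters, query_path):
    """Return candidate ids from clusters whose seed structure contains the path."""
    hits = []
    for c in clusters:
        if query_path in c["seed"]:
            hits.extend(c["members"])
    return hits


# Hypothetical sample documents without any DTD or schema.
docs = {
    "a": "<lib><book><title/></book><book><title/></book></lib>",
    "b": "<lib><book><title/></book><book><title/><year/></book></lib>",
    "c": "<shop><item><price/></item><item><price/></item></shop>",
}
clusters = cluster(docs)
candidates = retrieve(clusters, "/lib/book/title")
```

In this sketch a structure query is checked against one representative path set per cluster rather than against every document, which is the source of the speedup the abstract describes. The returned ids are candidates only, since a cluster member need not contain every path in the cluster's seed.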
Keywords
Document clustering; XML clustering; XML document; structural similarity; XML structure retrieval