http://dx.doi.org/10.3745/KIPSTD.2007.14-D.1.001

Clustering XML Documents Considering The Weight of Large Items in Clusters  

Hwang, Jeong-Hee (Namseoul University)
Abstract
As XML, a data exchange language for the Internet, is used for a growing number of web documents, XML documents are becoming a target of information retrieval. Consequently, there has been research on the structure, integration, and retrieval of XML documents. This paper proposes a method for clustering XML documents based on frequent structures, as basic research toward efficient query processing and retrieval. First, the trees representing XML documents are decomposed and frequent structures are extracted from them. Second, treating the frequent structures as the items of transactions, clustering is performed with the weight of large items taken into account in order to adjust cluster creation and cluster cohesion. Third, the effectiveness of the method is shown through experiments that compare it with previous methods.
Keywords
XML Structure; Document Clustering; XML Clustering; XML Document;
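
The first step described in the abstract, decomposing XML trees and extracting frequent structures, might be sketched roughly as follows. This is an illustrative sketch only, not the paper's algorithm: it assumes that a "structure" is a root-to-leaf tag path and that "frequent" means appearing in at least a `min_support` fraction of documents. The function names `tag_paths` and `frequent_paths`, the sample documents, and the threshold are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Decompose one XML document into its set of root-to-leaf tag paths.

    Assumption: a 'structure' is modeled as a tag path; the paper may
    define structures differently.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        children = list(node)
        if not children:
            # Leaf element: record the full path from the root.
            paths.add(path)
        for child in children:
            stack.append((child, path + "/" + child.tag))
    return paths


def frequent_paths(docs, min_support):
    """Return the paths that occur in at least min_support of the documents."""
    counts = Counter()
    for doc in docs:
        # Each document contributes at most once per distinct path.
        counts.update(tag_paths(doc))
    threshold = min_support * len(docs)
    return {p for p, c in counts.items() if c >= threshold}


# Hypothetical sample collection.
docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<article><title/><journal/></article>",
]

frequent = frequent_paths(docs, min_support=0.6)

# Each document becomes a transaction whose items are its frequent paths,
# which a transaction clustering step could then consume.
transactions = [tag_paths(d) & frequent for d in docs]
```

Under these assumptions, each document is reduced to a transaction of frequent paths. The weighting of large items within clusters, which is the paper's actual contribution, is not modeled here because the abstract does not specify it.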