
An Incremental Clustering Technique of XML Documents using Cluster Histograms  

Hwang, Jeong-Hee (Department of Computer Science, Namseoul University)
Abstract
As basic research toward efficiently integrating and retrieving XML documents, this paper proposes a method for clustering XML documents by their structure. We apply to XML document clustering an algorithm designed for processing large volumes of transaction data, an approach that differs substantially from previous algorithms based on measuring structural similarity. The method clusters XML documents using cluster histograms, which represent the distribution of items within each cluster, while also taking global cluster cohesion into account. We compare the proposed method with existing techniques experimentally. The experiments show that our method produces good-quality clusters and also reduces processing time.
Keywords
XML Mining; XML Clustering; Structure Extraction; XML Document
Citations & Related Records
Times cited by KSCI: 1
1 J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, 'PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,' Proceedings of International Conference on Data Engineering (ICDE), 2001
2 NIAGARA query engine. http://www.cs.wisc.edu/niagara/data.html
3 http://www.acm.org/sigmod/record/xml, 2001
4 J. W. Lee, K. Lee, W. Kim, 'Preparation for Semantics-Based XML Mining,' Proceedings of IEEE International Conference on Data Mining (ICDM), 2001
5 T. Dalamagas, T. Cheng, K. J. Winkel, and T. Sellis, 'Clustering XML Documents by Structure,' The 3rd Hellenic Conference on AI (SETN), 2004
6 J. H. Hwang, K. H. Ryu, 'A Clustering Technique using Common Structures of XML Documents,' KISS, Vol.32, No.6, 2005
7 http://www.cogsci.princeton.edu/~wn/wn2.0
8 Y. Shen and B. Wang, 'Clustering Schemaless XML Documents,' International Conference on Ontologies, Databases and Applications of Semantics (ODBASE), 2003
9 R. Nayak, R. Witt, A. Tonev, 'Data Mining and XML Documents,' International Conference on Internet Computing, 2002
10 K. Wang and H. Liu, 'Discovering Typical Structures of Documents: A Road Map Approach,' ACM SIGIR Conference on Information Retrieval, 1998
11 M. Zaki, 'Efficiently Mining Frequent Trees in a Forest,' Proceedings of the ACM SIGKDD International Conference, 2002
12 A. Termier, M.-C. Rousset, M. Sebag, 'TreeFinder: A First Step towards XML Data Mining,' Proceedings of IEEE International Conference on Data Mining (ICDM), 2002
13 M. L. Lee, L. H. Yang, W. Hsu, X. Yang, 'XClust: Clustering XML Schemas for Effective Integration,' Proceedings of the ACM International Conference on Information and Knowledge Management, 2002
14 A. Doucet, H. Ahonen-Myka, 'Naive Clustering of a Large XML Document Collection,' Proceedings of INEX Workshop, 2002
15 J. Yoon, V. Raghavan, V. Chakilam, 'BitCube: Clustering and Statistical Analysis for XML Documents,' Proceedings of the International Conference on Scientific and Statistical Database Management, 2001
16 D. Braga, A. Campi, S. Ceri, M. Klemettinen, and P. Lanzi, 'A Tool for Extracting XML Association Rules from XML Documents,' Proceedings of IEEE-ICTAI 2002, USA, November 2002