[KSCI] Korea Science Citation Index Service

An Incremental Clustering Technique of XML Documents using Cluster Histograms

Hwang, Jeong-Hee (Department of Computer Science, Namseoul University)

Publication Information

Journal of KIISE: Databases / v.34, no.3, 2007, pp. 261-269

Abstract

As groundwork for integrating and retrieving XML documents efficiently, this paper proposes a method that clusters XML documents by their structure. We adapt an algorithm designed for large volumes of transaction data to XML document clustering, an approach that differs from earlier algorithms based on measuring structural similarity. The method clusters XML documents using cluster histograms, which represent the distribution of items within each cluster, while also taking global cluster cohesion into account. Experiments comparing the proposed method with existing techniques show that it produces good-quality clusters and reduces processing time.
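
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how histogram-based incremental clustering could look, not the paper's formulation. It assumes each document is reduced to the set of its root-to-element tag paths, and it borrows a CLOPE-style height-to-width criterion from transaction clustering as a stand-in for the global cohesion measure. The names `path_items`, `Cluster`, `assign`, and the repulsion parameter `r` are all hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def path_items(xml_text):
    """Represent a document as the set of its root-to-element tag paths (assumed item model)."""
    root = ET.fromstring(xml_text)
    items = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        items.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return items


class Cluster:
    """A cluster summarized only by its histogram, never by its member documents."""

    def __init__(self):
        self.hist = Counter()  # item -> number of member documents containing it
        self.n = 0             # number of member documents

    def add(self, items):
        self.hist.update(items)
        self.n += 1

    def gain(self, items, r):
        """Change in size * n / width**r if `items` joined this cluster (assumed criterion)."""
        size = sum(self.hist.values())
        width = len(self.hist)
        new_size = size + len(items)
        new_width = width + sum(1 for it in items if it not in self.hist)
        if new_width == 0:
            return 0.0
        old = size * self.n / width ** r if width else 0.0
        return new_size * (self.n + 1) / new_width ** r - old


def assign(clusters, items, r=2.0):
    """Place one document in the existing or new cluster with the largest gain."""
    candidates = clusters + [Cluster()]
    best = max(candidates, key=lambda c: c.gain(items, r))
    if best is candidates[-1]:
        clusters.append(best)
    best.add(items)
    return clusters.index(best)
```

Because every candidate shares the same total document count, picking the largest per-cluster gain also maximizes the summed criterion over all clusters, which is how this sketch approximates a global cohesion check. Each new document is processed once against the histograms only, which is what makes the procedure incremental.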

Keywords

XML Mining; XML Clustering; Structure Extraction; XML Document

