An Incremental Clustering Technique of XML Documents using Cluster Histograms

Hwang, Jeong-Hee

Journal of KIISE:Databases (한국정보과학회논문지:데이타베이스)

Volume 34, Issue 3 / Pages 261-269 / 2007 / 1229-7739 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

클러스터의 히스토그램을 이용한 XML 문서의 점진적 클러스터링 기법

Hwang, Jeong-Hee (Department of Computer Science, Namseoul University)

Published : 2007.06.15


Abstract

As basic research toward the efficient integration and retrieval of XML documents, this paper proposes a method for clustering XML documents by structure. Instead of measuring structural similarity between documents, as previous algorithms do, we adapt an algorithm designed for processing large volumes of transaction data to the clustering of XML documents. The method clusters documents using cluster histograms, which represent the distribution of items within each cluster, while also taking global cluster cohesion into account. An experimental comparison with existing techniques shows that the method produces good-quality clusters and reduces processing time.

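As basic research toward efficient retrieval and integration of XML documents, this paper proposes a structure-centered clustering technique for XML documents. Unlike previous work, which forms clusters based on the structural similarity between documents, we modify and apply an algorithm for transaction data that can process large amounts of data quickly. Clustering is performed with histograms that represent the cumulative distribution of the items contained in each cluster, taking the cohesion of the overall clustering into account. Experiments against existing approaches show improved clustering time and good-quality clusters.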
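The abstract does not give the cohesion formula or say how a document is turned into items, so the following is only a minimal sketch under stated assumptions. It treats each document as a set of path-like item strings (hypothetical) and uses a CLOPE-style global criterion, the sum over clusters of n * S / W^r, as a stand-in for "global cluster cohesion". Here n is the number of documents in a cluster, S is the total number of item occurrences in its histogram, W is the number of distinct items, and r is a tunable repulsion parameter. The names `Cluster`, `gain`, and `cluster_incrementally` are illustrative and not taken from the paper.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence


class Cluster:
    """A cluster summarized only by its item histogram (item -> count)."""

    def __init__(self) -> None:
        self.hist: Counter = Counter()
        self.n = 0      # number of documents in the cluster
        self.size = 0   # total item occurrences (area of the histogram)

    def add(self, items: FrozenSet[str]) -> None:
        self.hist.update(items)
        self.n += 1
        self.size += len(items)


def gain(c: Cluster, items: FrozenSet[str], r: float) -> float:
    """Change in n * S / W**r if `items` joined `c`; `c` is not modified."""
    new_width = len(c.hist) + sum(1 for i in items if i not in c.hist)
    new_term = (c.n + 1) * (c.size + len(items)) / new_width ** r
    old_term = c.n * c.size / len(c.hist) ** r if c.n else 0.0
    return new_term - old_term


def cluster_incrementally(docs: Sequence[FrozenSet[str]], r: float = 2.0) -> List[int]:
    """Assign each arriving document to the cluster that most increases
    the assumed global criterion, opening a new cluster when that is best."""
    clusters: List[Cluster] = []
    labels: List[int] = []
    for items in docs:
        if not items:
            raise ValueError("each document needs at least one item")
        candidates = clusters + [Cluster()]  # last entry is an empty, new cluster
        best = max(range(len(candidates)),
                   key=lambda k: gain(candidates[k], items, r))
        if best == len(clusters):
            clusters.append(candidates[best])
        clusters[best].add(items)
        labels.append(best)
    return labels


# Hypothetical documents, each reduced to a set of element paths.
docs = [
    frozenset({"book/title", "book/author", "book/price"}),
    frozenset({"book/title", "book/author", "book/year"}),
    frozenset({"cd/artist", "cd/track", "cd/label"}),
    frozenset({"cd/artist", "cd/track", "cd/year"}),
]
print(cluster_incrementally(docs))  # [0, 0, 1, 1]
```

Because each decision reads only the per-cluster histograms, a new document can be placed without comparing it against every document already clustered, which is the property that makes a histogram-based approach attractive for incremental use.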

