[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTD.2007.14-D.1.001

Clustering XML Documents Considering The Weight of Large Items in Clusters

Hwang, Jeong-Hee (Namseoul University)

Publication Information

The KIPS Transactions: Part D, v.14D, no.1, 2007, pp. 1-8

Abstract

As XML, a data exchange language for the Internet, is used for a growing number of web documents, these documents have become a target of information retrieval. Accordingly, there has been research on the structure, integration, and retrieval of XML documents. This paper proposes a method for clustering XML documents based on frequent structures, as groundwork for efficient query processing and retrieval. First, the trees representing XML documents are decomposed and frequent structures are extracted from them. Second, treating the frequent structures as items of transactions, clustering is performed with the weight of large items taken into account, in order to control cluster creation and cluster cohesion. Third, experiments comparing the proposed method with previous methods show its effectiveness.
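The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the decomposition rule, the support threshold, or the weighting formula, so everything here is an assumption for illustration only: structures are taken to be root-to-leaf tag paths, "large items" are items present in at least a fraction `theta` of a cluster's transactions, and the assignment score is a plain Jaccard overlap with the cluster's large items. The function names and the parameters `min_support`, `theta`, and `min_overlap` are hypothetical and do not come from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Decompose one XML document into its set of root-to-leaf tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:          # leaf element: record the full path
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def frequent_structures(docs_paths, min_support):
    """Keep paths occurring in at least a fraction min_support of documents."""
    counts = Counter(p for paths in docs_paths for p in paths)
    threshold = min_support * len(docs_paths)
    return {p for p, c in counts.items() if c >= threshold}


def large_items(cluster, theta):
    """Items found in at least a fraction theta of the cluster's transactions."""
    counts = Counter(item for t in cluster for item in t)
    return {item for item, c in counts.items() if c >= theta * len(cluster)}


def cluster_transactions(transactions, theta=0.6, min_overlap=0.5):
    """Greedy single pass: join the best-matching cluster or open a new one.

    The score (Jaccard overlap with a cluster's large items) is an assumed
    stand-in for the paper's weighting, which the abstract does not specify.
    """
    clusters = []
    for t in transactions:
        best, best_score = None, 0.0
        for c in clusters:
            large = large_items(c, theta)
            union = t | large
            score = len(t & large) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = c, score
        if best is not None and best_score >= min_overlap:
            best.append(t)
        else:
            clusters.append([t])
    return clusters


# Toy documents (made up for this sketch, not the paper's data set).
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<cd><artist/><track/></cd>",
    "<cd><artist/><track/><track/></cd>",
]
docs_paths = [tag_paths(d) for d in docs]
frequent = frequent_structures(docs_paths, min_support=0.5)
# Each document becomes a transaction whose items are its frequent structures.
transactions = [paths & frequent for paths in docs_paths]
clusters = cluster_transactions(transactions)
```

In this sketch, raising `min_overlap` makes new clusters easier to create, while `theta` controls how strict the notion of a large item is. This only loosely mirrors the trade-off between cluster creation and cluster cohesion mentioned in the abstract.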

Keywords

XML Structure; Document Clustering; XML Clustering; XML Document

