Clustering Technique Using a Node and Level of XML tree

Kim, Woosaeng

doi:10.6109/jkiice.2013.17.3.649

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)

Volume 17 Issue 3 / Pages 649-655 / 2013 / 2234-4772 (pISSN) / 2288-4165 (eISSN)

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

XML 트리의 노드와 레벨을 사용한 군집화 방법

Kim, Woosaeng (Department of Computer Science, Kwangwoon University)

Received : 2012.08.27
Accepted : 2012.10.19
Published : 2013.03.31

https://doi.org/10.6109/jkiice.2013.17.3.649

Abstract

Recently, efficient techniques for accessing, querying, and managing XML documents, which are widely used on the Internet, have been actively studied. In this paper, we propose a new method for clustering XML documents efficiently. An element of an XML document corresponds to a node of the corresponding tree, and an inclusion (nesting) relationship in the document corresponds to a level relationship in that tree. Therefore, when two XML documents are similar, the node names and levels of their corresponding trees are also similar. We cluster XML documents using the node names and levels of the corresponding trees as document features. Experiments show that the proposed method performs well.
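The idea of using node names and levels as document features can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the similarity measure or the clustering algorithm, so the multiset Jaccard similarity, the greedy single-pass grouping, the `threshold` value, and all function names below are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def node_level_features(xml_text):
    """Return a multiset of (element name, tree level) pairs; the root is level 0."""
    root = ET.fromstring(xml_text)
    features = Counter()
    stack = [(root, 0)]
    while stack:
        node, level = stack.pop()
        features[(node.tag, level)] += 1
        # Children of a node sit one level deeper in the tree.
        stack.extend((child, level + 1) for child in node)
    return features


def similarity(a, b):
    """Multiset Jaccard similarity (an assumed measure, not taken from the paper)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


def cluster(docs, threshold=0.5):
    """Greedy single-pass grouping (assumed): join the first cluster whose
    representative document is at least `threshold` similar, else start a new one."""
    clusters = []  # list of (representative features, [document indices])
    for i, text in enumerate(docs):
        feats = node_level_features(text)
        for rep, members in clusters:
            if similarity(rep, feats) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((feats, [i]))
    return [members for _, members in clusters]


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<movie><director/><cast><actor/></cast></movie>",
]
print(cluster(docs))  # → [[0, 1], [2]]
```

The two `book` documents share three of four (name, level) pairs and are grouped together, while the `movie` document shares none and forms its own cluster. Pairing each name with its level means that the same element name at a different nesting depth counts as a different feature, which is how the sketch reflects the inclusion relationship described above.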
