• Title/Summary/Keyword: XML Clustering

Clustering Technique Using a Node and Level of XML tree (XML 트리의 노드와 레벨을 사용한 군집화 방법)

  • Kim, Woosaeng
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.3 / pp.649-655 / 2013
  • Recently, much research has focused on developing efficient techniques for accessing, querying, and managing XML documents, which are widely used on the Internet. In this paper, we propose a new method for clustering XML documents efficiently. An element and an inclusion relationship in an XML document correspond to a node and a level of the corresponding tree, respectively. Therefore, when two XML documents are similar, the node names and levels of their corresponding trees are also similar. We cluster XML documents by using the node names and levels of the corresponding tree as document features. Experiments show that the proposed method performs well.
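
The feature described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names are hypothetical, and the Jaccard overlap used to compare two documents is an assumption chosen only to make the idea concrete.

```python
import xml.etree.ElementTree as ET


def node_level_features(xml_text):
    """Return the set of (element name, level) pairs, with the root at level 0."""
    features = set()
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, level = stack.pop()
        features.add((node.tag, level))
        # Children sit one level below their parent.
        stack.extend((child, level + 1) for child in node)
    return features


def similarity(xml_a, xml_b):
    """Jaccard overlap of the two feature sets (an assumed measure, see above)."""
    a, b = node_level_features(xml_a), node_level_features(xml_b)
    return len(a & b) / len(a | b)


doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><name/></book>"
print(similarity(doc_a, doc_b))
```

In this example `name` appears in both documents but at different levels, so it does not count as a shared feature; only `book` at level 0 and `title` at level 1 match.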

XML based on Clustering Method for personalized Product Category in E-Commerce

  • Lee, Kwon-Soo; Kim, Hoon-Hyun
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.118-126 / 2003
  • In data mining, having access to large data sets for prediction does not guarantee a good method, even when the amount of real data in mobile commerce is unlimited. In addition to searching for the goods a user is expected to want, it becomes necessary to develop a recommendation service based on XML. In this paper, we design an optimized XML recommender for product data. Efficient XML data preprocessing is required, including formatting, structural, and attribute representation that depends on user profile information. Our goal is to find relationships among the products that users are interested in, from E-Commerce and M-Commerce to XDB. First, user profile information is analyzed, and clusters are created from the analyzed profiles using attributes such as sex, age, and job. Second, the XML data of associated products, classified according to the user profiles in the shopping mall, are clustered. Third, after composing the categories and goods data that contain associated objects from the first clustering, the method represents the categories and goods in the shopping mall and optimizes the clustering of XML data for personalized products. The proposed personalized user profile clustering method has been designed and simulated to demonstrate its efficiency.

An Indexing System for Retrieving Similar Paths in XML Documents (XML 문서의 유사 경로 검색을 위한 인덱싱 시스템)

  • Lee, Bum-Suk; Hwang, Byung-Yeon
    • The KIPS Transactions:PartD / v.15D no.2 / pp.171-178 / 2008
  • Since the XML standard was introduced by the W3C in 1998, the number of documents written in XML has gradually increased. Accordingly, several systems have been developed to manage and retrieve massive collections of XML documents efficiently. BitCube, a bitmap indexing system, is a representative system in this field of research. Based on the bitmap indexing technique, the path bitmap indexing system (LH06), which clusters similar paths, addressed a problem that the existing BitCube system could not solve, namely, determining similar paths. The path bitmap indexing system has the advantage of a higher retrieval speed not only in exact path matching but also in similar path searching. However, the similarity calculation algorithm of this system has a few particular problems. It sometimes cannot calculate the similarity even when two paths are extremely similar, and it increases the number of meaningless clusters. In this paper, we propose a novel method for clustering paths by their similarity in order to solve these problems. The proposed system yields a stable clustering result, and it obtains a high clustering precision score in a performance evaluation against LH06.

Clustering of MPEG-7 Data for Efficient Management (MPEG-7 데이터의 효율적인 관리를 위한 클러스터링 방법)

  • Ahn, Byeong-Tae; Kang, Byeong-Shoo; Diao, Jianhua; Kang, Hyun-Syug
    • Journal of Korea Multimedia Society / v.10 no.1 / pp.1-12 / 2007
  • To use multimedia data within the restricted resources of a mobile environment, a method for managing MPEG-7 documents is needed. Existing XML clustering methods can be used for this purpose. However, to improve performance further, a new clustering method that exploits the characteristics of MPEG-7 documents is needed. Such a method improves query processing speed in multimedia search and makes it possible to store documents in a way suited to various applications. In this paper, we propose a new clustering method for MPEG-7 documents, aimed at the effective management of large-capacity multimedia data, which uses semantic relationships among the elements of MPEG-7 documents. We also compare it with existing clustering methods.

Design and Implementation of MPEG-7 Document Management System Based on Native Embedded XML Database (순수 내장형 XML 데이터베이스 기반의 MPEG-7 문서 관리 시스템의 설계 및 구현)

  • Ahn, Byeong-Tae; Kang, Byeong-Shoo; Diao, Jianhua; Kang, Hyun-Syug
    • Journal of Korea Multimedia Society / v.10 no.2 / pp.170-178 / 2007
  • Within the restricted resources of a mobile environment, embedded database technology can be used to manage MPEG-7 data. Existing XML clustering methods can be used for this purpose. However, to improve performance further, a new clustering method is needed to store MPEG-7 documents effectively. In this paper, we design and implement an MPEG-7 document management system that stores MPEG-7 documents effectively on mobile devices such as PDAs. The system uses Berkeley DB XML as a native embedded XML database system, based on the clustering method for MPEG-7 data.

XML Document Clustering Technique by K-means algorithm through PCA (주성분 분석의 K 평균 알고리즘을 통한 XML 문서 군집화 기법)

  • Kim, Woo-Saeng
    • The KIPS Transactions:PartD / v.18D no.5 / pp.339-342 / 2011
  • Recently, much research has focused on developing efficient techniques for accessing, querying, and storing XML documents, which are widely used on the Internet. In this paper, we propose a new method for clustering XML documents efficiently. The documents are first represented as vectors in a feature vector space by converting them into the names and levels of the elements of their corresponding trees. We then cluster the vectors with a K-means algorithm combined with Principal Component Analysis (PCA). Experiments show that the proposed method gives good results.
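
The pipeline in this abstract (feature vectors, then PCA, then K-means) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: all function names are hypothetical, only the first principal component is kept (found by power iteration), and K-means runs on the resulting one-dimensional scores with centers initialized evenly between the minimum and maximum score.

```python
import xml.etree.ElementTree as ET


def feature_counts(xml_text):
    """Count how often each (element name, level) pair occurs in one document."""
    counts = {}
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, level = stack.pop()
        counts[(node.tag, level)] = counts.get((node.tag, level), 0) + 1
        stack.extend((child, level + 1) for child in node)
    return counts


def vectorize(docs):
    """Turn documents into equal-length count vectors over a shared vocabulary."""
    maps = [feature_counts(d) for d in docs]
    vocab = sorted({key for m in maps for key in m})
    return [[float(m.get(key, 0)) for key in vocab] for m in maps]


def first_component_scores(rows, iters=200):
    """Project mean-centered rows onto the first principal component."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    v = [1.0 / (j + 1) for j in range(dim)]  # fixed start keeps runs deterministic
    for _ in range(iters):
        # Power iteration step: w = X^T (X v), then normalize.
        proj = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(dim)) for c in centered]


def kmeans_1d(values, k=2, iters=20):
    """Plain K-means on scalar values (k >= 2), centers spread from min to max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def cluster(docs, k=2):
    return kmeans_1d(first_component_scores(vectorize(docs)), k)


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<order><item/><price/></order>",
    "<order><item/><item/><price/></order>",
]
labels = cluster(docs)
print(labels)
```

On these four toy documents the two `book` documents receive one label and the two `order` documents the other. A fuller version would keep several principal components and run K-means in that reduced space.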

k-Bitmap Clustering Method for XML Data based on Relational DBMS (관계형 DBMS 기반의 XML 데이터를 위한 k-비트맵 클러스터링 기법)

  • Lee, Bum-Suk; Hwang, Byung-Yeon
    • The KIPS Transactions:PartD / v.16D no.6 / pp.845-850 / 2009
  • The use of XML data has increased with the growth of the Web 2.0 environment. The advantages of XML are recognized through its use as the base technology of RSS and Atom for delivering information from blogs and news feeds. Bitmap clustering is a method that keeps an index in main memory on top of a relational DBMS, and it performed better than other XML indexing methods in evaluation. However, the existing method generates too many clusters, which degrades search quality. This paper proposes the k-Bitmap clustering method, which generates a user-defined number k of clusters to solve this problem. The proposed method also keeps an additional inverted index for searching terms that are excluded from the representative bits of the k-Bitmap. Our evaluation shows that users can control the number of clusters. Our method also achieves high recall in single-term search, and by keeping two indices it guarantees that the search result includes all documents related to a query.

Similarity Measure and Clustering Technique for XML Documents by a Parent-Child Matrix (부모-자식 행렬을 사용한 XML 문서 유사도 측정과 군집 기법)

  • Lee, Yun-Gu; Kim, Woosaeng
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.7 / pp.1599-1607 / 2015
  • Recently, researchers have been developing efficient techniques for accessing, querying, and managing XML documents, which are widely used on the Internet. In this paper, we propose a parent-child matrix for clustering XML documents efficiently. A parent-child matrix captures both the content and the structural features of an XML document. Each cell of a parent-child matrix holds either the value of a node in the XML tree or the value of a child node where a parent-child relationship exists in the XML tree. The similarity between two XML documents can then be measured by the similarity between their corresponding parent-child matrices. Experiments show that the proposed method performs well.
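
A rough sketch of comparing documents through parent-child matrices is given below. The abstract does not specify the cell values or the matrix similarity measure, so this sketch makes its own assumptions: cells hold the number of edges between a parent element name and a child element name (structure only, no content values), and similarity is the cosine of the flattened matrices. All function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def parent_child_counts(xml_text):
    """Map (parent name, child name) to the number of such edges in the tree."""
    counts = {}
    for parent in ET.fromstring(xml_text).iter():
        for child in parent:
            key = (parent.tag, child.tag)
            counts[key] = counts.get(key, 0) + 1
    return counts


def to_matrix(counts, tags):
    """Dense matrix with parent names as rows and child names as columns."""
    return [[counts.get((p, c), 0) for c in tags] for p in tags]


def matrix_similarity(xml_a, xml_b):
    """Cosine similarity of the two flattened matrices (an assumed measure)."""
    ca, cb = parent_child_counts(xml_a), parent_child_counts(xml_b)
    # Both matrices share one ordering of element names so that cells line up.
    tags = sorted({t for pair in list(ca) + list(cb) for t in pair})
    flat_a = [v for row in to_matrix(ca, tags) for v in row]
    flat_b = [v for row in to_matrix(cb, tags) for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


doc_a = "<a><b/><c/></a>"
doc_b = "<a><b/><d/></a>"
print(matrix_similarity(doc_a, doc_b))
```

Documents with no parent-child edges in common score 0.0, and a document compared with itself scores 1.0 up to floating-point rounding.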

Clustering Techniques for XML Data Using Data Mining

  • Kim, Chun-Sik
    • Proceedings of the CALSEC Conference / 2005.03a / pp.189-194 / 2005
  • Many studies have been conducted on classifying documents and extracting useful information from them. However, most search engines use a keyword-based method, which does not search and classify documents effectively. Building on the fact that an XML document is structured, this paper identifies the structure of XML documents using the set theory suggested by Broder, and tests the clustering of XML documents by applying a k-nearest neighbor algorithm. In addition, this study investigates the effectiveness of the clustering technique on large-scale data in comparison with the existing bitmap method, by applying a test that measures similarity from the differences between documents on a clause basis rather than from vectors.

A Effective Storage Method for Managing of MPEG-7 Document (MPEG-7 문서 관리를 위한 효율적인 저장 방법)

  • Ahn, Byeong-Tae; Lee, Jong-Ha; Chung, Bhum-Suk
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.637-641 / 2006
  • To use multimedia content with restricted resources, a method for managing MPEG-7 documents is needed. Existing XML clustering methods can be used for this purpose. However, to improve performance further, a new clustering method that exploits the characteristics of MPEG-7 documents is needed. In this paper, we propose a new clustering method for managing MPEG-7 documents efficiently, which uses semantic relationships among the elements of MPEG-7 documents. We also compare it with existing clustering methods.
