• Title/Summary/Keyword: XML Mining


Encoding of XML Elements for Mining Association Rules

  • Hu Gongzhu;Liu Yan;Huang Qiong
    • The Journal of Information Systems / v.14 no.3 / pp.37-47 / 2005
  • Association rule mining finds associations among data items that appear together in transactions or business activities. To date, algorithms for association rule mining, as well as for other data mining tasks, have mostly been applied to relational databases. As XML is being adopted as the universal format for data storage and exchange, mining associations from XML data has become an area of attention for researchers and developers. The challenge is that the semi-structured data format of XML is not directly suitable for traditional data mining algorithms and tools. In this paper we present a method for encoding XML tree nodes. The method stores the XML data in a Value Table and a Transaction Table that can be easily accessed via indexing. The hierarchical relationships of the original XML tree structure are embedded in the encoding. We applied this method to association rule mining of XML data that may have missing data.

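
The abstract above does not spell out the encoding itself, so the following is only a minimal sketch of the general idea, under stated assumptions: it uses a Dewey-style code (for example `1.2.3`) as a stand-in for the paper's encoding, so that a node's ancestors can be read off its code. The function names `encode`, `build_tables`, and `pair_support`, the table layouts, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

def encode(root):
    """Map a Dewey-style code (an assumed scheme) to every element."""
    codes = {}
    def walk(node, code):
        codes[code] = node
        for i, child in enumerate(node, start=1):
            walk(child, f"{code}.{i}")
    walk(root, "1")
    return codes

def build_tables(root):
    """Return (value_table, transactions).

    value_table:  code -> (tag, text) for every leaf element with text
    transactions: code of a root child -> set of (tag, text) items below it
    """
    value_table, transactions = {}, {}
    for code, node in encode(root).items():
        text = (node.text or "").strip()
        if len(node) == 0 and text:
            value_table[code] = (node.tag, text)
            parts = code.split(".")
            if len(parts) >= 2:
                # The first two code components identify the transaction,
                # so the hierarchy is recovered from the code alone.
                tid = ".".join(parts[:2])
                transactions.setdefault(tid, set()).add((node.tag, text))
    return value_table, transactions

def pair_support(transactions):
    """Count how many transactions contain each pair of items."""
    counts = Counter()
    for items in transactions.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

SAMPLE = """<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>milk</item><item>eggs</item></order>
  <order><item>bread</item></order>
</orders>"""

value_table, transactions = build_tables(ET.fromstring(SAMPLE))
support = pair_support(transactions)
```

A transaction with missing data (the third `order` here, which has no `milk`) simply contributes no item for that value, so it lowers the support of pairs involving it without needing a placeholder.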

Frequent Pattern Tree based XML Stream Mining (빈발 패턴 트리 기반 XML 스트림 마이닝)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.16D no.5 / pp.673-682 / 2009
  • XML data are widely used for data representation and exchange on the Web, and in ubiquitous environments the data arrive as a continuous stream. Accordingly, mining research has addressed the extraction of frequent structures and the efficient query processing of XML stream data. In this paper, we propose a mining method that extracts frequent structures of XML stream data in the most recent window, based on a sliding window. XML stream data are modeled as a tree set, called XFP_tree, from which we quickly extract the frequent structures of recent XML data.
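
The XFP_tree itself is not described in this abstract, so the sketch below illustrates only the sliding-window idea, with a deliberate simplification: each arriving document is reduced to its set of root-to-node tag paths, and plain per-path counters stand in for the tree. The class `SlidingWindowMiner`, its parameters `window_size` and `min_support`, and the sample stream are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, deque

def tag_paths(xml_text):
    """Return the set of root-to-node tag paths in one XML document."""
    paths = set()
    def walk(node, prefix):
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:
            walk(child, path)
    walk(ET.fromstring(xml_text), ())
    return paths

class SlidingWindowMiner:
    """Counts tag paths over the most recent `window_size` documents."""

    def __init__(self, window_size, min_support):
        self.window_size = window_size
        self.min_support = min_support
        self.window = deque()    # one set of paths per document
        self.counts = Counter()  # path -> number of documents in the window

    def add(self, xml_text):
        paths = tag_paths(xml_text)
        self.window.append(paths)
        self.counts.update(paths)
        if len(self.window) > self.window_size:
            # Slide the window: forget the oldest document's paths.
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for path in expired:
                if self.counts[path] <= 0:
                    del self.counts[path]

    def frequent(self):
        """Paths that occur in at least `min_support` windowed documents."""
        return {p for p, c in self.counts.items() if c >= self.min_support}

miner = SlidingWindowMiner(window_size=2, min_support=2)
for doc in ["<a><b/></a>", "<a><b/><c/></a>", "<a><c/></a>"]:
    miner.add(doc)
recent_frequent = miner.frequent()
```

Because counts are adjusted incrementally as documents enter and leave the window, the frequent set reflects only recent data without rescanning the whole stream.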

Semi-Automatic Ontology Generation about XML Documents using Data Mining Method (데이터 마이닝 기법을 이용한 XML 문서의 온톨로지 반자동 생성)

  • Gu Mi-Sug;Hwang Jeong-Hee;Ryu Keun-Ho;Hong Jang-Eui
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.299-308 / 2006
  • As XML becomes the standard for exchanging web documents and public documentation, XML data are increasing in many areas. To retrieve information from XML documents efficiently, the ontology-based semantic web is emerging. Existing ontologies have been constructed manually, which is costly and time-consuming. In this paper, we therefore propose a semi-automatic ontology generation technique that uses a data mining technique, association rules. The proposed method uses a data mining algorithm to determine the types and number of conceptual relationships and the ontology domain level for automatic ontology generation. By applying association rules to XML documents and finding the frequent patterns of XML tags in them, we identify the conceptual relationships needed to construct the ontology. Using the conceptual ontology domain level extracted by the data mining, we implemented an ontology-based semantic web with XML Topic Maps (XTM) and the topic map engine TM4J.

Mining of Frequent Structures over Streaming XML Data (스트리밍 XML 데이터의 빈발 구조 마이닝)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.15D no.1 / pp.23-30 / 2008
  • Internet technology and XML underlie basic research on context awareness in ubiquitous environments. XML data in the form of continuous streams are common in network applications on the internet, and there is also research on query processing for streaming XML data. As basic research toward efficient querying, we propose both a labeled ordered tree model for representing XML and a mining method for extracting frequent structures from streaming XML data. That is, continuously arriving XML data are modeled as a stream tree, called XFP_tree, and we accurately extract the frequent structures from the XFP_tree of the current window in order to mine recent data. The proposed method can serve as a basis for query processing and indexing methods for XML stream data.

Extracting Maximal Similar Paths between Two XML Documents using Sequential Pattern Mining (순차 패턴 마이닝을 사용한 두 XML 문서간 최대 유사 경로 추출)

  • 이정원;박승수
    • Journal of KIISE:Databases / v.31 no.5 / pp.553-566 / 2004
  • Current main research areas in XML-related techniques include storing XML documents, query optimization, and indexing. We focus instead on sets of documents that have varied structures and do not share a common structure such as the same DTD or XML Schema. In that case, it is essential to analyze structural similarities and differences among many documents. For example, when documents from the Web or an EDMS (Electronic Document Management System) must be merged or classified, it is very important to find their common structure. In this paper, we adapted sequential pattern mining algorithms to extract maximal similar paths between two XML documents. Experiments with XML documents show that the adapted algorithms accurately find common structures and the maximal similar paths between them. In our analysis of the experimental results, similarity metrics based on maximal similar paths accurately classify the types of XML documents.
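
To make "maximal similar paths" concrete, here is a rough sketch under stated assumptions. It does not reproduce the paper's adapted sequential pattern mining algorithms; instead it swaps in a simpler technique: treat each root-to-leaf tag path as a sequence, take the longest common subsequence of every pair of paths across the two documents, and keep only those results that are not contained in another. All function names and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def leaf_paths(xml_text):
    """Return every root-to-leaf tag path of a document as a tuple."""
    out = []
    def walk(node, prefix):
        path = prefix + (node.tag,)
        if len(node) == 0:
            out.append(path)
        for child in node:
            walk(child, path)
    walk(ET.fromstring(xml_text), ())
    return out

def lcs(a, b):
    """Longest common subsequence of two tag sequences (dynamic programming)."""
    table = [[()] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                table[i + 1][j + 1] = table[i][j] + (x,)
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j], key=len)
    return table[len(a)][len(b)]

def is_subsequence(small, big):
    """True if `small` appears in `big` in order, not necessarily contiguously."""
    it = iter(big)
    return all(tag in it for tag in small)

def maximal_similar_paths(doc_a, doc_b):
    """Common tag sequences that are not contained in a longer common one."""
    common = {lcs(p, q) for p in leaf_paths(doc_a) for q in leaf_paths(doc_b)}
    common.discard(())
    return {p for p in common
            if not any(p != q and is_subsequence(p, q) for q in common)}

DOC_A = "<book><info><title/></info><author><name/></author></book>"
DOC_B = "<book><title/><writer><name/></writer></book>"
similar = maximal_similar_paths(DOC_A, DOC_B)
```

The two sample documents share no DTD, yet the sketch still reports `book ... title` and `book ... name` as their common structure, which is the kind of result a similarity metric could then be built on.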

Common XML Structure Extracting Algorithm for Applying Data Mining Techniques (데이터마이닝 기법 적용을 위한 공용 XML 구조 추출 알고리즘)

  • Jang, Min-Seok;Bang, Hyun-Jin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.1 / pp.1072-1076 / 2005
  • The importance of XML as a target of data mining is growing because XML is widely used as a standard markup language for describing structured data. In particular, research has been done on extracting desired information by applying association rules to XML documents. However, there has been little work on efficiently obtaining information from similar kinds of XML documents. To address this problem, this paper proposes a method for extracting a common XML structure from XML documents of the same kind that have various XML schemas. The resulting schema structure is expected to be important as a preliminary step, because unifying the structures of various kinds of documents helps us acquire useful information from them.


Accounting Information Processing Model Using Big Data Mining (빅데이터마이닝을 이용한 회계정보처리 모형)

  • Kim, Kyung-Ihl
    • Journal of Convergence for Information Technology / v.10 no.7 / pp.14-19 / 2020
  • This study proposes an accounting information processing model based on the internet standard XBRL (eXtensible Business Reporting Language), an XML technology. Because document characteristics differ among companies, such a model matters for the purpose of accounting, which is to provide useful information to decision makers. This study develops a data mining model based on the XML hierarchy, with the data stored as XBRL in the X-Hive database. The data mining analysis is carried out with association rules. Based on XBRL, we propose the DC-Apriori data mining method, which combines the Apriori algorithm with XQuery. Finally, the validity and effectiveness of the proposed model are examined through experiments.

A Method of Frequent Structure Detection Based on Active Sliding Window (능동적 슬라이딩 윈도우 기반 빈발구조 탐색 기법)

  • Hwang, Jeong-Hee
    • Journal of Digital Contents Society / v.13 no.1 / pp.21-29 / 2012
  • In ubiquitous computing environments, where large-scale data exchange through sensor networks is rising along with the rapid growth of the internet, continuous stream data must be processed. Accordingly, mining research has addressed the extraction of frequent structures and the efficient query processing of XML stream data. In this paper, we propose a mining method that extracts frequent structures of XML stream data in the recent window, based on active window sliding driven by trigger rules. The proposed method is basic research on controlling the stream data flow for data mining and continuous queries by means of trigger rules.

Bootstrap Mining for Searching Similar Content of XML Data (XML 데이터의 유사내용 검색을 위한 Bootstrap Mining)

  • Lee Han-Su;Park Jong-Hyun;Kang Ji-Hoon
    • Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.517-519 / 2005
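  • XML, the international standard for exchanging information on the Internet, is used in applications across many fields and is defined with a variety of structures depending on the characteristics of each application. Depending on the application, XML may carry different structural information even for semantically similar information, and it sometimes exists as XML documents without a schema (DTD). As a result, integration among applications within a particular domain (those following the same schema) has become easier, but integration with applications in different domains, or with applications left outside any domain, remains a problem. Considering that most XML documents carry meaning in their structural information, this study judges the similarity between pieces of information from different domains using only the structural information of the documents, and uses it to find semantically similar information. In addition, to find similar information more accurately given the characteristics of XML documents, we define a unit of processing and implement a prototype system based on it.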


Development of Semantic-Based XML Mining for Intelligent Knowledge Services (지능형 지식서비스를 위한 의미기반 XML 마이닝 시스템 연구)

  • Paik, Juryon;Kim, Jinyeong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.59-62 / 2018
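  • Although research targeting XML has grown steadily over the past five to six years, most studies have been based on statistical models of the elements that make up XML. That is, the semantics carried by text, sentences, and sentence constituents within the tree structure, an inherent property of XML, are not explicitly analyzed, represented, and used; instead, the occurrence of data is computed by statistical methods alone to answer the user's query, that is, to provide the corresponding information and knowledge. To suit an environment for providing intelligent knowledge services, information extraction should analyze the components of text and sentences and represent document content in a form that carries richer meaning than a simple set of words, so that more refined knowledge and information can be extracted. This study aims to investigate methods that can extract accurate and diverse knowledge from the flood of XML data by grasping even the meaning of user requests. Its ultimate goal is to provide a variety of user-centered services by advancing efficient mining techniques that can extract meaning from tree-structured, rather than record-structured, data.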
