• Title/Summary/Keyword: XML Mining

XML based on Clustering Method for personalized Product Category in E-Commerce

  • Lee, Kwon-Soo;Kim, Hoon-Hyun
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.118-126 / 2003
  • In data mining, access to large data sets for prediction does not by itself guarantee a good method, even where the amount of real data in mobile commerce is practically unlimited. Beyond searching for the goods a user is expected to want, it becomes necessary to develop a recommendation service based on XML. In this paper, we design optimized XML data for a product recommender. This requires efficient XML preprocessing, including formatting, structural, and attribute representation that depends on user profile information. Our goal is to find relationships among the products that interest users, from e-commerce and m-commerce through to the XDB. First, we analyze user profile information and create clusters from the analyzed profiles, using attributes such as sex, age, and job. Second, we cluster the XML data for associated products, classified according to the user profiles of a shopping mall. Third, after composing the category and goods data in which associated objects exist from the first clustering, we represent the categories and goods of the shopping mall and optimize the clustered XML data as personalized products. The proposed personalized user profile clustering method has been designed and simulated to demonstrate its efficiency.

Storage Policies for Versions Management of XML Documents using a Change Set (변경 집합을 이용한 XML 문서의 버전 관리를 위한 저장 기법)

  • Yun Hong Won
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1349-1356 / 2004
  • Interest in version management is growing in electronic commerce, which requires data mining, and in document processing systems for digital government applications. In this paper, we define a change set for managing historical information and maintaining XML documents over a long period of time, and we propose several storage policies for XML documents that use a change set. A change set comprises a change operation set and temporal dimensions, and a change operation set is composed of schema change operations and data change operations. We propose three storage policies using a change set: (1) storing all the change sets, (2) storing the change sets and the versions periodically, and (3) storing the aggregation of change sets and the versions at a proper point in time. We also compare the performance of the existing storage policy with that of the proposed policies. Through the performance evaluation, we show that storing the aggregation of change sets and the versions at a proper point in time outperforms the others.
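
The second policy in the abstract above (change sets plus periodic versions) can be illustrated with a minimal sketch. It is not the paper's implementation: the flat `path -> value` document model, the `set`/`delete` operations, and the names `PeriodicStore`, `commit`, and `checkout` are all assumptions made for illustration, and the temporal dimensions and schema change operations are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a document version is a flat dict of path -> value,
# and a change set is an ordered list of (operation, path, value) triples.
Version = Dict[str, str]
ChangeSet = List[Tuple[str, str, Optional[str]]]


def apply_change_set(version: Version, change_set: ChangeSet) -> Version:
    """Return a new version with every operation in change_set applied."""
    result = dict(version)
    for op, path, value in change_set:
        if op == "set":          # insert or update a node
            result[path] = value
        elif op == "delete":     # remove a node if it is present
            result.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


@dataclass
class PeriodicStore:
    """Keeps every change set plus a full snapshot every `period` versions."""
    period: int
    snapshots: Dict[int, Version] = field(default_factory=lambda: {0: {}})
    change_sets: List[ChangeSet] = field(default_factory=list)

    def commit(self, change_set: ChangeSet) -> int:
        """Record a change set and return the new version number."""
        self.change_sets.append(change_set)
        number = len(self.change_sets)
        if number % self.period == 0:
            self.snapshots[number] = self.checkout(number)
        return number

    def checkout(self, number: int) -> Version:
        """Rebuild version `number` from the nearest earlier snapshot."""
        if not 0 <= number <= len(self.change_sets):
            raise IndexError(number)
        base = max(n for n in self.snapshots if n <= number)
        version = dict(self.snapshots[base])
        for change_set in self.change_sets[base:number]:
            version = apply_change_set(version, change_set)
        return version
```

A shorter `period` makes `checkout` replay fewer change sets at the cost of storing more full versions, which is the kind of trade-off the compared policies differ on.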

A data mining approach for efficient matching of engineering document schemata (엔지니어링 문서 스키마의 효율적 매칭을 위한 데이터마이닝 기법의 활용방안)

  • Park, Sang-Il;An, Hyun-Jung;Kim, Hyo-Jin;Lee, Sang-Ho
    • Proceedings of the Computational Structural Engineering Institute Conference / 2010.04a / pp.226-229 / 2010
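  • This study presents an efficient way to apply XML schema matching, which aims to improve the quality of data storage. To this end, we collected accuracy data that varies with the matching weights and used the collected data to build a decision tree model, one of the data mining techniques. The automatic weight selection module implemented by applying this model can indirectly suggest not only the accuracy, which is the target variable of the decision tree model, but also the weights likely to yield the highest accuracy, based on the values of explanatory variables such as the bridge type, the number of elements in the document, and the company that wrote the document. Using the XML schema matching weights suggested by the module improved accuracy by about 10% compared with not using them.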

Concept Extraction Technique from Documents Using Domain Ontology (지식 문서에서 도메인 온톨로지를 이용한 개념 추출 기법)

  • Mun Hyeon-Jeong;Woo Yong-Tae
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.309-316 / 2006
  • We propose a novel technique that uses a domain ontology to categorize XML documents and extract concepts efficiently. First, we create a domain ontology using text mining and statistical techniques. We propose the DScore technique, which classifies XML documents by using their structural characteristics. We also present the TScore technique, which extracts a concept by comparing the association term set of the domain ontology with the terms in the XML document. To verify the efficiency of the proposed technique, we performed an experiment on 295 papers in the computer science area. The results show that the proposed technique, which uses the structural information in XML documents, is more efficient than the existing technique. In particular, the TScore technique effectively extracts the concept of a document even when term frequency is low. Hence, the proposed concept-based retrieval techniques can be expected to contribute to the development of an efficient ontology-based knowledge management system.

Text Mining and Sentiment Analysis for Predicting Box Office Success

  • Kim, Yoosin;Kang, Mingon;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.4090-4102 / 2018
  • Since the emergence of online communication, text mining and sentiment analysis have frequently been applied to analyzing electronic word-of-mouth. This study aims to develop a domain-specific sentiment lexicon for predicting box office success in the Korean film market and to validate the feasibility of the lexicon. Natural language processing, a machine learning algorithm, and a lexicon-based sentiment classification method are employed. To create a movie-domain sentiment lexicon, 233,631 reviews of 147 movies with popularity ratings were collected with an XML crawling package in R. We achieved 81.69% accuracy in sentiment classification with the Korean sentiment dictionary, which includes 706 negative words and 617 positive words. The results showed a stronger positive relationship between box office success and consumers' sentiment, as well as a significant positive effect in the linear regression of the prediction model. In addition, they reveal that emotion in user-generated content can be a more accurate clue for predicting business success.
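
As a rough illustration of lexicon-based sentiment classification in general, the sketch below counts lexicon hits in a tokenized review. It is not the study's classifier: the function name `classify_review`, the simple hit-count scoring, and the toy English word lists are assumptions, whereas the study uses a Korean dictionary of 706 negative and 617 positive words.

```python
from typing import Iterable, Set


def classify_review(tokens: Iterable[str],
                    positive: Set[str],
                    negative: Set[str]) -> str:
    """Label a tokenized review by counting positive and negative lexicon hits."""
    score = 0
    for token in tokens:
        if token in positive:
            score += 1
        elif token in negative:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy lexicon for illustration only (placeholder words, not the study's data).
POSITIVE = {"great", "moving", "fun"}
NEGATIVE = {"boring", "slow", "weak"}

label = classify_review("a great and moving film".split(), POSITIVE, NEGATIVE)
```

In practice the tokens would come from a morphological analyzer suited to Korean, and ties or reviews with no lexicon hits need an explicit policy; here they fall into `neutral`.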

Semi Automatic Ontology Generation about XML Documents

  • Gu Mi Sug;Hwang Jeong Hee;Ryu Keun Ho;Jung Doo Yeong;Lee Keum Woo
    • Proceedings of the KSRS Conference / 2004.10a / pp.730-733 / 2004
  • XML (eXtensible Markup Language) has recently been becoming the standard for exchanging documents on the web. As the amount of information grows with the development of Internet technology, the semantic web is emerging to give more exact information retrieval results than the existing web. An ontology, the basis of the semantic web, provides the basic knowledge system for expressing particular knowledge, so it can give exact information retrieval results. An ontology defines the concepts of a specific domain and the relationships between them, and it has a hierarchy similar to a taxonomy. In this paper, we propose semi-automatic ontology generation based on XML documents, which interest many researchers as a means of knowledge expression. To construct an ontology in a particular domain, we suggest an algorithm for determining the domain. Accordingly, we set the ontology domain to the extraction of movie information on the web. We used generalized association rules, one of the data mining methods, to generate the ontology from the tags and contents of XML documents. XTM (XML Topic Maps), an ISO standard, is used as the ontology language for constructing the ontology. The advantage of this method is that, because the ontology is built from terms frequently used in documents related to the domain, it is useful for querying and retrieving within that domain.

A Database Schema Integration Method Using XML Schema (XML Schema를 이용한 이질의 데이터베이스 스키마 통합)

  • 박우창
    • Journal of Internet Computing and Services / v.3 no.2 / pp.39-56 / 2002
  • In distributed computing environments, many database applications, such as data warehousing and data mining, must share data with each other while local databases keep their autonomy. The first step for such applications is the integration of heterogeneous database schemas, but there is no commonly accepted data model for the integration, and constructing an integration program is difficult. In this paper, we use XML Schema to represent the common data model and exploit XSLT to reduce the programming difficulties. We define the schema integration operations and develop a methodology for semi-automatic schema integration according to the types of schema conflict. Compared with existing methodologies, our integration method offers benefits in standardization and in the extensibility of the schema integration process.

Frequently Occurred Information Extraction from a Collection of Labeled Trees (라벨 트리 데이터의 빈번하게 발생하는 정보 추출)

  • Paik, Ju-Ryon;Nam, Jung-Hyun;Ahn, Sung-Joon;Kim, Ung-Mo
    • Journal of Internet Computing and Services / v.10 no.5 / pp.65-78 / 2009
  • The most commonly adopted approach to finding valuable information in tree data is to extract frequently occurring subtree patterns. Because mining frequent tree patterns has a wide range of applications, such as XML mining, web usage mining, bioinformatics, and network multicast routing, many algorithms have recently been proposed to find such patterns. However, existing tree mining algorithms suffer from several serious pitfalls when finding frequent tree patterns in massive tree datasets. Some of the major problems are due to (1) modeling data as a hierarchical tree structure, (2) the high computational cost of candidate maintenance, (3) repeated scans of the input dataset, and (4) high memory dependency. These problems arise because most of these algorithms are based on the well-known Apriori algorithm and use the anti-monotone property for candidate generation and frequency counting. To solve these problems, we adopt a pattern-growth approach rather than the Apriori approach, and we extract maximal frequent subtree patterns instead of frequent subtree patterns. The proposed method not only removes the step of pruning infrequent subtrees but also eliminates the problem of generating candidate subtrees altogether. Hence, it significantly improves the whole mining process.

Subtree Mining to extract Association rules from Tree Data (트리 데이터에서 연관규칙 추출을 위한 서브트리 마이닝)

  • Kang, Woo-Jun;Shin, Jun
    • Annual Conference of KIPS / 2006.11a / pp.317-320 / 2006
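  • Existing methods for extracting frequent subtrees from XML tree data are complex, require many scans of the input data, and need edge-by-edge join operations to obtain the frequent subtrees. As a result, they take a long time to run. This paper presents a method that divides tree data by level and filters it, as if sifting it through a sieve, so that only nodes whose occurrence count is at or above a given threshold remain; it thereby finds frequent subtrees quickly and uses them to generate XML association rules. For the proposed method, we introduce a new data structure called PairSet and develop a cross-filtering algorithm that uses it.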
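
The level-by-level filtering idea can be sketched as follows. This covers only the first step, keeping the labels that appear at a given depth in at least `min_support` trees; it does not reproduce the paper's PairSet structure or its cross-filtering algorithm, whose details are not given in the abstract. The `(label, children)` tree encoding and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: a tree is (label, [child trees]).
Tree = Tuple[str, list]


def labels_by_level(tree: Tree) -> Set[Tuple[int, str]]:
    """Collect the distinct (depth, label) pairs that occur in one tree."""
    seen: Set[Tuple[int, str]] = set()
    stack = [(tree, 0)]
    while stack:
        (label, children), depth = stack.pop()
        seen.add((depth, label))
        stack.extend((child, depth + 1) for child in children)
    return seen


def frequent_nodes(trees: List[Tree], min_support: int) -> Dict[int, Set[str]]:
    """Keep, per level, the labels found in at least min_support trees."""
    counts: Counter = Counter()
    for tree in trees:
        # Each tree contributes at most once per (depth, label) pair.
        counts.update(labels_by_level(tree))
    kept: Dict[int, Set[str]] = {}
    for (depth, label), count in counts.items():
        if count >= min_support:
            kept.setdefault(depth, set()).add(label)
    return kept


trees = [
    ("order", [("item", [("name", [])]), ("price", [])]),
    ("order", [("item", [("name", []), ("qty", [])])]),
    ("order", [("price", [])]),
]
survivors = frequent_nodes(trees, min_support=2)
```

With a threshold of 2, `qty` is filtered out because it occurs in only one tree, while `order`, `item`, `price`, and `name` survive at their levels. Assembling the surviving nodes into subtrees and deriving association rules would be separate steps.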

Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review

  • Nam, Hee-Jo;Yamada, Ryota;Park, Hyun-Seok
    • Genomics & Informatics / v.18 no.2 / pp.13.1-13.6 / 2020
  • The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.
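
A conversion of the kind mentioned at the end of the abstract can be sketched roughly as below. This is not the authors' tool. It assumes tagger output with one token per line in five tab-separated columns (word, base form, part of speech, chunk tag, named-entity tag in IOB style), and it assumes every token can be found verbatim in the source text, which does not hold when the tagger rewrites characters such as quotation marks. The function name `genia_to_pubannotation` is hypothetical; the output uses the `text` and `denotations` fields of the PubAnnotation JSON format.

```python
import json
from typing import Dict, List, Optional


def genia_to_pubannotation(text: str, tagger_output: str) -> Dict:
    """Turn IOB-tagged tokens into PubAnnotation-style denotations."""
    denotations: List[Dict] = []
    cursor = 0                        # search position in the source text
    current: Optional[Dict] = None    # entity currently being extended
    for line in tagger_output.splitlines():
        if not line.strip():          # blank line marks a sentence boundary
            current = None
            continue
        word, _base, _pos, _chunk, ne = line.split("\t")
        begin = text.index(word, cursor)   # raises ValueError if not found
        end = begin + len(word)
        cursor = end
        if ne.startswith("B-"):       # first token of a new entity
            current = {
                "id": f"T{len(denotations) + 1}",
                "span": {"begin": begin, "end": end},
                "obj": ne[2:],
            }
            denotations.append(current)
        elif ne.startswith("I-") and current is not None:
            current["span"]["end"] = end   # extend the open entity
        else:                         # "O" or a stray "I-" tag
            current = None
    return {"text": text, "denotations": denotations}


sample_text = "IL-2 gene expression requires NF-kappa B."
sample_rows = [
    ("IL-2", "IL-2", "NN", "B-NP", "B-DNA"),
    ("gene", "gene", "NN", "I-NP", "I-DNA"),
    ("expression", "expression", "NN", "I-NP", "O"),
    ("requires", "require", "VBZ", "B-VP", "O"),
    ("NF-kappa", "NF-kappa", "NN", "B-NP", "B-protein"),
    ("B", "B", "NN", "I-NP", "I-protein"),
    (".", ".", ".", "O", "O"),
]
sample_output = "\n".join("\t".join(row) for row in sample_rows)
annotation = genia_to_pubannotation(sample_text, sample_output)
annotation_json = json.dumps(annotation, indent=2)
```

Character offsets are computed against the original text rather than the token sequence, since PubAnnotation spans index into the `text` field.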