• Title/Summary/Keyword: knowledge discovery in database

Efficient Dynamic Weighted Frequent Pattern Mining by using a Prefix-Tree (Prefix-트리를 이용한 동적 가중치 빈발 패턴 탐색 기법)

  • Jeong, Byeong-Soo;Farhan, Ahmed
    • The KIPS Transactions:PartD / v.17D no.4 / pp.253-258 / 2010
  • Traditional frequent pattern mining assigns an equal profit/weight value to every item. By allowing different weights for different items, Weighted Frequent Pattern (WFP) mining has become an important research issue in data mining and knowledge discovery. Existing algorithms in this area assume fixed weights. In real-world scenarios, however, the price/weight/importance of a pattern may change frequently due to unavoidable circumstances. Tracking these dynamic changes is necessary in application areas such as retail market basket analysis and web click stream management. In this paper, we propose a novel concept of dynamic weight and an algorithm, DWFPM (dynamic weighted frequent pattern mining), which can handle situations in which the price/weight of a pattern varies dynamically. It scans the database exactly once and is also suitable for real-time data processing. To our knowledge, this is the first research work to mine weighted frequent patterns using dynamic weights. Extensive performance analyses show that our algorithm is very efficient and scalable for WFP mining with dynamic weights.
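The abstract does not say how a dynamic weight enters the support computation. The following is a minimal sketch, assuming that each transaction records the weight every item had at that moment, and that a pattern's weighted support is the mean weight of its items in each containing transaction, summed and divided by the number of transactions. It enumerates itemsets per transaction rather than building the authors' prefix tree, so it illustrates only the single-scan, dynamic-weight idea and is not DWFPM itself. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dynamic_weighted_support(transactions, max_len=3):
    """Weighted support of every itemset up to max_len items, in one database scan.

    Each transaction is a dict {item: weight_at_that_time}, so the same item may
    carry a different weight in different transactions (an assumed "dynamic" weight).
    """
    n = len(transactions)
    totals = defaultdict(float)
    for tx in transactions:  # the database is scanned exactly once
        items = sorted(tx)
        for k in range(1, min(max_len, len(items)) + 1):
            for pattern in combinations(items, k):
                # pattern weight in this transaction = mean of its items' current weights
                totals[pattern] += sum(tx[item] for item in pattern) / k
    return {pattern: total / n for pattern, total in totals.items()}


def mine_weighted_patterns(transactions, min_wsup, max_len=3):
    """Keep the itemsets whose dynamic weighted support reaches min_wsup."""
    wsup = dynamic_weighted_support(transactions, max_len)
    return {p: w for p, w in wsup.items() if w >= min_wsup}


# Toy market-basket data: item weights (e.g. normalized prices) change over time.
baskets = [
    {"milk": 0.5, "bread": 0.8},
    {"milk": 0.7, "bread": 0.8, "eggs": 0.3},
    {"bread": 0.6},
]
print(mine_weighted_patterns(baskets, min_wsup=0.45))
```

Brute-force enumeration sidesteps a known difficulty: weighted support is generally not anti-monotone (adding a heavy item can raise a pattern's mean weight), so Apriori-style pruning cannot be applied directly, which is one reason specialized structures such as prefix trees are used.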

Requirement Analysis for Bio-Information Integration Systems

  • Lee, Sean;Lee, Phil-Hyoun;Na, Dokyun;Lee, Doheon;Lee, Kwanghyung;Bae, Myung-Nam
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.11-15 / 2003
  • The amount of biological data has been increasing exponentially. To cope with this bio-information explosion, it is necessary to build a system that integrates biological data. Such an integration system could provide useful services for bio-application developers by answering general complex queries that require access to heterogeneous biological data sources, and it could easily accommodate new databases. In this paper, we analyze the architectures and mechanisms of existing integration systems along with their advantages and disadvantages. Based on this analysis and on user requirement studies, we propose an integration system framework that embraces the advantages of the existing systems. More specifically, we propose an integration system architecture composed of a mediator and wrappers, which offers a service interface layer for other applications as well as for independent biologists, thus playing the role of a database management system for biology applications. In other words, the system abstracts the heterogeneous information structures and formats away from the application layer. In the system, the wrappers send database-specific queries and report the results to the mediator in XML. The proposed system could facilitate in silico knowledge discovery by allowing numerous discrete biological databases to be combined.
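The mediator-wrapper split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two in-memory "sources", the XML element names (`result`, `record`), and all class names are hypothetical, and the real system's query language is not given in the abstract. Each wrapper turns the mediator's generic request into a source-specific lookup and answers in XML; the mediator merges the answers so the caller never sees the sources' internal formats.

```python
import xml.etree.ElementTree as ET


class DictSourceWrapper:
    """Hypothetical wrapper for a source that stores {gene: annotation}."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def query(self, gene):
        # Source-specific lookup, reported back to the mediator as XML.
        root = ET.Element("result", source=self.name)
        if gene in self.records:
            ET.SubElement(root, "record", gene=gene).text = self.records[gene]
        return ET.tostring(root, encoding="unicode")


class TableSourceWrapper:
    """Hypothetical wrapper for a source that stores (gene, value) rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows

    def query(self, gene):
        root = ET.Element("result", source=self.name)
        for row_gene, value in self.rows:
            if row_gene == gene:
                ET.SubElement(root, "record", gene=row_gene).text = value
        return ET.tostring(root, encoding="unicode")


class Mediator:
    """Fans a query out to every wrapper and merges their XML answers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def ask(self, gene):
        merged = {}
        for wrapper in self.wrappers:
            root = ET.fromstring(wrapper.query(gene))
            merged[root.get("source")] = [rec.text for rec in root.findall("record")]
        return merged


mediator = Mediator([
    DictSourceWrapper("annotations", {"geneX": "toy annotation"}),
    TableSourceWrapper("interactions", [("geneX", "geneY"), ("geneX", "geneZ"), ("geneW", "geneY")]),
])
print(mediator.ask("geneX"))
```

Adding a new database then means writing one more wrapper with a `query` method; the mediator code does not change.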

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • Journal of Intelligence and Information Systems / v.6 no.1 / pp.73-82 / 2000
  • This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management, using the Korea Medical Insurance Corporation database. Because logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression with two decision tree algorithms: CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5). The models were developed on a training set of 13,689 beneficiaries and compared on a test set of 4,588 beneficiaries. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, whereas C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules provided segment characteristics for the risk factors that may be used in developing hypertension management programs. These results show that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.
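CHAID chooses each split by testing how strongly every candidate predictor is associated with the outcome using a chi-squared test. The sketch below shows only that core step, under simplifying assumptions: it ranks predictors by the raw Pearson statistic instead of Bonferroni-adjusted p-values, and it does not merge categories. Ranking by the raw statistic is comparable only when the predictors have the same number of categories, as in the toy data. The records and function names are hypothetical, not from the study's database.

```python
from collections import Counter


def chi_squared(rows, predictor, outcome):
    """Pearson chi-squared statistic of the predictor-by-outcome contingency table."""
    n = len(rows)
    observed = Counter((r[predictor], r[outcome]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[outcome] for r in rows)
    stat = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    return stat


def best_split(rows, predictors, outcome):
    """Pick the predictor most strongly associated with the outcome (simplified CHAID step)."""
    return max(predictors, key=lambda p: chi_squared(rows, p, outcome))


# Toy records, not from the study's database.
records = [
    {"smoker": "yes", "sex": "m", "hypertension": 1},
    {"smoker": "yes", "sex": "f", "hypertension": 1},
    {"smoker": "no", "sex": "m", "hypertension": 0},
    {"smoker": "no", "sex": "f", "hypertension": 0},
]
print(best_split(records, ["smoker", "sex"], "hypertension"))
```

A full CHAID tree repeats this selection recursively within each resulting segment, which is what yields the segment characteristics mentioned above.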

Design and Implementation of Spatial Association Rule Discovery System for Spatial Data Analysis (공간 데이터 분석을 위한 공간 연관 규칙 탐사 시스템의 설계 및 구현)

  • Ahn, Chan-Min;Lee, Yun-Seok;Park, Sang-Ho;Lee, Ju-Hong
    • Journal of the Korea Society of Computer and Information / v.11 no.1 s.39 / pp.27-34 / 2006
  • Recently, technologies for effectively managing spatial information have been actively studied. For effective knowledge queries, various extended data mining methods have been applied to spatial data mining. However, previous spatial association rule systems fail to reflect various non-spatial properties in their queries, because they search for rules by computing over predicates. To resolve this problem, this study proposes a system that extends the queries used in spatial databases and searches for association rules among non-spatial object properties after preparing the data based on spatial information. In particular, a model applicable to geographical information systems is implemented. Because the implemented system is highly portable to extended spatial databases and considers both spatial properties and various non-spatial properties, it can discover spatial association rules that are more useful in real life.
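The general idea of combining spatial and non-spatial properties can be sketched as two steps: first derive a spatial predicate from coordinates, then evaluate rules that mix that predicate with ordinary attributes. This is a minimal sketch, not the authors' system (which works by extending spatial database queries); the predicate name `close_to(park)`, the distance threshold, and the parcel data are hypothetical.

```python
import math


def add_spatial_predicate(objects, landmarks, predicate, radius):
    """Attach a spatial predicate to every object within `radius` of some landmark."""
    for obj in objects:
        if any(math.dist(obj["xy"], lm) <= radius for lm in landmarks):
            obj["facts"].add(predicate)


def rule_stats(objects, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent over the objects."""
    n = len(objects)
    n_ante = sum(1 for o in objects if antecedent <= o["facts"])
    n_both = sum(1 for o in objects if (antecedent | consequent) <= o["facts"])
    return n_both / n, (n_both / n_ante if n_ante else 0.0)


# Toy parcels: coordinates plus non-spatial attribute=value facts.
parcels = [
    {"xy": (0, 0), "facts": {"type=house", "price=high"}},
    {"xy": (1, 0), "facts": {"type=house", "price=high"}},
    {"xy": (10, 10), "facts": {"type=house", "price=low"}},
    {"xy": (0, 1), "facts": {"type=shop", "price=high"}},
]
add_spatial_predicate(parcels, [(0, 0.5)], "close_to(park)", radius=1.5)
print(rule_stats(parcels, {"close_to(park)", "type=house"}, {"price=high"}))
```

Because the spatial predicate is materialized as just another fact, any ordinary association rule miner can then combine it freely with non-spatial properties.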

Environmental Consciousness Data Modeling by Association Rules

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society / v.16 no.3 / pp.529-538 / 2005
  • Data mining is a method for finding useful information in large amounts of data stored in databases. It is used to discover hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include association rules, decision trees, clustering, and neural networks. Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary quality measures for association rules: support, confidence, and lift. We analyze Gyeongnam social indicator survey data using the association rule technique for environmental information discovery. The resulting association rules can be used for environmental preservation and improvement.
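The three quality measures named above can be computed directly. For a rule X → Y over n transactions, support is the fraction of transactions containing both X and Y, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y), so a lift above 1 means X and Y co-occur more often than independence would predict. The survey items below are hypothetical toy responses, not the Gyeongnam data.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, lift


# Toy survey responses, each a set of answered items.
responses = [
    {"recycles", "worried_about_air"},
    {"recycles", "worried_about_air"},
    {"recycles"},
    {"worried_about_air"},
    set(),
]
print(rule_measures(responses, {"recycles"}, {"worried_about_air"}))
```

Here the rule holds in 2 of 5 responses (support 0.4) and in 2 of the 3 responses that contain the antecedent (confidence 2/3); since 3 of 5 responses contain the consequent, lift is (2/3) / 0.6, slightly above 1.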

Environmental Consciousness Data Modeling by Association Rules

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings / 2004.10a / pp.115-124 / 2004
  • Data mining is a method for finding useful information in large amounts of data stored in databases. It is used to discover hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include association rules, decision trees, clustering, and neural networks. Association rule mining searches for interesting relationships among items in a given large data set. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control. There are three primary quality measures for association rules: support, confidence, and lift. We analyze Gyeongnam social indicator survey data using the association rule technique for environmental information discovery. The resulting association rules can be used for environmental preservation and improvement.

Extraction of Hierarchical Decision Rules from Clinical Databases using Rough Sets

  • Tsumoto, Shusaku
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.336-342 / 2001
  • One of the most important problems with rule induction methods is that they cannot extract rules that plausibly represent experts' decision processes. On one hand, rule induction methods induce probabilistic rules whose description length is too short compared with experts' rules. On the other hand, constructing Bayesian networks generates rules that are too lengthy. In this paper, the characteristics of experts' rules are closely examined, and a new approach to extracting plausible rules is introduced, which consists of three procedures. First, the characterization of each decision attribute (given class) is extracted from databases, and the classes are classified into several groups with respect to this characterization. Then, two kinds of sub-rules are induced: characterization rules for each group and discrimination rules for each class in the group. Finally, these two parts are integrated into one rule for each decision attribute. The proposed method was evaluated on a medical database, and the experimental results show that the induced rules correctly represent experts' decision processes.
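The abstract names the three procedures but not the measures behind them. The sketch below assumes that coverage (the share of a class's cases satisfying a condition) drives characterization and that accuracy (the share of cases satisfying a condition that belong to the class) drives discrimination, with both thresholds set to 1.0 and a hypothetical greedy rule that groups classes whose characterizations overlap. It works on plain attribute-value pairs rather than rough-set approximations, so it illustrates the structure of the approach rather than the author's algorithm; all names and patient records are hypothetical.

```python
def attribute_values(rows):
    """All (attribute, value) pairs that occur in the rows, excluding the class label."""
    return {(k, v) for r in rows for k, v in r.items() if k != "class"}


def coverage(rows, cls, av):
    """Fraction of the class's rows that satisfy the attribute-value pair."""
    in_cls = [r for r in rows if r["class"] == cls]
    return sum(1 for r in in_cls if r.get(av[0]) == av[1]) / len(in_cls)


def accuracy(rows, cls, av):
    """Fraction of rows satisfying the attribute-value pair that belong to the class."""
    matching = [r for r in rows if r.get(av[0]) == av[1]]
    if not matching:
        return 0.0
    return sum(1 for r in matching if r["class"] == cls) / len(matching)


def group_classes(rows, min_cov):
    """Step 1: characterize each class, then greedily group classes whose characterizations overlap."""
    chars = {
        c: frozenset(av for av in attribute_values(rows) if coverage(rows, c, av) >= min_cov)
        for c in {r["class"] for r in rows}
    }
    groups = []
    for c in sorted(chars):
        for group in groups:
            if any(chars[m] & chars[c] for m in group):
                group.add(c)
                break
        else:
            groups.append({c})
    return chars, groups


def induce_rules(rows, min_cov=1.0, min_acc=1.0):
    """Steps 2 and 3: a shared characterization part per group plus a discrimination part per class."""
    chars, groups = group_classes(rows, min_cov)
    rules = {}
    for group in groups:
        shared = frozenset.intersection(*(chars[c] for c in group))
        sub = [r for r in rows if r["class"] in group]
        for c in group:
            disc = frozenset(av for av in attribute_values(sub) if accuracy(sub, c, av) >= min_acc)
            # integrated rule: IF shared AND disc THEN class = c
            rules[c] = (shared, disc)
    return rules


patients = [
    {"headache": "yes", "side": "one", "class": "migraine"},
    {"headache": "yes", "side": "one", "class": "migraine"},
    {"headache": "yes", "side": "both", "class": "tension"},
    {"headache": "no", "side": "none", "class": "healthy"},
]
for cls, (shared, disc) in sorted(induce_rules(patients).items()):
    print(cls, sorted(shared), sorted(disc))
```

On this toy data, migraine and tension share the characterization `headache = yes` and form one group; within that group, `side = one` discriminates migraine and `side = both` discriminates tension, so each integrated rule is longer than a single discriminating condition but shorter than a full case description.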

Extraction and Application of Spatial Association Rules: A Case Study for Urban Growth Modeling (공간 연관규칙의 추출과 적용 - 도시성장 예측모델을 사례로 -)

  • 조성휘;박수홍
    • Journal of the Korean Geographical Society / v.39 no.3 / pp.444-456 / 2004
  • Recently, spatial modeling that combines GIS with Cellular Automata (CA), which are based on dynamic process modeling, has been discussed and investigated. However, CA-based spatial modeling in previous research provides only a general modeling framework and environment, and lacks the simulation or transition rules needed for modeling. This study proposes a methodology for extracting spatial relation rules using GIS and Knowledge Discovery in Databases (KDD) methods. This new methodology has great potential to improve CA-based spatial modeling and is expected to be applied to several cases, including urban growth simulation modeling.
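The role that mined rules play in a CA model can be sketched as a transition function applied synchronously to every cell. In this minimal sketch the rule "IF slope = low AND at least 3 of the 8 neighbors are urban THEN the cell becomes urban" is a hypothetical stand-in for a rule extracted by KDD, and the grid, the slope layer, and the function names are toy assumptions, not results of the study.

```python
def urban_neighbors(grid, r, c):
    """Count urban ('U') cells in the Moore (8-cell) neighborhood of (r, c)."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[i][j] == "U"
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
        if (i, j) != (r, c)
    )


def transition(state, slope, n_urban):
    """Hypothetical mined rule: IF slope = low AND urban neighbors >= 3 THEN urban."""
    if state == "N" and slope == "low" and n_urban >= 3:
        return "U"
    return state


def step(grid, slope):
    """One synchronous CA update: every cell reads the previous grid."""
    return [
        [transition(grid[r][c], slope[r][c], urban_neighbors(grid, r, c)) for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


land = [
    ["U", "U", "N"],
    ["U", "N", "N"],
    ["N", "N", "N"],
]
slope = [
    ["low", "low", "high"],
    ["low", "low", "low"],
    ["low", "low", "low"],
]
for row in step(land, slope):
    print(" ".join(row))
```

Swapping in a different set of extracted rules changes only `transition`; the CA framework itself stays the same, which is the gap the methodology above is meant to fill.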

A Knowledge Map Based on a Keyword-Relation Network by Using a Research Paper Database in the Computer Engineering Field (컴퓨터공학 분야 학술 논문 데이터베이스를 이용한 키워드 연관 네트워크 기반 지식지도)

  • Jung, Bo-Seok;Kwon, Yung-Keun;Kwak, Seung-Jin
    • The KIPS Transactions:PartD / v.18D no.6 / pp.501-508 / 2011
  • A knowledge map, which has recently been applied in various fields, discovers characteristics hidden in a large amount of information and presents a tangible output for understanding the meaning of the discovery. In this paper, we propose a knowledge map for research trend analysis based on keyword-relation networks constructed from a database of domestic journal articles in the computer engineering field published from 2000 through 2010. From this knowledge map, we could infer influential changes in a research topic related to a specific keyword by examining changes in the size of the connected component to which the keyword belongs in the keyword-relation networks. In addition, compared with random networks, we observed that the largest connected component in the keyword-relation networks is relatively small and that groups of high-similarity keyword pairs are clustered within it. This implies that the research field corresponding to the largest connected component is not very large and that the many small-scale topics it contains are highly clustered and loosely connected to each other. Our proposed knowledge map can be considered an approach to research trend analysis that yields results which conventional approaches, such as analyzing the frequency of an individual keyword, cannot obtain.
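The component-size measurements described above can be sketched with a plain graph and breadth-first search. This sketch assumes an edge links two keywords that co-occur in the same paper; the abstract does not specify how the keyword relations or their similarities are computed. Building one network per year and comparing `len(component_of(net, keyword))` across years would track a keyword's topic as described. The paper keywords and function names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def keyword_network(papers):
    """Undirected graph linking keywords that co-occur in at least one paper."""
    graph = defaultdict(set)
    for keywords in papers:
        unique = sorted(set(keywords))
        for kw in unique:
            graph.setdefault(kw, set())  # keep keywords with no co-occurring partner
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def component_of(graph, start):
    """Breadth-first search returning the connected component containing `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def component_sizes(graph):
    """Sizes of all connected components, largest first."""
    seen, sizes = set(), []
    for node in graph:
        if node not in seen:
            comp = component_of(graph, node)
            seen |= comp
            sizes.append(len(comp))
    return sorted(sizes, reverse=True)


papers = [
    ["data mining", "association rule"],
    ["association rule", "apriori"],
    ["ontology", "semantic web"],
    ["xml"],
]
net = keyword_network(papers)
print(len(component_of(net, "data mining")), component_sizes(net))
```

Comparing `component_sizes` against the same statistic for random networks with the same numbers of nodes and edges is the kind of baseline comparison the abstract reports.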

Libraries for Life: A Case Study of National Library Board, Singapore

  • Foo, Schubert;Tang, Chris;Ng, Judy
    • Journal of the Korean Society for Library and Information Science / v.44 no.4 / pp.33-59 / 2010
  • Library 2.0 advocates a socially rich, multimedia-enabled, user-originated and communally innovative environment that offers significant opportunities for libraries to evolve and make themselves even more relevant and significant to their users. This paper presents a case study of the National Library Board of Singapore, which plays a vital role in facilitating the realisation of a long-term key national program, the Singapore Memory (SM) Project. SM embraces the attributes of the Library 2.0 environment to enable the nation's memory to be collected, organised, preserved, discovered, researched, augmented and created. The output of the project is an evolving collection of knowledge assets on Singapore along a Singapore Memory Content Continuum, in which existing content is steadily augmented with new content. The content will be collected across all formats, in any language, from Singaporeans and non-Singaporeans, from any institution and agency, from Singapore and abroad, and from official and unofficial sources. The utopian scenario of the SM Project is that any person, community, group or institution who has ever experienced Singapore in any way, or has any material on Singapore, will engage actively in the contribution, discovery and creation of content for the project, and thus become an advocate who further encourages and catalyses more contribution, discovery and creation. The paper outlines the key approaches, concepts and ideas for the project. An important element is the proliferation, exposure and accessibility of the rich content envisaged in the project. The SM proliferation plan is presented, together with examples of how two existing resources have been designed, namely the Singapore Infopedia, a database of articles on Singapore's history, culture, people and events, and NewspaperSG, an online resource of current and historic Singapore and Malayan newspapers, to demonstrate how content can be exposed, searched and discovered.