• Title/Summary/Keyword: knowledge discovery in database

Search Result 69

A Multiple Layered Database Design and Maintenance in Object-Oriented Databases (객체지향 데이터베이스에서 다계층 데이터베이스 설계 및 유지)

  • Kim, Nam-Jin;Shin, Dong-Cheon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.1
    • /
    • pp.11-23
    • /
    • 1998
  • In very large databases, searching effectively for interesting information is important for both efficiency and flexibility. A multiple layered database approach based on the attribute-oriented generalization (AOG) method is one useful approach to knowledge discovery under various situations. In this paper, we propose a multiple layered database design methodology based on the AOG method in object-oriented databases. In addition, we propose a dynamic schema evolution model and an implementation strategy so that multiple layered databases can continue to provide information effectively.
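
As a rough illustration of attribute-oriented generalization, the sketch below lifts one attribute to its parent concept and merges tuples that become identical, keeping a count. The hierarchy, the sample rows, and the function name `generalize` are assumptions made for this example; they are not taken from the paper, and the object-oriented and schema-evolution aspects are not modeled.

```python
from collections import Counter

# Hypothetical concept hierarchy: each value maps to its parent concept.
CITY_TO_COUNTRY = {"Seoul": "Korea", "Busan": "Korea", "Tokyo": "Japan", "Osaka": "Japan"}

def generalize(rows, attribute_index, hierarchy):
    """Lift one attribute to its parent concept and merge identical tuples.

    rows is a list of (tuple, count) pairs; the result has the same shape,
    so the function can be applied repeatedly to build higher layers.
    """
    counts = Counter()
    for row, count in rows:
        lifted = list(row)
        value = row[attribute_index]
        lifted[attribute_index] = hierarchy.get(value, value)
        counts[tuple(lifted)] += count
    return sorted(counts.items())

# Layer 0: primitive tuples, each with count 1 (illustrative data).
base_layer = [
    (("Seoul", "student"), 1),
    (("Busan", "student"), 1),
    (("Tokyo", "engineer"), 1),
    (("Osaka", "student"), 1),
]

# Layer 1: cities generalized to countries.
layer_1 = generalize(base_layer, 0, CITY_TO_COUNTRY)
print(layer_1)
```

Each generalized layer is smaller than the one below it, which is what lets queries on higher layers run against less data.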


A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.2
    • /
    • pp.509-520
    • /
    • 2005
  • Data fusion is defined as the process of combining data and information from different sources to make more effective use of the available information. In this paper, we propose a data fusion algorithm using the k-means clustering method for data enrichment, to improve data quality in the knowledge discovery in database (KDD) process. An empirical study comparing the proposed data fusion technique with existing techniques shows that the proposed clustering data fusion technique has a low MSE for continuous fusion variables.
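
One way to picture clustering-based statistical matching is sketched below. Donor records carry both a common variable x and a fusion variable y. Recipients carry only x and receive the mean y of the donor cluster nearest to them. This is an illustrative simplification under stated assumptions (one-dimensional x, deterministic initial centroids, hypothetical data and function names) and may differ from the algorithm in the paper.

```python
def nearest(centroids, value):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - value))

def kmeans_1d(values, k, iterations=20):
    """Plain k-means on scalars with a deterministic, evenly spread start."""
    ordered = sorted(values)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centroids, v)].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def fuse(donors, recipient_xs, k):
    """Impute y for each recipient from the mean y of its nearest donor cluster."""
    centroids = kmeans_1d([x for x, _ in donors], k)
    ys_by_cluster = [[] for _ in range(k)]
    for x, y in donors:
        ys_by_cluster[nearest(centroids, x)].append(y)
    overall = sum(y for _, y in donors) / len(donors)
    means = [sum(ys) / len(ys) if ys else overall for ys in ys_by_cluster]
    return [means[nearest(centroids, x)] for x in recipient_xs]

# Illustrative donor file (x, y) and recipient file (x only).
donors = [(1.0, 10.0), (1.2, 12.0), (5.0, 50.0), (5.2, 54.0)]
fused = fuse(donors, [1.1, 4.8], k=2)
print(fused)
```

With held-out true values of y for the recipients, the MSE mentioned in the abstract would be the mean squared difference between those values and `fused`.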

A comparison of three design tree based search algorithms for the detection of engineering parts constructed with CATIA V5 in large databases

  • Roj, Robin
    • Journal of Computational Design and Engineering
    • /
    • v.1 no.3
    • /
    • pp.161-172
    • /
    • 2014
  • This paper presents three different search engines for the detection of CAD parts in large databases. The contained information is analyzed by exporting the data stored in the structure trees of the CAD models. A preparation program generates one XML file for every model, which, in addition to the data of the structure tree, also contains certain physical properties of each part. The first search engine specializes in the discovery of standard parts, such as screws or washers. The second program uses user input as search parameters and can therefore perform personalized queries. The third compares a given reference part with all parts in the database and locates files that are identical or similar to the reference part. All approaches run automatically and have the analysis of the structure tree in common. Files constructed with CATIA V5 and search engines written in Python were used for the implementation. The paper also includes a short comparison of the advantages and disadvantages of each program, as well as a performance test.

A Study on Customer's Purchase Trend Using Association Rule (연관규칙을 이용한 고객의 구매경향에 관한 연구)

  • 임영문;최영두
    • Proceedings of the Safety Management and Science Conference
    • /
    • 2000.11a
    • /
    • pp.299-306
    • /
    • 2000
  • Data mining is generally defined as knowledge discovery, that is, the extraction of hidden but necessary information from large databases. Its techniques can be applied to decision making, prediction, and information analysis by analyzing the relationships and patterns among data. One of the most important tasks in data mining is finding association rules. The objective of this paper is to find customers' purchase trends using association rules derived from database analysis; the results can be used as fundamental data for CRM (Customer Relationship Management). This paper uses the Apriori algorithm and FoodMart data to find association rules.
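
For readers unfamiliar with Apriori, here is a minimal sketch of its level-wise search and of rule confidence. The baskets are made up for illustration and are not FoodMart data; the function names and the support threshold are likewise assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset seen in at least
    min_support transactions, found level by level."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical market baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
frequent = apriori(baskets, min_support=3)
print(confidence(frequent, {"milk"}, {"bread"}))  # 0.75
```

Here milk appears in four baskets and milk with bread in three, so the rule milk -> bread has confidence 3/4.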


A Methodology for Searching Frequent Pattern Using Graph-Mining Technique (그래프마이닝을 활용한 빈발 패턴 탐색에 관한 연구)

  • Hong, June Seok
    • Journal of Information Technology Applications and Management
    • /
    • v.26 no.1
    • /
    • pp.65-75
    • /
    • 2019
  • As the use of the XML-based semantic web increases in the field of data management, many studies have tried to extract useful information from data stored in ontologies using association rule mining. Ontology data has the advantage that data can be expressed freely, because it has a flexible and scalable structure, unlike a conventional database with a predefined structure. On the other hand, this makes it difficult to find frequent patterns with a uniform analysis method. The goal of this study is to provide a basis for extracting useful knowledge from ontologies by searching for frequently occurring subgraph patterns, applying transaction-based graph mining techniques to the ontology schema graph data and instance graph data that constitute an ontology. To overcome the structural limitations of existing ontology mining, the frequent pattern search methodology in this study adopts the approach used in graph mining and applies it to the ontology through an iterative node chunking method. Our suggested methodology will play an important role in knowledge extraction.

Virus communicable disease epidemic forecasting search using KDD and DataMining (KDD와 데이터마이닝을 이용한 바이러스성전염병 유행예측조사)

  • Yun, JongChan;Youn, SungDae
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2004.05a
    • /
    • pp.47-50
    • /
    • 2004
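  • This paper applies KDD (Knowledge Discovery in Database) and data mining techniques to the epidemiological investigation of infectious diseases, a process that handles large volumes of data. The knowledge of medical experts is stored in a database to support data selection, cleaning, enrichment, prediction, and fast data retrieval. Because data mining is used to track the trend of each virus, the overall trend is computed and predicted rather than results for only part of the data.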


Integrated Method Based on Rough Sets for Knowledge Discovery (지식 발견을 위한 라프셋 중심의 통합 방법 연구)

  • Chung, Hong;Chung, Hwan-Mook
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.8 no.6
    • /
    • pp.27-36
    • /
    • 1998
  • This paper suggests an integrated method based on rough sets for discovering useful knowledge from a large database. Our approach applies the attribute-oriented concept hierarchy ascension technique to extract generalized data from the actual data in the database, induction of decision trees to measure the information gain, and the knowledge reduction method of rough set theory to remove superfluous attributes and attribute values. The integrated algorithm first reduces the size of the database through concept generalization, then reduces the number of attributes by eliminating condition attributes that have little influence on the decision attribute, and finally induces simplified decision rules by removing superfluous attribute values through analysis of the dependency relationships among the attributes.
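
The attribute-reduction step can be illustrated with the rough-set dependency degree: the fraction of rows whose condition-attribute equivalence class maps to a single decision value. The sketch below greedily drops attributes whose removal leaves that degree unchanged. The decision table and function names are hypothetical, and the sketch covers only attribute removal, not the generalization, decision tree, or attribute-value reduction steps described in the abstract.

```python
from collections import defaultdict

def dependency(rows, condition_attrs, decision_attr):
    """Size of the positive region divided by the number of rows."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row[decision_attr])
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)

def greedy_reduct(rows, condition_attrs, decision_attr):
    """Drop attributes one at a time while the dependency degree is preserved."""
    full = dependency(rows, condition_attrs, decision_attr)
    kept = list(condition_attrs)
    for attr in condition_attrs:
        trial = [a for a in kept if a != attr]
        if trial and dependency(rows, trial, decision_attr) == full:
            kept = trial
    return kept

# Hypothetical decision table; "flu" is the decision attribute.
table = [
    {"headache": "yes", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "high", "flu": "yes"},
    {"headache": "yes", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "no", "muscle_pain": "no", "temperature": "high", "flu": "no"},
    {"headache": "no", "muscle_pain": "yes", "temperature": "very_high", "flu": "yes"},
]
reduct = greedy_reduct(table, ["headache", "muscle_pain", "temperature"], "flu")
print(reduct)  # ['muscle_pain', 'temperature']
```

In this table "headache" is superfluous: muscle pain and temperature together already determine the decision for every row. A greedy pass finds one reduct, not necessarily the smallest, and the result can depend on the order in which attributes are tried.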


Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

  • Dorjmaa, Tserendulam;Shin, Taeksoo
    • Journal of Information Technology Services
    • /
    • v.16 no.3
    • /
    • pp.167-183
    • /
    • 2017
  • The rapid growth of information technology and mobile service platforms, e.g., the Internet, Google, and Facebook, has led to an abundance of data. In this environment, the world is now facing a revolution in the way data is searched, collected, stored, and shared. The abundance of data creates opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting knowledge from databases have become more popular in e-commerce service fields, in particular movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a classification-based support vector machine (SVM) model for predicting movie popularity based on movies' genre data and social network data. Social network analysis (SNA) is used to improve the classification accuracy. This study builds the movie network (a one-mode network) from initial data that form a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. Those four centrality values and the movies' genre data were used to classify movie popularity. Logistic regression, neural network, naïve Bayes classifier, and decision tree models were also used as benchmarks for comparison with the performance of our proposed model. To assess classifier accuracy, this study used MovieLens data as an open database. Our empirical results indicate that our proposed model with movies' genre and centrality data has approximately 0% higher accuracy than other classification models with only movies' genre data. The implications of our results show that our proposed model can be used for improving movie popularity classification accuracy.
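
The network construction step described above can be sketched as follows: a two-mode user-to-movie mapping is projected onto a one-mode movie network (two movies are linked when at least one user is connected to both), and degree centrality is computed for each movie. The data and function names are hypothetical. Only degree centrality is shown; the betweenness, closeness, and eigenvector measures and the SVM classifier are omitted.

```python
from itertools import combinations

def project_to_movies(user_movies):
    """One-mode projection: movies sharing at least one user become neighbors."""
    adjacency = {}
    for movies in user_movies.values():
        unique = sorted(set(movies))
        for movie in unique:
            adjacency.setdefault(movie, set())
        for a, b in combinations(unique, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def degree_centrality(adjacency):
    """Number of neighbors divided by the maximum possible (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {movie: 0.0 for movie in adjacency}
    return {movie: len(neighbors) / (n - 1) for movie, neighbors in adjacency.items()}

# Hypothetical two-mode data: user -> movies that user rated.
ratings = {
    "user1": ["A", "B"],
    "user2": ["B", "C"],
    "user3": ["B", "D"],
}
movie_network = project_to_movies(ratings)
centrality = degree_centrality(movie_network)
print(centrality["B"])  # 1.0: B is linked to every other movie
```

Values like these would then be combined with genre features to form the input vector for a classifier.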

Learning and Classification in the Extensional Object Model (확장개체모델에서의 학습과 계층파악)

  • Kim, Yong-Jae;An, Joon-M.;Lee, Seok-Jun
    • Asia pacific journal of information systems
    • /
    • v.17 no.1
    • /
    • pp.33-58
    • /
    • 2007
  • Quite often, an organization must grapple with inconsistent and partial information to generate relevant information that supports decision making and action. As such, an organization scans the environment, interprets the scanned data, executes actions, and learns from the feedback of those actions, which boils down to computational interpretation and learning in terms of machine learning, statistics, and databases. The ExOM proposed in this paper is geared to facilitate such knowledge discovery in large databases in a most flexible manner. It supports a broad range of learning and classification styles and integrates them with traditional database functions. The learning and classification components of the ExOM are tightly integrated so that learning and classification of objects is less burdensome to ordinary users. A brief sketch of a strategy regarding the expressiveness of the terminological language is followed by a description of a prototype implementation of the learning and classification components of the ExOM.

An integrated bioinformatics analysis of mouse testis protein profiles with new understanding

  • Liu, Fujun;Wang, Haiyan;Li, Jianyuan
    • BMB Reports
    • /
    • v.44 no.5
    • /
    • pp.347-351
    • /
    • 2011
  • The testis is the major male gonad responsible for spermatogenesis and steroidogenesis. Much remains to be learned about the control of these events. In this study, we performed a comprehensive bioinformatics analysis of 1,196 mouse testis proteins screened from a public protein database. Integrated function and pathway analyses were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA), and significant features were clustered. Protein membrane organization and gene density on chromosomes were analyzed and discussed. The enriched bioinformatics analysis could provide clues and a basis for the development of diagnostic markers and therapeutic targets for infertility and male contraception.