• Title/Summary/Keyword: knowledge-based datamining

Datamining: Roadmap to Extract Inference Rules and Design Data Models from Process Data of Industrial Applications

  • Bae Hyeon;Kim Youn-Tae;Kim Sung-Shin;Vachtsevanos George J.
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.3 / pp.200-205 / 2005
  • This study aims to introduce the simplest and most appropriate ways of applying datamining in industrial processes. Applying datamining in manufacturing is very different from applying it in marketing, and misapplying it in a manufacturing system causes significant problems, so it is important to determine the best procedure and technique in advance. Previous studies have surveyed the related literature but have offered little description of datamining applications, and they have seldom described concrete examples that deal with application problems in manufacturing. In this study, a datamining roadmap is proposed to support datamining applications for industrial processes. The roadmap is divided into three stages, and each stage is categorized into classes according to the purpose of the datamining. Each category includes representative datamining techniques that have been broadly applied over decades. These techniques differ according to developers and application purposes; in this paper, exemplary methods are described. Based on the datamining roadmap, nonexperts can determine procedures and techniques for datamining in their applications.

Development of Datamining Roadmap and Its Application to Water Treatment Plant for Coagulant Control (데이터마이닝 로드맵 개발과 수처리 응집제 제어를 위한 데이터마이닝 적용)

  • Bae, Hyeon;Kim, Sung-Shin;Kim, Ye-Jin
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.7 / pp.1582-1587 / 2005
  • In this study, rule extraction, one of the datamining categories, was performed for coagulant control of a water treatment plant. Clustering methods were applied to extract control rules from data. These control rules can be used for full automation of water treatment plants in place of the operator's knowledge of plant control. Fuzzy clustering requires several coefficients to be determined, and ways of choosing them, such as clustering indices, have been studied for decades. In this study, statistical indices were used to calculate the number of clusters, and seed points were found by hierarchical clustering. These statistical approaches give information about the features of the clusters, so they can reduce computing cost and increase clustering accuracy. The proposed algorithm can play an important role in datamining and knowledge discovery.

Practical Utilization of Engineering Data based on Evolutionary Computation Method (진화연산에 의한 공학 데이터의 활용)

  • Lee Kyung-Ho;Yeon Yun-Seog;Yang Young-Soon
    • Proceedings of the Computational Structural Engineering Institute Conference / 2005.04a / pp.317-324 / 2005
  • Korean shipyards have accumulated a great amount of data, but they lack appropriate tools to use the data in practical work. Engineering data inherently contains experts' experience and know-how, so it is very useful to extract knowledge or information from the accumulated data using datamining techniques. This paper presents an evolutionary computation method based on genetic programming (GP), which can serve as one of the components for realizing datamining.

Real-time Fault Detection and Classification of Reactive Ion Etching Using Neural Networks (Neural Networks을 이용한 Reactive Ion Etching 공정의 실시간 오류 검출에 관한 연구)

  • Ryu Kyung-Han;Lee Song-Jae;Soh Dea-Wha;Hong Sang-Jeen
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.7 / pp.1588-1593 / 2005

Implementing Linear Models in Genetic Programming to Utilize Accumulated Data in Shipbuilding (조선분야의 축적된 데이터 활용을 위한 유전적프로그래밍에서의 선형(Linear) 모델 개발)

  • Lee, Kyung-Ho;Yeun, Yun-Seog;Yang, Young-Soon
    • Journal of the Society of Naval Architects of Korea / v.42 no.5 s.143 / pp.534-541 / 2005
  • Korean shipyards have accumulated a great amount of data, but they lack appropriate tools to use the data in practical work. Engineering data inherently contains experts' experience and know-how, so it is very useful to extract knowledge or information from the accumulated data using data mining techniques. This paper presents an evolutionary computation method based on genetic programming (GP), which can serve as one of the components for realizing data mining. The paper deals with linear models of GP for regression or approximation problems in which the given learning samples are insufficient. The linear model, which is a linear function of unknown parameters, is built by extracting all possible base functions from the standard GP tree with a symbolic processing algorithm. In addition to a standard linear model consisting of mathematical functions, the paper considers one variant of the linear model, which is built from a low-order Taylor series and can be converted into the standard form of a polynomial. The suggested model can be used as a design tool to predict design parameters from a small amount of accumulated data.
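
As a rough illustration of what "linear in its unknown parameters" means here, the following minimal sketch fits the weights of a model y = θ₀·f₀(x) + θ₁·f₁(x) + ... by ordinary least squares. It does not reproduce the paper's GP-based extraction of base functions: the base functions `1`, `x`, `x²`, the sample data, and the helper name `fit_linear_model` are all assumptions chosen for illustration.

```python
def fit_linear_model(basis, xs, ys):
    """Least-squares fit of y ~ sum_j theta[j] * basis[j](x) via the normal equations."""
    rows = [[f(x) for f in basis] for x in xs]
    n = len(basis)
    # Normal equations: (Phi^T Phi) theta = Phi^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    theta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - s) / a[i][i]
    return theta


# Hypothetical base functions; in the paper these would come from a GP tree.
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
# Only five samples, generated from y = 1 + 2x - 0.5x^2 for the example.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x - 0.5 * x * x for x in xs]
theta = fit_linear_model(basis, xs, ys)
print([round(t, 6) for t in theta])
```

Because the model is linear in θ, the fit reduces to solving a small linear system once the base functions are fixed, which is one reason such models remain usable when few samples are available.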

Class prediction of an independent sample using a set of gene modules consisting of gene-pairs which were condition(Tumor, Normal) specific (조건(암, 정상)에 따라 특이적 관계를 나타내는 유전자 쌍으로 구성된 유전자 모듈을 이용한 독립샘플의 클래스예측)

  • Jeong, Hyeon-Iee;Yoon, Young-Mi
    • Journal of the Korea Society of Computer and Information / v.15 no.12 / pp.197-207 / 2010
  • By applying a variety of data-mining methods to high-throughput cDNA microarray data, the levels of gene expression in two different tissues can be compared, and differentially expressed genes (DEGs) between normal and tumor cells can be detected. These genes can be used for diagnosis and for choosing a treatment strategy according to the cancer stage. Existing cancer classification methods based on machine learning select marker genes that are differentially expressed between normal and tumor samples and build a classifier from those marker genes. However, in addition to differences in gene expression levels, the difference in gene-gene correlations between the two conditions could be a good marker for disease diagnosis. In this study, we identify gene pairs with a large correlation difference between the two sets of samples and build gene classification modules from these gene pairs. This cancer classification method using gene modules achieves higher accuracy than current methods. Because the number of genes in a classification module is small, implementation as a clinical kit can be considered. In future work, the authors plan to identify novel cancer-related genes through functional analysis of the genes in a classification module using GO (Gene Ontology) enrichment validation, and to extend the classification module into gene regulatory networks.
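
The idea of selecting gene pairs by correlation difference can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' pipeline: it uses the Pearson correlation, a fixed threshold, and made-up expression values; the gene names and the helper `differential_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def differential_pairs(normal, tumor, threshold):
    """Return (gene1, gene2, diff) for pairs whose correlation changes by more than threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(normal), 2):
        diff = abs(pearson(normal[g1], normal[g2]) - pearson(tumor[g1], tumor[g2]))
        if diff > threshold:
            pairs.append((g1, g2, diff))
    # Largest correlation change first.
    return sorted(pairs, key=lambda p: -p[2])


# Made-up expression profiles: one list of sample values per gene and condition.
normal = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
tumor = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [8.0, 6.0, 4.0, 2.0],  # relationship to geneA is reversed in tumor
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
pairs = differential_pairs(normal, tumor, threshold=1.0)
for g1, g2, diff in pairs:
    print(g1, g2, round(diff, 3))
```

Note that geneA and geneC have identical expression levels in both conditions, so a marker search based on expression differences alone would not flag them, while the pairs involving geneB stand out through their changed correlations.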

Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

  • Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
    • Journal of The Korean Association of Information Education / v.9 no.3 / pp.453-462 / 2005
  • Owing to recent technical improvements in storage devices and networks, the amount of data is increasing rapidly, and the knowledge embedded in a data stream must be found as fast as possible. Huge volumes of data in a data stream are created continuously and change quickly. Various algorithms for finding frequent itemsets in a data stream have been actively proposed. However, current research offers no appropriate method for finding frequent itemsets that reflect the flow of time; it provides only frequent items based on total aggregate values. In this paper, we propose a novel algorithm for finding relative frequent itemsets according to time in a data stream. We also propose a method for storing frequent and sub-frequent items that takes limited memory into account, and a method for updating time-variant frequent items. The performance of the proposed method is analyzed through a series of experiments. The proposed method can find both frequent itemsets and relative frequent itemsets using only the action patterns of students in each time slot. Thus, our method can enhance the effectiveness of learning and help make the best plan for individual learning.
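
To make the distinction between total-aggregate and time-relative frequency concrete, here is a naive baseline sketch that counts itemset support separately per time slot. It is not the proposed algorithm (it keeps every count in memory and has no sub-frequent item store or update step); the slot names, student actions, and the function `frequent_itemsets_by_slot` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_by_slot(stream, min_support, max_size=2):
    """Return {slot: {itemset: support}} for itemsets frequent within each time slot.

    stream is an iterable of (slot, set_of_items) transactions.
    """
    counts = {}
    totals = Counter()
    for slot, items in stream:
        totals[slot] += 1
        slot_counts = counts.setdefault(slot, Counter())
        # Count every itemset of size 1..max_size contained in the transaction.
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(items), size):
                slot_counts[itemset] += 1
    result = {}
    for slot, slot_counts in counts.items():
        result[slot] = {
            itemset: count / totals[slot]
            for itemset, count in slot_counts.items()
            if count / totals[slot] >= min_support
        }
    return result


# Hypothetical student actions grouped by time slot.
stream = [
    ("morning", {"quiz", "video"}),
    ("morning", {"quiz", "video"}),
    ("morning", {"quiz"}),
    ("evening", {"forum"}),
    ("evening", {"forum", "video"}),
    ("evening", {"quiz"}),
]
result = frequent_itemsets_by_slot(stream, min_support=0.6)
for slot, itemsets in result.items():
    print(slot, sorted(itemsets))
```

In this toy data, `forum` appears in only 2 of 6 transactions overall, so it would be missed at a 0.6 threshold on total counts, yet it is frequent within the evening slot.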

Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

  • Kim, Jin-Hwa;Nam, Ki-Chan;Lee, Sang-Jong
    • Information Systems Review / v.10 no.2 / pp.137-158 / 2008
  • The rapid development of information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have varied demands for new products and services, and their power and influence on the markets grow stronger each year. Companies have therefore paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on a customer's private information or in-store purchasing behavior, are an important asset to most companies. CRM is one of the important business processes in which reliable information is mined from customer databases. Data mining techniques, including artificial intelligence methods, are popular tools for extracting useful information and knowledge from these databases. In this research, we propose a recommendation system that predicts a customer's purchase intention. The customer's intention to purchase a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the suggested method is compared with that of other data mining techniques.