• Title/Summary/Keyword: KDD


Personalized Recommand System Using Mining for the Association Rule (연관규칙 마이닝을 이용한 개인화된 추천시스템)

  • Sung, Chang-Gyu;Rhyu, Keel-Soo;Kim, Tae-Jin
    • Proceedings of the Korean Society of Marine Engineers Conference / 2005.06a / pp.246-250 / 2005
  • Recommender systems are used by a growing number of e-commerce sites to help customers find products to purchase. They provide personalized recommendations of items of potential interest to users, based on information about similarities and dissimilarities among different customers' tastes. In this paper, we design and build a recommender system that uses historical customer movie purchase transactions to extract the knowledge needed to make association-based recommendations to new customers.

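
The association-based recommendation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: it mines only single-item rules of the form "bought A, so suggest B" using the standard support and confidence measures, and the function names, thresholds, and movie titles are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return {antecedent: [(consequent, confidence), ...]} for single-item rules."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:  # support: share of baskets containing both items
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.setdefault(lhs, []).append((rhs, confidence))
    return rules


def recommend(rules, purchased):
    """Suggest unseen items implied by rules whose antecedent was already bought."""
    scores = {}
    for item in purchased:
        for rhs, confidence in rules.get(item, []):
            if rhs not in purchased:
                scores[rhs] = max(scores.get(rhs, 0.0), confidence)
    return sorted(scores, key=lambda r: (-scores[r], r))


# Hypothetical movie purchase histories (illustrative data only).
history = [
    ["Alien", "Blade Runner", "Heat"],
    ["Alien", "Blade Runner"],
    ["Alien", "Heat"],
    ["Blade Runner", "Casablanca"],
    ["Alien", "Blade Runner", "Casablanca"],
]
rules = mine_pair_rules(history)
print(recommend(rules, {"Alien"}))
```

A new customer with a single purchase is matched against rule antecedents, so no similarity to other users has to be computed at recommendation time.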

An Association Rules Mining System based-on SQL (SQL을 이용한 연관 규칙 탐사 시스템)

  • 전수정;김영지;우용태
    • Proceedings of the Korea Database Society Conference / 2000.11a / pp.89-94 / 2000
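  • In this paper, we design and implement an association rule mining system. The system uses the standard query language of relational databases to mine various forms of association rules over itemsets that satisfy user-specified query conditions. The query processing module dynamically constructs queries that satisfy the user's conditions, which makes it possible to adjust the scope of the target transaction database used for association rule mining. An association rule mining algorithm is used to generate the candidate itemsets from which rules are discovered. For this algorithm, we propose an efficient method that uses arrays to handle the candidate itemsets that can be generated from a single transaction.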


A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.509-520 / 2005
  • Data fusion is defined as the process of combining data and information from different sources to make more effective use of the information they contain. In this paper, we propose a data fusion algorithm that uses k-means clustering for data enrichment, to improve data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed technique with existing data fusion techniques shows that the proposed clustering-based technique yields a low MSE for continuous fusion variables.
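
One way to read the clustering-based fusion described above is sketched below. This is an assumption, since the abstract does not give the algorithm: donor records carry both common variables `x` and a continuous fusion variable `y`, recipients carry only `x`, donors are clustered on `x` with plain k-means, and each recipient receives the mean `y` of its nearest cluster. All names and data are hypothetical, and `k` is assumed not to exceed the number of donors.

```python
def nearest(centroids, p):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(centroids[i], p)))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; seeded with evenly spaced points so runs are repeatable."""
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def fuse(donors, recipients, k=2):
    """Impute y for each recipient as the mean y of the donors in its nearest cluster."""
    centroids = kmeans([x for x, _ in donors], k)
    sums, counts = [0.0] * k, [0] * k
    for x, y in donors:
        i = nearest(centroids, x)
        sums[i] += y
        counts[i] += 1
    overall = sum(y for _, y in donors) / len(donors)  # fallback for an empty cluster
    fused = []
    for x in recipients:
        i = nearest(centroids, x)
        fused.append(sums[i] / counts[i] if counts[i] else overall)
    return fused


# Hypothetical donor file (x, y) and recipient file (x only).
donors = [((1.0, 1.0), 10.0), ((1.2, 0.8), 12.0), ((5.0, 5.0), 50.0), ((5.2, 4.8), 54.0)]
recipients = [(1.1, 1.0), (4.9, 5.1)]
print(fuse(donors, recipients))
```

Comparing imputed and held-out true values with a mean squared error would mirror the kind of evaluation the abstract reports for continuous fusion variables.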

Data Mining-Based Performance Prediction Technology of Geothermal Heat Pump System (지열 히트펌프 시스템의 데이터 마이닝 기반 성능 예측 기술)

  • Hwang, Min Hye;Park, Myung Kyu;Jun, In Ki;Sohn, Byonghu
    • Transactions of the KSME C: Technology and Education / v.4 no.1 / pp.27-34 / 2016
  • This preliminary study investigated data mining-based methods to assess and predict the performance of a geothermal heat pump (GHP) system. Data mining is a key process of knowledge discovery in databases (KDD), which includes five steps: 1) selection; 2) pre-processing; 3) transformation; 4) analysis (data mining); and 5) interpretation/evaluation. We used two analysis models, categorical and numerical decision trees, to ascertain the patterns of performance (COP) and electrical consumption of the GHP system. Before applying the decision tree models, we statistically analyzed the measurement database to determine the effect of sampling intervals on the system performance. The analysis showed that 10-min sampling data gave the highest accuracy for the performance analysis, 97.7% relative to the actual dataset of the GHP system.

Hybrid Multiple Classifier Systems (하이브리드 다중 분류기시스템)

  • Kim In-cheol
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.133-145 / 2004
  • Combining multiple classifiers to improve on the performance of an individual classifier is a widely used technique. Constructing a multiple classifier system (MCS) involves two separate issues: how to generate a diverse set of base-level classifiers and how to combine their predictions. In this paper, we review the characteristics of the existing multiple classifier systems: bagging, boosting, and stacking. We then propose new MCSs: stacked bagging, stacked boosting, bagged stacking, and boosted stacking. These are hybrid MCSs that combine the advantageous characteristics of the existing ones. To evaluate the performance of the proposed schemes, we conducted experiments with nine different real-world datasets from the UCI KDD archive. The experimental results showed the superiority of our hybrid MCSs, especially bagged stacking and boosted stacking, over the existing ones.


Improvement of Network Intrusion Detection Rate by Using LBG Algorithm Based Data Mining (LBG 알고리즘 기반 데이터마이닝을 이용한 네트워크 침입 탐지율 향상)

  • Park, Seong-Chul;Kim, Jun-Tae
    • Journal of Intelligence and Information Systems / v.15 no.4 / pp.23-36 / 2009
  • Network intrusion detection has been continuously improved by using data mining techniques. There are two kinds of data mining methods for intrusion detection: supervised learning with class labels and unsupervised learning without class labels. In this paper, we study how to improve network intrusion detection accuracy with the LBG clustering algorithm, which is an unsupervised learning method. The K-means method, which starts with random initial centroids and performs clustering based on the Euclidean distance, is vulnerable to noisy data and outliers. The nonuniform binary split algorithm uses binary decomposition without assigning initial values, and it is relatively fast. We applied the EM (Expectation Maximization) based LBG algorithm, which incorporates the strengths of the two algorithms, to intrusion detection. Experimental results on the KDD Cup dataset showed that detection accuracy can be improved by using the LBG algorithm.

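
For readers unfamiliar with LBG, the sketch below shows the textbook Linde-Buzo-Gray procedure: start from the global mean, split every codeword into a perturbed pair, and refine with Lloyd iterations after each split. It is not the EM-based variant the paper applies, and the parameter names, the additive perturbation `eps`, and the sample values are assumptions. Because each round doubles the codebook, `size` is assumed to be a power of two.

```python
def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest(codebook, p):
    """Index of the codeword closest to p (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], p)))


def lbg(points, size, eps=0.01, iterations=20):
    """Grow a codebook by binary splitting; no random initial centroids are needed."""
    codebook = [mean(points)]
    while len(codebook) < size:
        # Split each codeword into two slightly shifted copies.
        codebook = [tuple(c + s * eps for c in word)
                    for word in codebook for s in (1, -1)]
        # Refine the doubled codebook with Lloyd (k-means style) updates.
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for p in points:
                cells[nearest(codebook, p)].append(p)
            codebook = [mean(cell) if cell else word
                        for cell, word in zip(cells, codebook)]
    return codebook


# Hypothetical 1-D feature values for a handful of connections.
points = [(0.0,), (2.0,), (10.0,), (12.0,)]
print(sorted(lbg(points, 2)))
```

Starting from the mean rather than from random centroids is what removes the dependence on initial values that the abstract attributes to K-means.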

Selection of Detection Measures using Relative Entropy based on Network Connections (상대 복잡도를 이용한 네트워크 연결기반의 탐지척도 선정)

  • Mun Gil-Jong;Kim Yong-Min;Kim Dongkook;Noh Bong-Nam
    • The KIPS Transactions:PartC / v.12C no.7 s.103 / pp.1007-1014 / 2005
  • Generating rules or patterns for detecting attacks on a network is very difficult. Detection rules and patterns are usually derived from experts' experience, which consumes considerable manpower, management expense, and time. This paper proposes statistical methods that detect intrusions and attacks effectively without relying on experts' experience. The methods select useful measures from among the measures of a network connection (session) and use them to detect attacks. We extracted network session data for normal traffic and for each attack, and selected useful measures for detecting attacks using relative entropy. We then built probability patterns and detected attacks using likelihood ratio testing. The detection method controls the detection rate and the false positive rate with a threshold. We evaluated the performance of the proposed method on the KDD CUP 99 data set and compared it with the detection rules of a decision tree algorithm. The results indicate that the proposed methods are useful for detecting intrusions and attacks.
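
The measure selection step above can be sketched as ranking each connection measure by the relative entropy (Kullback-Leibler divergence) between its attack and normal distributions, D(p || q) = Σ p(x) log(p(x) / q(x)). The direction of the divergence, the Laplace smoothing, and the measure names and values below are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def distribution(values, support, alpha=1.0):
    """Laplace-smoothed probability of every value in `support`."""
    counts = Counter(values)
    total = len(values) + alpha * len(support)
    return {v: (counts[v] + alpha) / total for v in support}


def relative_entropy(p, q):
    """D(p || q) over a shared support; smoothing keeps q(x) above zero."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p)


def rank_measures(normal, attack):
    """normal/attack: {measure: [observed values]}. Most discriminating measure first."""
    scores = {}
    for name in normal:
        support = set(normal[name]) | set(attack[name])
        p = distribution(attack[name], support)
        q = distribution(normal[name], support)
        scores[name] = relative_entropy(p, q)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-session observations for two measures.
normal = {"flag": ["SF", "SF", "SF", "SF"], "protocol": ["tcp", "udp", "tcp", "udp"]}
attack = {"flag": ["S0", "S0", "S0", "SF"], "protocol": ["tcp", "udp", "tcp", "tcp"]}
print(rank_measures(normal, attack))
```

The same smoothed distributions could feed a likelihood ratio test, where a session is flagged when the ratio of its attack probability to its normal probability exceeds a chosen threshold.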

An Empirical Comparison Study on Attack Detection Mechanisms Using Data Mining (데이터 마이닝을 이용한 공격 탐지 메커니즘의 실험적 비교 연구)

  • Kim, Mi-Hui;Oh, Ha-Young;Chae, Ki-Joon
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.2C / pp.208-218 / 2006
  • In this paper, we introduce methods for building attack detection models with data mining technologies that can classify the latest attack types and detect both modifications of existing attacks and novel attacks. We also compare these attack detection models in terms of detection accuracy and detection time. The important factors in creating a detection model are the data, the attributes, and the detection algorithm. For large-scale experiments, we therefore used NetFlow data gathered from a real network and the KDD Cup 1999 data. For attribute selection, we used a heuristic method and a theoretical method based on a decision tree algorithm. We compare detection models that use a single supervised or unsupervised data mining approach with models that use a combined supervised data mining approach. Although the combined supervised approach required more modeling time, it achieved a better detection rate. All models using data mining techniques could detect the attacks within 1 second, which suggests that these approaches are suitable for real-time detection. Our experimental results for anomaly detection also showed that our approaches make it possible to detect novel attacks, and in particular that the SOM model provides additional information about the existing attack that is most similar to a novel attack.

An Outlier Cluster Detection Technique for Real-time Network Intrusion Detection Systems (실시간 네트워크 침입탐지 시스템을 위한 아웃라이어 클러스터 검출 기법)

  • Chang, Jae-Young;Park, Jong-Myoung;Kim, Han-Joon
    • Journal of Internet Computing and Services / v.8 no.6 / pp.43-53 / 2007
  • Intrusion detection systems (IDS) have recently evolved by combining the signature-based detection approach with the anomaly detection approach. Although signature-based IDS tools that use machine learning algorithms are in common use, they detect only network intrusions with already known patterns. An ideal IDS tool should always keep the signature database of the detection system up to date. The system needs to generate signatures for detecting possible new attacks while monitoring and analyzing incoming network data. In this paper, we propose a new outlier cluster detection algorithm with a density (or influence) function. Our method assumes that, in the context of network intrusion, an outlier is a cluster of similar instances rather than a single object. Through extensive experiments on the KDD 1999 Cup Intrusion Detection dataset, we show that the proposed method outperforms the conventional outlier detection method based on the Euclidean distance function, especially when attacks occur frequently.


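A Study on the Global Collaborative Research Activity of Leading Korean Researchers: Focusing on New Industry Fields (국내 우수 연구자의 글로벌 공동연구 활동도 분석 연구 : 신산업 분야를 중심으로)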

  • Yu, Hwa-Seon;Kim, Yun-Myeong;Yang, Chi-Seung
    • Proceedings of the Korea Technology Innovation Society Conference / 2017.11a / pp.1167-1188 / 2017
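  • Rapid changes in the external R&D environment, such as the Fourth Industrial Revolution, are accelerating the convergence and sophistication of science and technology, and international cooperation is becoming increasingly active in response, particularly in new industry fields. Nevertheless, Korea's international collaborative research activity remains very low because of the weaker research capabilities of its R&D actors, the closed nature of its research actors, and shortcomings in the national R&D system. In the international cooperation category of the 2016 national science and technology innovation capacity assessment, Korea's index was 0.206, up 0.024 points from 2015 (0.182), but Korea still ranked only 16th among 30 OECD countries, and its level relative to the top three countries in international cooperation averaged only 10.3%. Demand is therefore growing for multifaceted measures to raise international collaborative research activity. Following up on a 2015 study, this study focuses on the state of international collaborative research by Korea and major foreign countries in future new industry fields. It diagnoses the current situation through a comparative analysis of international collaborative research activity among key researchers (those ranked within the top five by research activity), and considers ways forward, including opening up the R&D of Korean research actors, identifying strategic fields and partners for international cooperation, and invigorating international collaborative research. To analyze global collaborative research among key Korean and global researchers, we used a co-author analysis network technique based on the KDD/KM methodology. Using this methodology, we analyzed the top 10 countries, institutions, and researchers in the household robot field, one of the new industry fields, and carried out a collaborative research network analysis of the state and activity of international collaborative research among the top five key researchers, globally and in Korea, by publication activity.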

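
A co-author network of the kind mentioned in the abstract above can be sketched as a weighted edge list, where each edge counts the papers two researchers wrote together. The abstract does not describe the KDD/KM tooling, so the data layout, function names, and the "international share" measure here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """papers: list of author lists; each author is a (name, country) tuple."""
    edges = Counter()
    for authors in papers:
        # Every unordered pair of distinct authors on a paper shares one edge.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def international_share(edges):
    """Fraction of co-authorship weight that links authors from different countries."""
    total = sum(edges.values())
    cross = sum(w for (a, b), w in edges.items() if a[1] != b[1])
    return cross / total if total else 0.0


# Hypothetical papers with anonymized authors.
papers = [
    [("A", "KR"), ("B", "KR")],
    [("A", "KR"), ("C", "US")],
    [("A", "KR"), ("C", "US"), ("B", "KR")],
]
edges = coauthor_edges(papers)
print(edges.most_common(2), international_share(edges))
```

Ranking researchers by the summed weight of their cross-country edges is one simple way to compare international collaboration activity between groups.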