• 제목/요약/키워드: Classification mining

검색결과 734건 처리시간 0.024초

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제8권1호
    • /
    • pp.1-5
    • /
    • 2008
  • Generally the analytical tools of data mining have two learning types which are supervised and unsupervised learning algorithms. Classification and prediction are main analysis tools for supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We make comparative studies between popular classification algorithms which are LDA, QDA, kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. Also, we use almost all classification data sets of UCI machine learning repository for our experiments. According to our results, we are able to select proper algorithms for given classification data sets.

TEMPORAL CLASSIFICATION METHOD FOR FORECASTING LOAD PATTERNS FROM AMR DATA

  • Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회 2007년도 Proceedings of ISRS 2007
    • /
    • pp.594-597
    • /
    • 2007
  • We present in this paper a novel mid and long term power load prediction method using temporal pattern mining from AMR (Automatic Meter Reading) data. Since the power load patterns have time-varying characteristic and very different patterns according to the hour, time, day and week and so on, it gives rise to the uninformative results if only traditional data mining is used. Also, research on data mining for analyzing electric load patterns focused on cluster analysis and classification methods. However despite the usefulness of rules that include temporal dimension and the fact that the AMR data has temporal attribute, the above methods were limited in static pattern extraction and did not consider temporal attributes. Therefore, we propose a new classification method for predicting power load patterns. The main tasks include clustering method and temporal classification method. Cluster analysis is used to create load pattern classes and the representative load profiles for each class. Next, the classification method uses representative load profiles to build a classifier able to assign different load patterns to the existing classes. The proposed classification method is the Calendar-based temporal mining and it discovers electric load patterns in multiple time granularities. Lastly, we show that the proposed method used AMR data and discovered more interest patterns.

  • PDF

목표 속성을 고려한 연관규칙과 분류 기법 (Directed Association Rules Mining and Classification)

  • 한경록;김재련
    • 산업경영시스템학회지
    • /
    • 제24권63호
    • /
    • pp.23-31
    • /
    • 2001
  • Data mining can be either directed or undirected. One way of thinking about it is that we use undirected data mining to recognize relationship in the data and directed data mining to explain those relationships once they have been found. Several data mining techniques have received considerable research attention. In this paper, we propose an algorithm for discovering association rules as directed data mining and applying them to classification. In the first phase, we find frequent closed itemsets and association rules. After this phase, we construct the decision trees using discovered association rules. The algorithm can be applicable to customer relationship management.

  • PDF

유전자 알고리즘을 이용한 데이터 마이닝의 분류 시스템에 관한 연구 (Using Genetic Rule-Based Classifier System for Data Mining)

  • 한명묵
    • 인터넷정보학회논문지
    • /
    • 제1권1호
    • /
    • pp.63-72
    • /
    • 2000
  • 데이터마이닝은 방대한 데이터 자료로부터 숨어있는 지식이나 유용한 정보를 추출하는 과정이다. 이러한 데이터 마이닝 알고리즘은 통계학, 전자계산학, 그리고 기계학습 분야에서의 오랜 기간동안 이루어진 연구 결과의 산물이다. 어느 특정한 상황에 적용하는 특정한 기술들의 선택은 구현되어야 하는 데이터 마이닝 임무의 성격과 가용한 데이터의 성격에 의존한다. 데이터 마이닝에는 여러 임무가 있으며, 그 중에서 가장 대표적인 임무가 분류라고 (classification) 볼 수 있다. 분류는 인간 사고의 기본적인 요소이기 때문에 여러 응용 분야에서 많은 연구가 진행되어 왔으며, 문제 분석의 첫 단계라고 볼 수 있다. 본 논문에서는 학습문제에서 강건성(robust)을 갖는 유전자 알고리즘 기반의 분류시스템을 제안하고, 데이터 마이닝에서 중요한 분류기능에 관련된 문제인 nDmC에 응용해서 그 유효성을 검증한다.

  • PDF

Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

  • Yoo, Ill-Hoi;Song, Min
    • Journal of Computing Science and Engineering
    • /
    • 제2권2호
    • /
    • pp.109-136
    • /
    • 2008
  • In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge for text mining approaches. In addition, biomedical ontologies enable us to resolve many linguistic problems when text mining approaches handle biomedical literature. As the first example of text mining, document clustering is surveyed. Because a document set is normally multiple topic, text mining approaches use document clustering as a preprocessing step to group similar documents. Additionally, document clustering is able to inform the biomedical literature searches required for the practice of evidence-based medicine. We introduce Swanson's UnDiscovered Public Knowledge (UDPK) model to generate biomedical hypotheses from biomedical literature such as MEDLINE by discovering novel connections among logically-related biomedical concepts. Another important area of text mining is document classification. Document classification is a valuable tool for biomedical tasks that involve large amounts of text. We survey well-known classification techniques in biomedicine. As the last example of text mining in biomedicine and healthcare, we survey information extraction. Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. We also address techniques and issues of evaluating text mining applications in biomedicine and healthcare.

Experimental investigation on multi-parameter classification predicting degradation model for rock failure using Bayesian method

  • Wang, Chunlai;Li, Changfeng;Chen, Zeng;Liao, Zefeng;Zhao, Guangming;Shi, Feng;Yu, Weijian
    • Geomechanics and Engineering
    • /
    • 제20권2호
    • /
    • pp.113-120
    • /
    • 2020
  • Rock damage is the main cause of accidents in underground engineering. It is difficult to predict rock damage accurately by using only one parameter. In this study, a rock failure prediction model was established by using stress, energy, and damage. The prediction level was divided into three levels according to the ratio of the damage threshold stress to the peak stress. A classification predicting model was established, including the stress, energy, damage and AE impact rate using Bayesian method. Results show that the model is good practicability and effectiveness in predicting the degree of rock failure. On the basis of this, a multi-parameter classification predicting deterioration model of rock failure was established. The results provide a new idea for classifying and predicting rockburst.

연관 분류 마이닝 기법을 활용한 지식기반 신체활동 평가 모델 (A Knowledge Based Physical Activity Evaluation Model Using Associative Classification Mining Approach)

  • 손창식;최락현;강원석
    • 대한임베디드공학회논문지
    • /
    • 제13권4호
    • /
    • pp.215-223
    • /
    • 2018
  • Recently, as interest of wearable devices has increased, commercially available smart wristbands and applications have been used as a tool for personal healthy management. However most previous studies have focused on evaluating the accuracy and reliability of the technical problems of wearable devices, especially step counts, walking distance, and energy consumption measured from the smart wristbands. In this study, we propose a physical activity evaluation model using classification rules, induced from the associative classification mining approach. These rules associated with five physical activities were generated by considering activities and walking times in target heart rate zones such as 'Out-of Zone', 'Fat Burn Zone', 'Cardio Zone', and 'Peak Zone'. In the experiment, we evaluated the prediction power of classification rules and verified its effectiveness by comparing classification accuracies between the proposed model and support vector machine.

Genetic Algorithm Application to Machine Learning

  • Han, Myung-mook;Lee, Yill-byung
    • 한국지능시스템학회논문지
    • /
    • 제11권7호
    • /
    • pp.633-640
    • /
    • 2001
  • In this paper we examine the machine learning issues raised by the domain of the Intrusion Detection Systems(IDS), which have difficulty successfully classifying intruders. There systems also require a significant amount of computational overhead making it difficult to create robust real-time IDS. Machine learning techniques can reduce the human effort required to build these systems and can improve their performance. Genetic algorithms are used to improve the performance of search problems, while data mining has been used for data analysis. Data Mining is the exploration and analysis of large quantities of data to discover meaningful patterns and rules. Among the tasks for data mining, we concentrate the classification task. Since classification is the basic element of human way of thinking, it is a well-studied problem in a wide variety of application. In this paper, we propose a classifier system based on genetic algorithm, and the proposed system is evaluated by applying it to IDS problem related to classification task in data mining. We report our experiments in using these method on KDD audit data.

  • PDF

데이터 마이닝의 분류 규칙 발견을 위한 유전자알고리즘 학습방법 (Genetics-Based Machine Learning for Generating Classification Rule in Data Mining)

  • 김대희;박상호
    • 한국멀티미디어학회:학술대회논문집
    • /
    • 한국멀티미디어학회 2001년도 추계학술발표논문집
    • /
    • pp.429-434
    • /
    • 2001
  • 데이터(data)치 홍수와 정보의 빈곤이라는 환경에 처한 지금, 정보기술을 이용하여 데이터를 여과하고, 분석하며, 결과를 해석하는 자동화 된 데이터 분석 방안에 높은 관심을 가지게 되었으며, 데이터 마이닝(Data Mining))은 이러한 요구를 충족시키는 정보기술의 활용방법이다. 특히 데이터 마이닝(Data Mining)의 분류(Classification) 방법은 중요한 분야가 되고 있다. 분류 작업의 핵심은 어떻게 적당한 결정규칙(decision rule)을 정의하느냐에 달려 있는데 이를 위해 학습능력을 가지고 있는 알고리즘이 필요하다. 본 논문에서는 유전자 알고리즘(Genetic Algorithm)을 기반으로 하는 강건한 학습방법을 제시했으며, 이러한 학습을 통해 데이터 마이닝(Data Mining)의 분류시스템을 제안하였다.

  • PDF

캘린더 패턴 기반의 시간 연관적 분류 기법 (Temporal Associative Classification based on Calendar Patterns)

  • 이헌규;노기용;서성보;류근호
    • 한국정보과학회논문지:데이타베이스
    • /
    • 제32권6호
    • /
    • pp.567-584
    • /
    • 2005
  • 시간 데이타마이닝은 기존 데이타마이닝에 시간 개념을 추가하여 시간 속성을 가진 데이타로부터 이전에 잘 알려지지는 않았지만 묵시적이고 잠재적으로 유용한 시간 지식을 탐사하는 기술이다. 대표적 데이타마이닝 기법인 연관규칙과 분류기법은 실세계의 여러 응용분야에서 사용된다. 그러나 대부분의 데이타가 시간 속성을 포함함에도 불구하고 기존의 기법들은 시간 속성을 고려하지 않고 주로 정적인 데이타에 대한 지식 탐사만이 진행되었다. 그리고 시간 데이타에 대한 데이타마이닝 연구들은 데이타의 발생시점과 시간 제약조건을 추가한 지식 탐사에 중점을 두고 있어 데이타가 포함한 시간 의미나 시간 관계를 탐사하는데 부족하였다. 이 논문에서는 시간 클래스 연관규칙에 기반한 시간 연관적 분류기법을 제안한다. 이 기법은 분류규칙 생성을 위해서 연관적 분류에 시간 차원을 포함하여 확장한 시간 클래스 연관규칙에 의해 탐사된 규칙들을 적용하는 것이다. 그러므로 이 기법은 기존의 분류 기법들에 비해 더 유용한 지식탐사가 가능하다.