• Title/Summary/Keyword: Decision Tree Technique


Twostep Clustering of Environmental Indicator Survey Data

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.1
    • /
    • pp.1-11
    • /
    • 2006
  • Data mining techniques are used to discover hidden knowledge, unexpected patterns, and new rules in massive data. Data mining methods include decision trees, association rules, clustering, neural networks, and so on. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. It has been widely used in many applications, such as pattern analysis or recognition, data analysis, image processing, and offline or online market research. We analyze the 2001 Gyeongnam social indicator survey data using the twostep clustering technique for environmental information. Twostep clustering is classified as a partitional clustering method. We can apply these twostep clustering outputs to environmental preservation and improvement.


Program Plagiarism Detection based on X-treeDiff+ (X-treeDiff+ 기반의 프로그램 복제 탐지)

  • Lee, Suk-Kyoon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.44-53
    • /
    • 2010
  • Program plagiarism is a significant factor in reducing the quality of education in computer programming. In this paper, we propose a technique for identifying similar or identical programs in order to prevent students from recklessly copying programming assignments. Existing approaches for identifying similar programs are mainly based on fingerprints or pattern matching for text documents. Unlike those approaches, we propose an approach based on program structure. Using parsing programs, we first transform programs into XML documents by representing the syntactic components of the programs as elements in the XML documents, then run X-tree Diff+, a change detection algorithm for XML documents, to produce an edit script describing the changes. Whether programs are similar or identical is decided by analyzing the edit scripts in terms of program plagiarism. Analysis of edit scripts lets users understand the process of conversion between two programs, so that they can make qualitative judgments that consider the characteristics of the programming assignment and the degree of plagiarism.
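
The structure-based comparison in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: Python's `ast` module stands in for the parsing step, `difflib.SequenceMatcher` stands in for X-tree Diff+ (it compares flat tag sequences and does not produce a tree edit script), and the function names and sample programs are hypothetical.

```python
import ast
import difflib
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert a Python AST node into an XML element named after its node type."""
    element = ET.Element(type(node).__name__)
    for child in ast.iter_child_nodes(node):
        element.append(to_xml(child))
    return element


def tag_sequence(source):
    """Parse source code and list the XML tags of its syntax tree in document order."""
    root = to_xml(ast.parse(source))
    return [el.tag for el in root.iter()]


def structural_similarity(source_a, source_b):
    """Return a ratio in [0, 1]; 1.0 means both programs yield identical tag sequences."""
    matcher = difflib.SequenceMatcher(
        None, tag_sequence(source_a), tag_sequence(source_b), autojunk=False
    )
    return matcher.ratio()


# Hypothetical submissions: `renamed` only changes identifiers, `different` is unrelated.
original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
different = "print('hello')\n"

same_structure = structural_similarity(original, renamed)
other_structure = structural_similarity(original, different)
```

Because identifier names are not part of the element tags, renaming variables leaves the structural score unchanged, which is the kind of disguise that text-based matching tends to miss.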

The Training Data Generation and a Technique of Phylogenetic Tree Generation using Decision Tree (트레이닝 데이터 생성과 의사 결정 트리를 이용한 계통수 생성 방법)

  • Chae, Deok-Jin;Sin, Ye-Ho;Cheon, Tae-Yeong;Go, Heung-Seon;Ryu, Geun-Ho;Hwang, Bu-Hyeon
    • The KIPS Transactions:PartD
    • /
    • v.10D no.6
    • /
    • pp.897-906
    • /
    • 2003
  • The traditional animal phylogenetic tree arranges the body structures of the animal phyla from simple to complex based on early developmental characters. Currently, fast-moving molecular systematics research is re-evaluating these earlier views and is presenting new genealogies and renewed interest in evolution. In this paper, we generate a training set obtained from DNA sequences and apply it to classification. We used mitochondrial DNA for the experiment and then verified the accuracy using MEGA, an analysis program used in the field of biology. Although the mining results have yet to be proved through biological experiments, the approach can provide a methodology for efficient classification and can reduce the time and effort required for experiments.

Prediction of box office using data mining (데이터마이닝을 이용한 박스오피스 예측)

  • Jeon, Seonghyeon;Son, Young Sook
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1257-1270
    • /
    • 2016
  • This study deals with predicting the total number of movie audiences as a measure of box office performance. Prediction is performed with data mining classification techniques, namely decision tree, multilayer perceptron (MLP) neural network, multinomial logit model, and support vector machine, at four points in time: before movie release, on release day, one week after release, and two weeks after release. The predictors are online word-of-mouth (OWOM) variables, such as the portal movie rating, the number of portal movie raters, and blogs, together with variables describing the inherent properties of the film (such as nationality, grade, release month, release season, directors, actors, distributors, the number of audiences, and screens). Under 10-fold cross validation, the neural network model showed an accuracy of more than 90% before movie release. In addition, prediction accuracy increases when estimates of the final OWOM variables are added as predictors.
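
The 10-fold cross validation mentioned in this abstract can be sketched as follows. This is an illustration of the general technique only, not the authors' code: the function name `k_fold_indices` and the sample size of 25 are hypothetical, and the sketch does not shuffle the data, which a real evaluation would usually do first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical data set of 25 movies split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

A classifier would be trained on `train_idx` and scored on `test_idx` for each fold, and the reported accuracy would be the average over the ten folds.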

A Personalized Hand Gesture Recognition System using Soft Computing Techniques (소프트 컴퓨팅 기법을 이용한 개인화된 손동작 인식 시스템)

  • Jeon, Moon-Jin;Do, Jun-Hyeong;Lee, Sang-Wan;Park, Kwang-Hyun;Bien, Zeung-Nam
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.1
    • /
    • pp.53-59
    • /
    • 2008
  • Recently, vision-based hand gesture recognition techniques have been developed to help elderly and disabled people control home appliances. Problems that frequently lower the hand gesture recognition rate stem from inter-person and intra-person variation. The recognition difficulty caused by inter-person variation can be handled with user-dependent models and a model selection technique, while the difficulty caused by intra-person variation can be handled with fuzzy logic. In this paper, we propose a multivariate fuzzy decision tree learning and classification method for a hand motion recognition system for multiple users. When a user starts to use the system, the most appropriate recognition model is selected and used for that user.

Customer Churn Prediction of Automobile Insurance by Multiple Models (다중모델을 이용한 자동차 보험 고객의 이탈예측)

  • Lee, Jae-Sik;Lee, Jin-Chun
    • Journal of Intelligence and Information Systems
    • /
    • v.12 no.2
    • /
    • pp.167-183
    • /
    • 2006
  • Since data mining attempts to find unknown facts or rules even in only vaguely known data sets, it always suffers from a high error rate. In order to reduce the error rate, many researchers have employed multiple models to solve a problem. In this research, we present a new type of multiple-model approach, called DyMoS, whose unique feature is that it classifies the input data and applies a different model developed for each class of data. To evaluate the performance of DyMoS, we applied it to a real customer churn problem of an automobile insurance company. The results show that DyMoS outperformed every model that employed only one data mining technique, such as an artificial neural network, decision tree, or case-based reasoning.


A Study on the Work-time Estimation for Block Erections Using Stacking Ensemble Learning (Stacking Ensemble Learning을 활용한 블록 탑재 시수 예측)

  • Kwon, Hyukcheon;Ruy, Wonsun
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.56 no.6
    • /
    • pp.488-496
    • /
    • 2019
  • The estimation of block erection work time at a dock is one of the important factors in establishing or managing the overall shipbuilding schedule. A natural approach to predicting the work time is to use existing block erection data. Generally, the work time per unit is the product of a coefficient value, quantity, and product value. Previously, the work time per unit was determined statistically from unit load data. In this study, however, we estimate the work time per unit through work time coefficient values from series ships using machine learning. In machine learning, the outcome depends mainly on how the training data is organized. Therefore, we use feature engineering to determine which variables should be used as features and to check their influence on the result. To obtain the coefficient value of each block, we use ensemble learning methods, which are actively used nowadays. Among the many ensemble learning techniques, the final model is constructed with a stacking ensemble that consists of existing models (Decision Tree, Random Forest, Gradient Boost, Square Loss Gradient Boost, XGBoost), and the accuracy is maximized by selecting three candidates among all models. Finally, the results of this study are verified with the predicted total work time for one ship in the same series.

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak;Kim, Han;Yun, Jaesub;Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.40 no.1
    • /
    • pp.8-17
    • /
    • 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. The historical data containing information about players and teams was obtained from the official materials provided by the KBO website. From the collected raw data, we prepared two additional types of dataset, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets $\times$ 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique on the binary dataset, with a prediction accuracy of 84.14%. We also observed that the ratio and binary datasets helped to build better prediction models than the raw data. From the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and walks (four balls) are important winning factors of a game. This research is distinct from existing studies in that we used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
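
The derivation of the ratio and binary datasets from raw records can be sketched as below. The field names (`batting_avg`, `era`) and the sample values are hypothetical, not taken from the paper, and encoding 1 when the away value exceeds the home value is only one plausible reading of "comparing the record values".

```python
def make_ratio_row(away, home):
    """Divide each away-team record by the matching home-team record."""
    return {key: away[key] / home[key] for key in away}


def make_binary_row(away, home):
    """Encode 1 if the away-team record exceeds the home-team record, else 0."""
    return {key: int(away[key] > home[key]) for key in away}


# Hypothetical raw records for one game (illustrative values only).
away_team = {"batting_avg": 0.300, "era": 4.0}
home_team = {"batting_avg": 0.250, "era": 5.0}

ratio_row = make_ratio_row(away_team, home_team)
binary_row = make_binary_row(away_team, home_team)
```

Both derived rows express the away team relative to the home team, so a classifier sees the matchup directly instead of two separate sets of absolute statistics.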

The Study on Applying Ankle Joint Load Variable Lower-Knee Prosthesis to Development of Terrain-Adaptive Above-Knee Prosthesis (노면 적응형 대퇴 의족개발을 위한 발목 관절 부하 가변형 하퇴 의족 적용에 대한 연구)

  • Eom, Su-Hong;Na, Sun-Jong;You, Jung-Hwun;Park, Se-Hoon;Lee, Eung-Hyuk
    • Journal of IKEEE
    • /
    • v.23 no.3
    • /
    • pp.883-892
    • /
    • 2019
  • This study presents a method for controlling ankle joint movement by applying a terrain-adaptive technique to an intelligent above-knee prosthesis, in order to resolve gait imbalance in intervals where the gait environment changes and during slope walking. In developing such an above-knee prosthesis, classifying the gait modes is essential. To distinguish the stance phases and the swing phase depending on the road surface, a machine learning method that combines decision tree and random forest, using knee angle data and inertial sensor data, is proposed and applied. With this method, the ankle movement state of the prosthesis is controlled. The study verifies through a butterfly diagram whether the problem is resolved.

Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient (환자 IQR 이상치와 상관계수 기반의 머신러닝 모델을 이용한 당뇨병 예측 메커니즘)

  • Jung, Juho;Lee, Naeun;Kim, Sumin;Seo, Gaeun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.10
    • /
    • pp.1296-1301
    • /
    • 2021
  • With the recent increase in diabetes incidence worldwide, research has been conducted to predict diabetes through various machine learning and deep learning technologies. In this work, we present a model for predicting diabetes using machine learning techniques with German Frankfurt Hospital data. We apply outlier handling using the interquartile range (IQR) technique and Pearson correlation, and compare the diabetes prediction performance of Decision Tree, Random Forest, KNN (k-nearest neighbor), SVM (support vector machine), Bayesian Network, and the ensemble techniques XGBoost, Voting, and Stacking. The XGBoost technique showed the best performance across the various scenarios, with 97% accuracy. This study is therefore meaningful in that the model can be used to accurately predict and help prevent diabetes, which is prevalent in modern society.
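
The IQR outlier handling named in this abstract can be sketched as follows. The abstract does not state the fence multiplier or the quartile method, so the conventional factor of 1.5 and the "inclusive" quartile method are assumptions, as are the function names and the sample glucose values. The Pearson correlation step is not covered here.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that lie within the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical glucose readings with one implausible value (400).
glucose = [85, 90, 95, 100, 105, 110, 400]
bounds = iqr_bounds(glucose)
cleaned = drop_outliers(glucose)
```

For this sample the quartiles are 92.5 and 107.5, so the fences are 70 and 130 and only the reading of 400 is removed.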