• Title/Abstract/Keywords: Decision Tree analysis

736 search results, processing time 0.025 s

Clustering Algorithm using the DFP-Tree based on the MapReduce

  • 서영원;김창수
    • 인터넷정보학회논문지 / Vol. 16, No. 6 / pp.23-30 / 2015
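  • As big data has become a prominent issue, many applications that operate on the results of data analysis have been studied. Representative examples include product recommendation in e-commerce systems, search services in search engines, and friend recommendation in social network services. This paper proposes a decision frequent tree algorithm that combines the frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree, which is grounded in computer science theory. The existing frequent pattern tree algorithm guarantees accurate pattern generation from the pattern tree, but for data in which diverse patterns appear, such as social data, it generates a large number of patterns and makes analysis difficult. The proposed algorithm addresses this by transforming it into a model that judges convergence with subtrees. The algorithm is also modeled with MapReduce to provide high-speed processing through distributed processing.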
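
The abstract above combines frequent pattern mining with MapReduce. As a rough illustration only, the following single-process sketch shows the map and reduce steps for counting item frequencies, which is commonly the first pass before a frequent pattern tree is built. It is not the authors' DFP-Tree algorithm; the function names, the toy transactions, and the `min_support` value are assumptions made for this example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(transaction):
    """Emit (item, 1) once for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]

def reduce_phase(pairs):
    """Sum the emitted counts per item."""
    counts = defaultdict(int)
    for item, one in pairs:
        counts[item] += one
    return dict(counts)

def frequent_items(transactions, min_support):
    """Keep items that occur in at least `min_support` transactions."""
    pairs = chain.from_iterable(map_phase(t) for t in transactions)
    counts = reduce_phase(pairs)
    return {item: c for item, c in counts.items() if c >= min_support}

# Hypothetical toy transactions, not data from the paper.
transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
frequent = frequent_items(transactions, min_support=2)
print(frequent)
```

In a real MapReduce deployment the map calls would run on separate workers and the framework would group pairs by key before reducing; here both phases run in one process to keep the sketch self-contained.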

Analysis of employee's satisfaction factor in working environment using data mining algorithm

  • 이동열;김태호;이홍철
    • 대한안전경영과학회지 / Vol. 16, No. 4 / pp.275-284 / 2014
  • Decision tree analysis is a technique that partitions a population of interest into several subgroups for grouping and prediction. Because its results are easy to inspect, researchers can follow and explain the process more readily than with other techniques. This paper uses the CART algorithm, a data mining technique. The source data are the 2010-2011 working environment survey conducted by the Korea Occupational Safety and Health Agency (KOSHA), comprising 273 variables and 70,094 records; after refinement, 12 variables and 35,447 records remained. To find the factors behind satisfaction with the working environment, employees were divided into three age groups (under 30, 30 to 49, and over 50) and the factors were analyzed for each group. The CART algorithm was used to find the best grouping variables in 155 data. 'Comfortable in organization' and 'proper reward' emerged as the best grouping factors.
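
CART typically chooses a grouping variable by searching for the binary split that minimizes the weighted Gini impurity of the two resulting groups. The sketch below illustrates that criterion for a single numeric variable. The variable names and the toy ratings are hypothetical and are not taken from the KOSHA survey.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    If no split improves on the unsplit impurity, the threshold is None.
    """
    n = len(labels)
    best = (None, gini(labels))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: a 1-5 "proper reward" rating and a satisfaction label.
reward = [1, 2, 2, 3, 4, 5]
satisfied = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_binary_split(reward, satisfied)
print(threshold, impurity)
```

A full CART implementation repeats this search over every candidate variable and then recurses into each subgroup; the sketch covers only one variable and one level.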

A Comparative Study on the Performance of Intrusion Detection using Decision Tree and Artificial Neural Network Models

  • 조성래;성행남;안병혁
    • 디지털산업정보학회논문지 / Vol. 11, No. 4 / pp.33-45 / 2015
  • The Internet is now an essential tool in business. Despite this importance, it is exposed to network attacks that attempt fraud, the collection of private information, and cyber terrorism. Firewalls and intrusion detection systems (IDS) are tools against those attacks. An IDS determines whether network data constitutes a network attack, analyzing the data with various techniques including expert systems, data mining, and state transition analysis. This paper compares the performance of two data mining models in detecting network attacks: a decision tree (C4.5) and a neural network (FANN model). I trained and tested these models and measured their effectiveness in terms of detection accuracy, detection rate, and false alarm rate, in order to find out which model is more effective for intrusion detection. The analysis used the KDD Cup 99 data, a benchmark data set in intrusion detection research, with the open source Weka software for the C4.5 model and available C++ code for the FANN model.
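
The three measures named in the abstract can be computed from a confusion matrix. The abstract does not define them, so the sketch below assumes the definitions commonly used in intrusion detection work: accuracy = (TP + TN) / total, detection rate = TP / (TP + FN), and false alarm rate = FP / (FP + TN). The function name, the labels, and the toy predictions are hypothetical.

```python
def ids_metrics(actual, predicted, attack_label="attack"):
    """Compute accuracy, detection rate, and false alarm rate.

    Assumed definitions (not stated in the abstract):
      accuracy         = (TP + TN) / total
      detection rate   = TP / (TP + FN)
      false alarm rate = FP / (FP + TN)
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == attack_label and p == attack_label:
            tp += 1
        elif a == attack_label:
            fn += 1
        elif p == attack_label:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Hypothetical toy labels, not KDD Cup 99 results.
actual = ["attack", "attack", "attack", "normal", "normal", "normal", "normal", "normal"]
predicted = ["attack", "attack", "normal", "normal", "normal", "normal", "attack", "normal"]
metrics = ids_metrics(actual, predicted)
print(metrics)
```

Reporting the false alarm rate alongside the detection rate matters because a model that flags everything as an attack would reach a perfect detection rate while being useless in practice.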

A Statistical Analysis of Professional Baseball Team Data: The Case of the Lotte Giants

  • Cho, Young-Seuk;Han, Jun-Tae;Park, Chan-Keun;Heo, Tae-Young
    • 응용통계연구 / Vol. 23, No. 6 / pp.1191-1199 / 2010
  • Knowing what factors into a player's ability to affect the outcome of a sports game is crucial. This knowledge helps determine the relative contribution of each team member and set appropriate annual salaries. This study uses statistical analysis to investigate how much the outcome of a professional baseball game is influenced by the records of individual players. We used the Lotte Giants' data on 252 games played between 2007 and 2008, which included environmental data (home or away games and opponents) as well as pitchers' and batters' data. Using SAS Enterprise Miner, we performed a logistic regression analysis and a decision tree analysis on the data. The results obtained through the two analytic methods are compared and discussed.

The influence analysis of admission variables on academic achievements

  • 조장식
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 4 / pp.729-736 / 2010
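  • This paper analyzes the influence of admission-related variables, including freshman characteristic variables, on the academic achievement of freshmen at K University in Busan. To this end, the main effects and interaction effects of the admission-related variables on academic achievement were analyzed using multiple regression analysis, a parametric method, and decision tree analysis, a nonparametric method, respectively.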

Relations between Information Items of Job Posting and Vacancy Duration in Mid-level Labour Market - by GLM, Decision Tree

  • Kim, Hyoungrae;Jeon, Dohong
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 4 / pp.89-96 / 2016
  • In this paper, we study the relationship between vacancy duration and the information items of a job posting, using generalized linear models and decision tree analysis with respect to three factors: company characteristics, employment conditions, and constraints. The results indicate that employment conditions are more influential on vacancy duration than company characteristics. These effects are presumed to stem from the complex relations between the decisions of employers and job seekers. We also suggest the need to provide job seekers and employers with personalized and profiled labour market information tailored for quick decisions. The policy implication is that, since the employer's decision affects the vacancy duration, employers would do well to provide comprehensive labour market information, including supply and demand of the required skills, in order to reduce the time needed to judge cost-effectiveness.

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems, utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Potential avenues for future research include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Prediction Model for Unpaid Customers Using Big Data

  • 정재안;이규환;정회경
    • 한국정보통신학회논문지 / Vol. 24, No. 7 / pp.827-833 / 2020
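  • To reduce unpaid charges in local governments, this paper identified the internal data elements that affect nonpayment in the integrated local waterworks information system of a specific local government, through means such as interviews with meter readers. Candidate data affecting nonpayment were also derived from national statistical data. For the degree of influence of the independent variables on the dependent variable, sample data were collected by examining information gain, that is, the disorder of the data set with respect to the dependent variable. n-fold cross-validation was then used to evaluate which of two big data analysis algorithms, decision tree and logistic regression, shows the higher prediction rate. Comparing the performance of the algorithms on the local government's data confirmed that the decision tree finds customer payment patterns more accurately than logistic regression. In developing the machine learning analysis model, optimal values were derived for two parameters that directly affect the complexity and accuracy of the decision tree, the minimum number of data items and the maximum purity, in order to improve the accuracy of the algorithm.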
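
Information gain measures how much the disorder (entropy) of the dependent variable drops once the data are grouped by an independent variable. The sketch below uses the standard Shannon entropy definition, which the abstract does not spell out. The feature names and the toy records are hypothetical and do not come from the waterworks data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of `labels` minus the weighted entropy after grouping by `feature`."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy records: payment status and two candidate independent variables.
status = ["unpaid", "unpaid", "paid", "paid"]
usage_band = ["high", "high", "low", "low"]      # separates the classes perfectly
district = ["north", "south", "north", "south"]  # carries no information
gain_usage = information_gain(usage_band, status)
gain_district = information_gain(district, status)
print(gain_usage, gain_district)
```

A variable with higher gain is a stronger candidate for an early split in a decision tree, which is one way such a measure can be used to screen candidate variables.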

A decision support system (DSS) for construction risk efficiency in Taiwan

  • Tsai, Tsung-Chieh;Li, Hsiang-Wen
    • Smart Structures and Systems / Vol. 21, No. 2 / pp.249-255 / 2018
  • Many studies in risk management over the past decade have focused on management process, contract relations, and risk analysis, but very few have addressed project risks from the perspective of risk efficiency. This study began by using Fault Tree Analysis to develop a framework for a decision-making support system for risk management from the perspective of risk efficiency, so that the support system can find the optimal combination of risk strategies for the project manager through the trade-off between project risk and the cost of project strategies. Comprehensive and realistic risk strategies must strive for optimal decisions that minimize project risks and risk strategy costs while addressing important data such as risk causes, risk probability, risk impact, and risk strategy cost. Risk management in the construction phase of building projects in Taiwan was analyzed on the basis of these data, which provided the support system with data covering 247 risk causes. Then, 17 risk causes were extracted to demonstrate the decision-making support system, which could reach a better combination of risk strategies for the project manager through the trade-off between risk cost and project risk in building projects in Taiwan.

The Automated Threshold Decision Algorithm for Node Split of Phonetic Decision Tree

  • 김범승;김순협
    • 한국음향학회지 / Vol. 31, No. 3 / pp.170-178 / 2012
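  • For phoneme-based speech recognition of the names of the 640 train stations operated by Korail, this paper proposes a method for automatically determining the threshold used in node splitting when building a triphone-unit phonetic decision tree. The clustering rate is estimated using the statistical techniques of correlation analysis and regression analysis, and the threshold is then set automatically from the value corresponding to the average clustering rate. In experiments to validate the proposed method, the recognition rate improved by 1.4 to 2.3% over the existing baseline, in which a single threshold was applied uniformly.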