• Title/Abstract/Keywords: Decision-tree model

Search results: 735 (processing time: 0.024 s)

Clustering Algorithm using the DFP-Tree based on the MapReduce

  • 서영원;김창수
    • 인터넷정보학회논문지 / Vol. 16, No. 6 / pp. 23-30 / 2015
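  • As big data has become a prominent issue, many applications that operate on the results of data analysis have been studied; representative examples include product recommendation in e-commerce systems, search services in search engines, and friend recommendation in social network services. This paper proposes a decision frequent tree algorithm that combines the frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree, which is grounded in computer science theory. The existing frequent pattern tree algorithm guarantees accurate pattern generation from the pattern tree, but for data in which diverse patterns appear, such as social data, it generates a very large number of patterns, which makes analysis difficult. The proposed algorithm addresses this problem by transforming the tree into a model that judges convergence with its subtrees. The algorithm is also modeled in MapReduce to provide high-speed processing through distributed computation.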
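
As a rough illustration of the MapReduce modeling mentioned in this abstract, the sketch below counts item support in map and reduce steps, which is the usual first pass before a frequent pattern tree is built. It is not the authors' DFP-Tree algorithm; the function names `map_phase` and `reduce_phase`, the toy transactions, and the `min_support` value are all assumptions made for this example.

```python
from collections import Counter
from functools import reduce

def map_phase(transaction):
    """Map step: emit a count of 1 for each distinct item in one transaction."""
    return Counter(set(transaction))

def reduce_phase(left, right):
    """Reduce step: merge two partial count tables into one."""
    return left + right

# Hypothetical toy transactions (not data from the paper).
transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]

# In a real cluster the map calls would run on separate workers;
# here they run sequentially to keep the sketch self-contained.
support = reduce(reduce_phase, map(map_phase, transactions), Counter())

# Keep items that meet the (assumed) minimum support, ordered by
# descending support and then alphabetically, as an FP-tree build would need.
min_support = 2
frequent = sorted(
    (item for item, count in support.items() if count >= min_support),
    key=lambda item: (-support[item], item),
)
print(frequent)
```

Because the reduce step is a plain associative merge, partial counts can be combined in any order, which is what makes this pass easy to distribute.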

Deciding the Optimal Shutdown Time Incorporating the Accident Forecasting Model

  • 양희중
    • 산업경영시스템학회지 / Vol. 41, No. 4 / pp. 171-178 / 2018
  • Recently, the continued operation of nuclear power plants has become a major controversial issue in Korea. Whether to continue operating nuclear power plants must be determined by considering many factors, including social and political as well as economic ones. In this paper, however, we concentrate only on the economic factors in order to make an optimal decision on operating nuclear power plants. Decisions should be based on forecasts of plant accident risks and on data about large and small accidents at power plants. We outline the structure of a decision model that incorporates accident risks. The model decides whether to shut down permanently, shut down temporarily for maintenance, or operate for one period, and then periodically repeats the analysis and decision process with additional information about new costs and risks. A forecasting model that predicts nuclear power plant accidents is incorporated to improve decision making. We first build a one-period decision model and then extend it to a multi-period model. We use influence diagrams as well as decision trees for modeling, together with a Bayesian statistical approach. Many of the parameter values in this model may be set fairly subjectively by decision makers. Once the parameter values have been determined, the model presents the optimal decision for those values.
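
The one-period decision described in this abstract can be sketched as a small decision tree in which each action leads to a chance node (accident or no accident) and the action with the lowest expected cost is chosen. This is only an illustrative sketch, not the paper's model: the cost figures, the accident probabilities, and the names `expected_cost` and `best_action` are hypothetical, and in the paper the probabilities would come from the accident forecasting model.

```python
def expected_cost(action):
    """Roll back one chance node: fixed cost plus probability-weighted accident cost."""
    return action["fixed_cost"] + action["p_accident"] * action["accident_cost"]

def best_action(actions):
    """Return the (name, expected cost) pair with the lowest expected cost."""
    costs = {name: expected_cost(a) for name, a in actions.items()}
    name = min(costs, key=costs.get)
    return name, costs[name]

# Hypothetical parameter values, chosen only for illustration.
ACTIONS = {
    "shut down permanently": {
        "fixed_cost": 100.0, "p_accident": 0.0, "accident_cost": 0.0},
    "shut down for maintenance": {
        "fixed_cost": 30.0, "p_accident": 0.01, "accident_cost": 1000.0},
    "operate one period": {
        "fixed_cost": 0.0, "p_accident": 0.05, "accident_cost": 1000.0},
}

choice, cost = best_action(ACTIONS)
print(choice, cost)
```

A multi-period version would repeat this rollback each period with updated probabilities and costs, which is the extension the abstract describes.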

Decision Tree of Occupational Lung Cancer Using Classification and Regression Analysis

  • Kim, Tae-Woo;Koh, Dong-Hee;Park, Chung-Yill
    • Safety and Health at Work / Vol. 1, No. 2 / pp. 140-148 / 2010
  • Objectives: Determining the work-relatedness of lung cancer that developed through occupational exposure is very difficult. The aim of the present study is to develop a decision tree for occupational lung cancer. Methods: 153 cases of lung cancer surveyed by the Occupational Safety and Health Research Institute (OSHRI) from 1992 to 2007 were included. The target variable was whether the case was approved as work-related lung cancer, and the independent variables were age, sex, pack-years of smoking, histological type, type of industry, latency, working period, and exposure material in the workplace. The Classification and Regression Trees (CART) model was used to search for predictors of occupational lung cancer. Results: In the CART model, the best predictor was exposure to known lung carcinogens. The second best predictor was a latency of 8.6 years or more, and the third best predictor was a smoking history of less than 11.25 pack-years. The CART model must be used sparingly in deciding the work-relatedness of lung cancer because it is not absolute. Conclusion: We found that exposure to lung carcinogens, latency, and smoking history were predictive factors of approval for occupational lung cancer. Further studies on the work-relatedness of occupational disease are needed.
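
A cut-off such as "latency of 8.6 years or more" is the kind of threshold a CART-style split search produces. The sketch below shows one common way to find such a threshold, by minimizing weighted Gini impurity over candidate midpoints. It is a minimal sketch under stated assumptions, not the study's analysis: the data are made up, and the helper names `gini` and `best_threshold` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values.

    Returns (threshold, weighted Gini impurity) for the best binary split,
    where the left branch holds values below the threshold.
    """
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: latency in years, and approval (1) or rejection (0).
latency = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0]
approved = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_threshold(latency, approved)
print(threshold, impurity)
```

A full CART procedure repeats this search over every predictor and then recurses into each branch; the toy data here are perfectly separable, which real case data rarely are.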

Sales Forecasting Model for Apparel Products Using Machine Learning Technique - A Case Study on Forecasting Outerwear Items -

  • 채진미;김은희
    • 한국의류산업학회지 / Vol. 23, No. 4 / pp. 480-490 / 2021
  • Sales forecasting is crucial for many retail operations. For apparel retailers, an accurate sales forecast for the next season is critical for managing inventory properly and planning supply chains. The task is especially challenging because apparel products are always new for the next season and have numerous variations, short life cycles, long lead times, and seasonal trends. In this study, a sales forecasting model for apparel products is proposed using machine learning techniques. Four years of sales data for outerwear items were collected from a Korean sports brand and filtered for outliers. Subsequently, the data were standardized by removing the effects of exogenous variables. The sales patterns of outerwear items were clustered by applying K-means clustering, and the outerwear attributes associated with each sales-pattern type were determined using a decision tree classifier. Six types of sales-pattern clusters were derived and classified using a hybrid model of clustering and a decision tree algorithm, and the relationship between outerwear attributes and sales patterns was revealed. Each sales pattern can be used to predict stock-keeping-unit-level sales based on item attributes.

A Study on a car Insurance purchase Prediction Using Two-Class Logistic Regression and Two-Class Boosted Decision Tree

  • AN, Su Hyun;YEO, Seong Hee;KANG, Minsoo
    • 한국인공지능학회지 / Vol. 9, No. 1 / pp. 9-14 / 2021
  • This paper builds a model that predicts whether a customer will buy car insurance, based on primary health insurance customer data. Automobiles are now used for land transportation and daily life, and their scope of use and equipment is expanding. This rapid increase in automobiles has made automobile insurance an essential line of business for insurance companies. If car insurance sales can be predicted and targeted using the information of existing health insurance customers, they can generate continuous profits for the insurance company. This paper therefore aims to analyze existing customer characteristics and implement a predictive model to direct advertisements to customers likely to be interested in auto insurance. The goal of this study is to maximize the profits of insurance companies by devising communication strategies that optimize the business model and profits. The study was conducted in Microsoft Azure, and an automobile insurance purchase prediction model was implemented using the Health Insurance Cross-sell Prediction data. Two-Class Logistic Regression and Two-Class Boosted Decision Tree were run side by side and their results compared. According to the results, when the threshold is 0.3, the AUC is 0.837 and the accuracy is 0.833, which is high. The results suggest that customers with health insurance can be expected to respond positively to auto insurance offers.
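
The reported accuracy depends on the decision threshold applied to the model's predicted probabilities. The sketch below shows, on made-up scores, how a threshold of 0.3 turns probabilities into class labels and how accuracy is then computed. It does not reproduce the study's Azure pipeline or its figures; the scores, labels, and function names `classify` and `accuracy` are assumptions for illustration.

```python
def classify(probabilities, threshold=0.3):
    """Label a customer 1 (likely buyer) when the predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical predicted probabilities and true outcomes (not the study's data).
scores = [0.05, 0.20, 0.35, 0.60, 0.10, 0.45, 0.28, 0.90]
actual = [0, 0, 1, 1, 0, 0, 0, 1]

predicted = classify(scores, threshold=0.3)
print(predicted, accuracy(predicted, actual))
```

Lowering the threshold below the default 0.5 labels more customers as likely buyers, which typically trades some precision for recall; AUC, by contrast, does not depend on the chosen threshold.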

A Study on Factors of Education's Outcome using Decision Trees

  • 김완섭
    • 공학교육연구 / Vol. 13, No. 4 / pp. 51-59 / 2010
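  • To manage university courses effectively and improve educational outcomes, it is necessary to diagnose the current educational outcome of each class and to identify the factors that affect it. Statistical techniques such as association analysis and regression analysis are widely used in studies that look for such factors, and decision tree analysis from data mining has recently been used as well. Decision tree analysis has the advantage that the resulting model is easy to understand and easy to apply to decision making, but it is not robust to characteristics of the input data such as multicollinearity. This study summarizes the problems of conventional decision tree analysis and, as an experimental remedy, proposes a method of discovering factors using multiple decision trees. Experiments showed that running multiple decision trees finds more reliable factors than applying a single decision tree and also reveals the importance of each variable.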


Risk-based Decision Model to Estimate the Contingency for Large Construction Projects

  • 김두연;한구수;한승헌
    • 한국건설관리학회:학술대회논문집 / 한국건설관리학회 2003년도 학술대회지 / pp. 485-490 / 2003
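  • Recent rapid changes in the domestic and international construction environment, together with the trend toward larger and more complex construction projects, are increasing many internal and external risk factors, so rational and efficient ways of managing them are becoming much more important. As one such risk management approach, this study addresses large construction projects such as turnkey projects, in which cost increases during the project are tightly restricted and the uncertainty element of the estimate (the contingency) must therefore be considered at the bidding stage. It presents a model that estimates a rational and appropriate contingency by quantifying the risk factors inherent in these projects. To develop the contingency estimation model, projects that had actually been carried out were selected, and the factors affecting construction cost were derived from data such as each project's contingency spending and construction cost status. The influence of these risk factors was then structured by applying a Cost Risk Model (CRM) that combines Monte Carlo simulation, influence diagrams, and decision trees. The constructed model was also applied to a completed large project to verify its validity.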


Hand Gesture Recognition using Multivariate Fuzzy Decision Tree and User Adaptation

  • 전문진;도준형;이상완;박광현;변증남
    • 로봇학회논문지 / Vol. 3, No. 2 / pp. 81-90 / 2008
  • With increasing demand for services for disabled and elderly people, assistive technologies have developed rapidly. Natural human signals such as voice and gesture have been applied to systems that assist disabled and elderly people. As an example of this kind of human-robot interface, the Soft Remote Control System has been developed by HWRS-ERC at KAIST [1]. This system is a vision-based hand gesture recognition system for controlling home appliances such as a television, lamp, and curtain. One of the most important technologies in the system is the hand gesture recognition algorithm. The problems that most frequently lower the hand gesture recognition rate are inter-person variation and intra-person variation. Intra-person variation can be handled by introducing fuzzy concepts. In this paper, we propose a multivariate fuzzy decision tree (MFDT) learning and classification algorithm for hand motion recognition. To recognize the hand gestures of a new user, the most suitable recognition model among several well-trained models is selected using a model selection algorithm and incrementally adapted to the user's hand gestures. To assess the general performance of MFDT as a classifier, we report classification rates on benchmark data from the UCI repository. To assess hand gesture recognition performance, we tested on hand gesture data collected from 10 people over 15 days. The experimental results show that the classification and user adaptation performance of the proposed algorithm is better than that of a general fuzzy decision tree.


Study on the Classification Methodology for DSRC Travel Speed Patterns Using Decision Trees

  • 이민하;이상수;남궁성;최기주
    • 한국ITS학회 논문지 / Vol. 13, No. 2 / pp. 1-11 / 2014
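  • The purpose of this paper is to derive travel patterns at the IC-to-IC section level from DSRC-based historical travel speed data, thereby making better use of the vast amount of historical data and making road travel patterns easy to identify with a simple yet accurate method. Travel patterns were classified using the decision tree technique, and separate patterns were generated by month, time of day, and section so that the method can respond to changes in time and space. Applying the decision tree technique to the Seoul TG to Anseong IC section of the Gyeongbu Expressway classified the fixed travel patterns into five day-of-week groups: (Mon), (Tue, Wed, Thu), (Fri), (Sat), and (Sun). When the classification result was applied to nine sections of the Yeongdong, Jungbu, and Jungbu Naeryuk Expressways and tested statistically, it showed a goodness of fit of about 93%. To reduce the error of the decision tree travel patterns, four additional variables were introduced; adding "traffic conditions in the previous month" as an explanatory variable reduced the travel speed variance by about 50%, and when applied to real conditions, the error under free-flow conditions decreased by about 4%.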

Mathematical Programming Models for Establishing Dominance with Hierarchically Structured Attribute Tree

  • 한창희
    • 한국국방경영분석학회지 / Vol. 28, No. 2 / pp. 34-55 / 2002
  • This paper deals with the multiple attribute decision making problem in which a decision maker incompletely articulates his or her preferences about attribute weights and alternative values. Furthermore, we consider an attribute tree that is structured hierarchically. Techniques for establishing dominance with linear partial information are proposed for a hierarchically structured attribute tree. The linear additive value function under certainty is used in the model. The incompletely specified information forms a feasible region of linear constraints, and the pairwise dominance relationship between alternatives therefore leads to an intractable non-linear program. Hence, we propose solution techniques to handle this difficulty. To handle the tree structure, we break the attribute tree down into sub-trees. Owing to the recursive structure of the solution technique, the optimization results from the sub-trees can be used to compute the value interval on the topmost attribute. The value intervals computed by the proposed solution techniques can be used to establish the pairwise dominance relation between alternatives. In this paper, the pairwise dominance relation is represented as strict dominance and weak dominance, which were already defined in earlier research.