• Title/Summary/Keyword: data mining(CART)

Search Result 68, Processing Time 0.022 seconds

Estimate Soil Moisture Using Satellite Image and Data Mining (위성영상과 데이터 마이닝 기법을 이용한 토양수분 산정)

  • Kim, Gwang-Seob;Park, Han-Gyun;Cho, So-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1615-1619 / 2010
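  • Soil moisture refers to the water held in soil particles and is an important factor in regulating the energy balance and water cycle between the land surface and the atmosphere. In this study, soil moisture was estimated with CART (Classification And Regression Tree), a data mining technique, using three groups of inputs: normalized difference vegetation index (NDVI) and land surface temperature data obtained from MODIS (Moderate Resolution Imaging Spectroradiometer) satellite observations from January 2003 to December 2008; precipitation and soil temperature data from 57 of Korea's 76 weather stations, excluding stations with 30 years or less of records and island sites; and land cover and effective soil depth data for the whole country. First, soil moisture was estimated at six sites in the Yongdam Dam basin, which has reliable soil moisture observations, to assess applicability. Observations from three sites were used to build the estimation model, and the correlation coefficient between observed and estimated soil moisture at one validation site showed that the model captures the overall behavior of soil moisture well, confirming its applicability. The model was then used to estimate the soil moisture distribution in the Yongdam Dam basin and a soil moisture distribution map for all of Korea. Although reliable ground-based soil moisture observations are not available for diverse surface conditions, the proposed method is judged to be a reasonable approach for estimating soil moisture nationwide from limited available data.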
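
For readers unfamiliar with the NDVI input named in this abstract, the sketch below shows the conventional definition of the index from near-infrared and red reflectance. It is only an illustration: the function name `ndvi` and the zero-denominator convention are assumptions made here, and the abstract does not say how the authors processed the MODIS data.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        # Assumed convention for this sketch: return 0.0 instead of dividing by zero.
        return 0.0
    return (nir - red) / total


# Dense vegetation reflects strongly in NIR and weakly in red, giving a high NDVI.
vegetated = ndvi(0.5, 0.1)
bare = ndvi(0.2, 0.2)
```

Values lie between -1 and 1, with higher values indicating denser green vegetation.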


A Study of Combined Splitting Rules in Regression Trees

  • Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.13 no.1 / pp.97-104 / 2002
  • Regression trees, a technique in data mining, are constructed using a splitting function, that is, an independent variable and its threshold. Lee (2002) considered one-sided purity (OSP) and one-sided extreme (OSE) splitting criteria for finding an interesting node as early as possible. However, these criteria cannot be mixed within the same tree: the analyst must commit to either OSP or OSE in advance. In this paper, a new splitting method that combines and extends OSP and OSE is proposed. With the combined criteria, nodes can be selected by considering both purity and extremeness in the same tree. The combined criteria are not a generalization of the previous criteria but an additional option, to be chosen according to the circumstances.
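
To make "an independent variable and its threshold" concrete, the following minimal sketch searches for the threshold on one variable that minimizes the summed squared error of the two child nodes. This is the standard least-squares criterion used in CART regression trees, not the OSP or OSE criteria, whose definitions the abstract does not give. The names `sse` and `best_split` and the toy data are assumptions for illustration.

```python
from statistics import fmean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best binary split x <= threshold."""
    pairs = sorted(zip(x, y))
    best_t, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal x values
        t = (lo + hi) / 2  # candidate threshold: midpoint of neighbouring values
        left = [yy for xx, yy in pairs if xx <= t]
        right = [yy for xx, yy in pairs if xx > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Toy data (hypothetical): the response jumps from 5 to 9 between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
threshold, cost = best_split(x, y)
```

A one-sided criterion would score a candidate split by the quality of only one child rather than by the sum over both, which is where the methods in the abstract depart from this baseline.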


Study on the Application of Decision Trees for Personalization based on e-CRM (e-CRM에서 개인화 향상을 위한 의사결정나무 사용에 관한 연구)

  • 양정희;한서정
    • Journal of the Korea Safety Management & Science / v.5 no.3 / pp.107-119 / 2003
  • Expectations for and interest in e-CRM are rising as a means of more efficient online customer management, including in electronic commerce. Decision trees are a useful data mining technique for e-CRM. In this paper, three representative decision tree techniques, CART, C4.5, and CHAID, are compared from a personalization point of view through an experiment on actual customer data. Based on this analysis, a new decision tree system with substantial advantages for personalization is proposed. The new system offers the following advantages. First, it can build a qualitatively better personalization model by adding a weight for each individual. Second, it can supply more personalized information to customers. Third, it can achieve higher customer loyalty than other sites in similar lines of business. Fourth, it can reduce marketing and decision-making costs. Fifth, through smooth communication with customers who use the personalized service, it becomes possible to learn what customers want and to improve the quality of goods and services accordingly.
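
One well-known difference among the compared techniques is the split criterion: CART typically uses Gini impurity, C4.5 uses the gain ratio, and CHAID uses chi-square tests. The sketch below computes the first two for a single candidate split. It is a minimal illustration with hypothetical labels and function names, not the system proposed in the paper, and it assumes every child node is non-empty.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini_gain(parent, children):
    """Impurity decrease used by CART-style splitting."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


def gain_ratio(parent, children):
    """Information gain divided by split information, as in C4.5."""
    n = len(parent)
    info_gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    split_info = -sum(len(ch) / n * log2(len(ch) / n) for ch in children)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical customer labels: a split that separates the classes perfectly.
parent = ["buy"] * 4 + ["skip"] * 4
children = [["buy"] * 4, ["skip"] * 4]
cart_score = gini_gain(parent, children)
c45_score = gain_ratio(parent, children)
```

The gain ratio's denominator penalizes splits into many small branches, which matters for categorical customer attributes with many levels.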

Robust Feature Selection and Shot Change Detection Method Using the Neural Networks (강인한 특징 변수 선별과 신경망을 이용한 장면 전환점 검출 기법)

  • Hong, Seung-Bum;Hong, Gyo-Young
    • Journal of Korea Multimedia Society / v.7 no.7 / pp.877-885 / 2004
  • In this paper, we propose an enhanced shot change detection method that uses a neural network and robust features selected from multiple candidate features. Previous shot change detection methods usually used a single feature and a fixed threshold between consecutive frames. However, content such as color, shape, background, and texture changes simultaneously at shot change points in a video sequence. We therefore detect shot changes using robust features that complement each other, rather than a single feature. We use CART (classification and regression tree), a typical data mining method, to select the robust features, and a backpropagation neural network to determine the threshold for each selected feature. To evaluate the performance of the robust feature selection, we compare the proposed method with PCA (principal component analysis), a typical feature selection method. The experimental results show that our method performs better than the PCA method.


A Study on Occupancy Estimation Method of a Private Room Using IoT Sensor Data Based Decision Tree Algorithm (IoT 센서 데이터를 이용한 단위실의 재실추정을 위한 Decision Tree 알고리즘 성능분석)

  • Kim, Seok-Ho;Seo, Dong-Hyun
    • Journal of the Korean Solar Energy Society / v.37 no.2 / pp.23-33 / 2017
  • Accurate prediction of the stochastic behavior of occupants is a well-known problem in improving the prediction of building energy use. Many researchers have tried various sensors that carry information on occupant status, such as $CO_2$ sensors, infrared motion detectors, and RFID, to predict occupancy, while others have developed algorithms to estimate occupancy probability from those sensors or from indirect monitoring data such as energy consumption in spaces. In this research, various sensor data and energy consumption data are used with decision tree algorithms (C4.5 and CART) to estimate sub-hourly occupancy status. Although the experiment is limited in space (a private room) and period (the cooling season), the prediction results show good agreement, with accuracy above 95% when energy consumption data are used instead of measured $CO_2$ values. This result indicates the potential of IoT data for awareness of indoor environmental status.

A Study on Approximation Model for Optimal Predicting Model of Industrial Accidents (산업재해의 최적 예측모형을 위한 근사모형에 관한 연구)

  • Leem, Young-Moon;Ryu, Chang-Hyun
    • Journal of the Korea Safety Management & Science / v.8 no.3 / pp.1-9 / 2006
  • Recently, data mining techniques have been used for the analysis and classification of data related to industrial accidents. The main objective of this study is to compare algorithms for analyzing industrial accident data; the paper selects an optimal predictive model among five algorithms, CHAID, CART, C4.5, LR (Logistic Regression), and NN (Neural Network), using ROC charts, lift charts, and response thresholds. The paper also provides an approximation model for the optimal NN-based predictive model, which can be used to make NN-based data analysis easier to interpret. The study uses ten selected independent variables to group injured people according to a dependent variable in a way that reduces variation. To find the optimal predictive model among the five algorithms, a retrospective analysis was performed on 67,278 subjects. The sample was drawn from data on industrial accidents in Korea over the three years 2002 to 2004. According to the results, NN shows excellent performance for the analysis and classification of industrial accident data.
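
The ROC and lift comparisons mentioned in this abstract can be summarized numerically. The sketch below computes the area under the ROC curve (as the probability that a random positive case is scored above a random negative one) and the lift in the top-scoring fraction of cases. The function names, the two score lists, and the labels are hypothetical; the paper's own data and tooling are not described in the abstract.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction):
    """Response rate in the top `fraction` of scores divided by the overall rate."""
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda p: -p[0])]
    k = max(1, round(len(ranked) * fraction))
    return (sum(ranked[:k]) / k) / (sum(ranked) / len(ranked))


# Hypothetical outcomes (1 = event) and scores from two competing models.
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
model_b = [0.5, 0.4, 0.9, 0.3, 0.8, 0.2, 0.7, 0.1]

auc_a = roc_auc(y_true, model_a)
auc_b = roc_auc(y_true, model_b)
lift_a = lift_at(y_true, model_a, 0.25)
```

A model with higher AUC and higher lift in the top-ranked cases would be preferred under this kind of comparison.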

Selection of an Optimal Algorithm among Decision Tree Techniques for Feature Analysis of Industrial Accidents in Construction Industries (건설업의 산업재해 특성분석을 위한 의사결정나무 기법의 상용 최적 알고리즘 선정)

  • Leem, Young-Moon;Choi, Yo-Han
    • Journal of the Korea Safety Management & Science / v.7 no.5 / pp.1-8 / 2005
  • Rapid industrial advancement, diversified types of business, and unexpected industrial accidents have caused a great deal of human and material damage to many unspecified persons. Although various previous studies have sought to prevent industrial accidents, they only provide managerial and educational policies using frequency analysis and comparative analysis based on data from past industrial accidents. The main objective of this study is to find an optimal algorithm for analyzing industrial accident data, and the paper provides a comparative analysis of four algorithms: CHAID, CART, C4.5, and QUEST. The decision tree, a typical data mining technique, is used to predict results from objective and quantified data. Enterprise Miner of SAS and AnswerTree of SPSS are used to evaluate the validity of the results of the four algorithms. The sample consists of 19,574 records related to the construction industry in Korea over the three years 2002 to 2004.

Empirical Study on the Risk Analysis of Young Driver Utilizing Integrated Data Base(DB) (통합DB를 활용한 청년운전자의 위험도 실증분석)

  • Kim, Tae-Ho;Lee, Soo-Il;Choe, Byong-Ho
    • Journal of the Korean Society of Safety / v.27 no.5 / pp.203-210 / 2012
  • The traffic accident risk of young drivers (under 25) is reported to be 8 times as high as that of middle-aged drivers (between 30 and 49). Despite this high risk, few studies have examined the driving characteristics of young drivers. The purpose of this paper is to analyze the age-specific risks of young drivers using insurance and vehicle inspection databases, from which data such as age, vehicle mileage, and injuries were collected. We conducted data mining (CART) and portfolio analysis by age group (in 10-year bands). The conclusions drawn from this empirical study are as follows: (1) Although young drivers have low vehicle mileage, their fatality rate is relatively high. (2) With respect to vehicle mileage, about 24,000 km of driving experience appears to be the critical point at which fatality rates differ. For annual average mileage below 24,169 km, accident frequency is relatively lower than for mileage exceeding 24,169 km (1,571 cases). Based on these findings, some recommendations for improving the driver's license system for young drivers are given.

Market Segmentation of Patient-Utilization in Oriental Medical Care and Western Medical Care (양.한방 의료서비스 이용환자의 시장 세분화에 관한 연구)

  • 이선희;조희숙;최은영;최귀선;채유미
    • Health Policy and Management / v.12 no.1 / pp.125-143 / 2002
  • The objectives of this study were to analyze patient characteristics and market segmentation in oriental and western medical care. The study focused on medical utilization using Anderson's health utilization model. The data source was the 1998 National Health and Nutrition Survey conducted by the Korea Institute for Health and Social Affairs, which used a stratified multistage probability sampling design. The analysis was conducted using the statistical software package SPSS version 10.0 and AnswerTree 2.1, a data mining tool. The results were as follows: 1) 44.9% of respondents reported visiting an oriental medical center within the previous two weeks, and 3.4% of them used oriental medical care. Age group, kind of disease, and medical expenditure were associated with the difference between western and oriental medical utilization rates. 2) According to the decision tree, several factors were related to the utilization of oriental medical care. In particular, the important factors in a patient's choice of medical center were kind of disease, kind of usual medical use, and expenditure. 3) In the CART analysis, the oriental medical care market was classified into seven categories. The major groups preferring oriental medicine were patients with musculoskeletal disease, cerebrovascular disease, or chronic headache, and they also preferred oriental medical care in their usual use. These results show that the oriental and western medical markets are divided into various areas by market segmentation.

Optimization of Decision Tree for Classification Using a Particle Swarm

  • Cho, Yun-Ju;Lee, Hye-Seon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.10 no.4 / pp.272-278 / 2011
  • Decision trees are used successfully as a classification tool in many areas, such as medical diagnosis, customer churn prediction, and signal detection. The main advantage of decision tree classifiers is their ability to break down a complex structure into a collection of simpler structures, thus providing a solution that is easy to interpret. Because decision tree induction is a top-down algorithm using a divide-and-conquer process, there is a risk of reaching only a locally optimal solution. This paper proposes a procedure for optimally determining the thresholds of the chosen variables in a decision tree using adaptive particle swarm optimization (APSO). The proposed algorithm consists of two phases. First, we construct a decision tree and choose the relevant variables. Second, we find the optimal thresholds simultaneously for those selected variables using APSO. To validate the proposed algorithm, several artificial and real datasets are used. We compare our results with the original CART results and show that the proposed algorithm is promising for improving prediction accuracy.
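
The two-phase idea can be sketched as follows, under strong simplifying assumptions: the tree structure is fixed to a two-variable rule, the starting thresholds are supplied by hand rather than by CART, and the optimizer is a plain global-best particle swarm with fixed weights rather than the adaptive variant (APSO) used in the paper. All names, parameter values, and data are hypothetical.

```python
import random


def accuracy(thresholds, X, y):
    """Accuracy of a fixed rule: predict 1 if x0 > t0 and x1 > t1, else 0."""
    t0, t1 = thresholds
    hits = sum(int(a > t0 and b > t1) == label for (a, b), label in zip(X, y))
    return hits / len(y)


def pso_thresholds(X, y, start, n_particles=20, n_iter=50, seed=0):
    """Plain global-best PSO over (t0, t1); particle 0 starts at `start`."""
    rng = random.Random(seed)
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    pos = [list(start)] + [
        [rng.uniform(lo[d], hi[d]) for d in range(2)] for _ in range(n_particles - 1)
    ]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [accuracy(p, X, y) for p in pos]
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # fixed inertia and acceleration weights (assumed)
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(2):
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * rng.random() * (best_pos[i][d] - pos[i][d])
                    + c2 * rng.random() * (g_pos[d] - pos[i][d])
                )
                pos[i][d] += vel[i][d]
            fit = accuracy(pos[i], X, y)
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
            if fit > g_fit:
                g_pos, g_fit = pos[i][:], fit
    return g_pos, g_fit


# Hypothetical data and hand-picked starting thresholds standing in for phase one.
X = [(1, 1), (2, 5), (3, 2), (6, 6), (7, 8), (8, 3), (9, 9), (5, 7)]
y = [0, 0, 0, 1, 1, 0, 1, 1]
start = (2.5, 2.5)
start_acc = accuracy(start, X, y)
tuned, tuned_acc = pso_thresholds(X, y, start)
```

Because one particle is seeded with the starting thresholds, the tuned accuracy on the training data can never fall below the starting accuracy; whether it generalizes would still need a held-out check.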