• Title/Summary/Keyword: C5.0 tree model

Design and Evaluation of ANFIS-based Classification Model (ANFIS 기반 분류모형의 설계 및 성능평가)

  • Song, Hee-Seok;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.151-165 / 2009
  • A fuzzy neural network integrates an artificial neural network with a fuzzy system and has been applied successfully in control and forecasting. Among fuzzy neural network models, ANFIS (Adaptive Network-based Fuzzy Inference System) has recently drawn wide attention for its high accuracy in control and forecasting. We design a new classification model based on ANFIS and evaluate its classification accuracy. Comparing experimental results, we find that the ANFIS-based classification model achieves higher classification accuracy than an existing classification model, the C5.0 decision tree.

Major gene identification for FASN gene in Korean cattles by data mining (데이터마이닝을 이용한 한우의 우수 지방산합성효소 유전자 조합 선별)

  • Kim, Byung-Doo;Kim, Hyun-Ji;Lee, Seong-Won;Lee, Jea-Young
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1385-1395 / 2014
  • Economic traits of livestock are affected by both environmental and genetic factors. Moreover, such a trait is affected not by a single gene but by interactions among genes. We used a linear regression model to adjust for environmental factors. To identify gene-gene interaction effects, we applied data mining techniques (neural network, logistic regression, CART and C5.0) to five SNPs (single nucleotide polymorphisms) of FASN (fatty acid synthase). We divided the data into training (60%) and testing (40%) sets and applied the model built on the training data to the testing data. Comparing prediction accuracy, C5.0 was identified as the best model. Superior genotypes were then selected using the decision tree.
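
The evaluation procedure in this abstract (a 60/40 split, then choosing the model with the highest test accuracy) can be sketched as follows. This is only an illustration in plain Python: the function names, the random seed and the toy predictors are assumptions, not the authors' code, and real models such as C5.0 or CART would take the place of the toy predictors.

```python
import random

def train_test_split(rows, train_fraction=0.6, seed=0):
    """Shuffle the rows and split them into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, rows):
    """Fraction of (features, label) rows whose label is predicted correctly."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)

def best_model(models, test_rows):
    """Name of the model (dict: name -> predict function) with the highest accuracy."""
    return max(models, key=lambda name: accuracy(models[name], test_rows))

# Hypothetical data: one integer feature, label is its parity.
rows = [((i,), i % 2) for i in range(100)]
train_rows, test_rows = train_test_split(rows)
models = {
    "always_zero": lambda features: 0,         # toy baseline
    "parity": lambda features: features[0] % 2,  # toy "fitted" model
}
print(len(train_rows), len(test_rows), best_model(models, rows))
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same testing set.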

Biomass and Net Primary Productivity in Natural Forests of Quercus mongolica and Quercus variabilis (신갈나무와 굴참나무 천연림(天然林) 생태계(生態系)의 현존량(現存量) 및 물질(物質) 생산성(生産性)에 관한 연구)

  • Song, Cheel Young;Lee, Soo Wook
    • Journal of Korean Society of Forest Science / v.85 no.3 / pp.443-452 / 1996
  • Biomass and net primary production (NPP) were estimated with an equation of the form $Wt=aD^bH^c$ for Quercus variabilis and Quercus mongolica natural stands (mean ages 67 and 62 years) in Chungju. The form $Wt=aD^bH^c$ was more adequate than $Wt=a(D^2H)^b$ and $Wt=aD^b$ for estimating biomass and NPP. Individual biomass was compared by tree component using a paired t-test, which showed no significant differences. Total aboveground biomass of Quercus mongolica was 130.6 t/ha and that of Quercus variabilis was 137.4 t/ha. Biomass of Q. mongolica was composed of foliage 5.1 t/ha (3.9%), dead branch 3.5 t/ha (2.7%), live branch 29.7 t/ha (23.0%), bolebark 16.2 t/ha (12.5%), and bolewood 74.9 t/ha (58.0%), and that of Q. variabilis was composed of foliage 3.8 t/ha (2.9%), dead branch 2.9 t/ha (2.2%), live branch 24.3 t/ha (18.4%), bolebark 20.4 t/ha (15.5%), and bolewood 80.4 t/ha (61.0%). Net primary production was 10.0 t/ha/yr in the Q. mongolica stand and 8.6 t/ha/yr in the Q. variabilis stand. Net primary production of the Quercus forest in Chungju was very close to the mean NPP of temperate broadleaved forests.
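
As a small illustration of the allometric form $Wt=aD^bH^c$ named in this abstract, the sketch below evaluates the equation and its log-linear counterpart, which is the form usually fitted by linear regression. It assumes D is stem diameter and H is tree height, as is conventional. The coefficient values and tree dimensions are hypothetical; the abstract does not report fitted coefficients.

```python
import math

def stem_biomass(d, h, a, b, c):
    """Evaluate the allometric form Wt = a * D^b * H^c."""
    return a * d ** b * h ** c

def log_biomass(d, h, a, b, c):
    """The same model on the log scale: ln Wt = ln a + b ln D + c ln H."""
    return math.log(a) + b * math.log(d) + c * math.log(h)

# Hypothetical coefficients and tree dimensions, for illustration only.
a, b, c = 0.05, 2.0, 0.8
wt = stem_biomass(20.0, 15.0, a, b, c)

# Both forms describe the same relationship.
print(math.isclose(math.log(wt), log_biomass(20.0, 15.0, a, b, c)))
```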

Identification of major risk factors association with respiratory diseases by data mining (데이터마이닝 모형을 활용한 호흡기질환의 주요인 선별)

  • Lee, Jea-Young;Kim, Hyun-Ji
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.373-384 / 2014
  • Data mining aims to uncover patterns or correlations in large, complexly structured data and to predict diverse outcomes. The technique is used in finance, telecommunications, distribution, medicine and other fields. In this paper, we select risk factors for respiratory diseases in the field of medicine. The data, taken from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012, were divided into a respiratory disease group and a healthy group. To select major risk factors, we applied data mining techniques such as neural network, logistic regression, Bayesian network, C5.0 and CART. We divided the data into training and testing sets and applied the model built on the training data to the testing data. Comparing prediction accuracy, CART was identified as the best model. Depression, smoking and stress were identified as the major risk factors for respiratory disease.

Verification Test of High-activity SMEs Using Technology Appraisal Items (기술력 평가항목을 이용한 고활동성 중소기업 판별)

  • Lee, Jun-won
    • Journal of Technology Innovation / v.28 no.1 / pp.31-52 / 2020
  • This study verifies the ex-ante power of the forward-looking technology appraisal model used in technology financing to discriminate high-activity firms. The firms analyzed are classified by industry (manufacturing / non-manufacturing) and by company age (initial / non-initial). High-activity SMEs are defined as those that achieve at least twice the average asset turnover ratio of their cluster. A discriminant model built with C5.0, a decision tree method, achieved classification accuracy above 99% across all industry and company-age groups, confirming that the model's discriminant power is stable. Management expertise, capital involvement and funding capacity were identified as critical variables for high-activity SMEs. In addition, technology management capability and technology life cycle were confirmed as determinants of high-activity SMEs in the manufacturing industry. These results suggest that technology appraisal items could be used for ex-ante discrimination of high-activity SMEs and in policy.
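
The labeling rule stated in this abstract (a firm is high-activity when its asset turnover ratio is at least twice the cluster average) can be sketched as below. The function name, firm names and ratios are hypothetical, and the sketch assumes each call receives the firms of a single cluster.

```python
from statistics import mean

def flag_high_activity(turnover_by_firm):
    """Map each firm in one cluster to True if its asset turnover ratio
    is at least twice the cluster average."""
    threshold = 2 * mean(turnover_by_firm.values())
    return {firm: ratio >= threshold for firm, ratio in turnover_by_firm.items()}

# Hypothetical cluster: the average is 1.5, so the threshold is 3.0.
cluster = {"A": 0.5, "B": 0.5, "C": 0.5, "D": 4.5}
print(flag_high_activity(cluster))
```

Because the threshold is relative to the cluster mean, the same ratio can be high-activity in one cluster and not in another.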

An Empirical Model for the Prediction of the Onset of Upward-Movement of Overwintered Caccopsylla pyricola (Homoptera: Psyllidae) in Pear Orchards (배과원에서 꼬마배나무이 월동성충의 수상 이동시기 예측 모형)

  • Kim, Dong-Soon;Yang, Chang-Yeol;Jeon, Heung-Yong
    • Korean Journal of Agricultural and Forest Meteorology / v.9 no.4 / pp.228-233 / 2007
  • Pear psylla, Caccopsylla pyricola (Homoptera: Psyllidae), is a serious insect pest in pear orchards. C. pyricola overwinters as adults under rough bark scales of pear trees. When the weather warms in spring, the overwintered adults become active, climb up to the tree branches, and settle on fruit twigs to lay eggs. This study was conducted to develop a forecasting model for the onset of upward movement of overwintered C. pyricola adults so that they can be controlled by timely spraying of petroleum oil. Adult population densities were observed under rough bark (B) and on fruit twigs (T) of pear trees. Relative upward-movement rates (R) were calculated as T/(B+T). Low threshold temperatures for the activation of overwintered C. pyricola adults were selected arbitrarily from 5 to $9^{\circ}C$ at $1^{\circ}C$ intervals. Then, the days (D) on which the daily maximum air temperature was above each low threshold temperature were counted from 1 February until the date with R $\geq$ 0.8. The same method was applied to predict the first observation of eggs. The coefficient of variation (CV) of the mean D was lowest at the low threshold temperature of $6^{\circ}C$. At this threshold temperature, the upward movement of C. pyricola adults occurred at 12 D and egg laying started at 25 D. In the field validation, the model outputs with the $6^{\circ}C$ threshold temperature explained the observed data in Suwon and Cheonan in 2002 reasonably well. Practical uses of the model are also discussed.
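
The threshold selection described in this abstract can be sketched as follows: compute R = T/(B+T), count the days D whose daily maximum temperature exceeds a candidate threshold, and keep the threshold whose D values vary least across observation series (lowest CV). The function names and the temperature series are hypothetical. Each series is assumed to run from 1 February to the date on which R first reached 0.8.

```python
from statistics import mean, stdev

def upward_movement_rate(bark_count, twig_count):
    """R = T / (B + T)."""
    return twig_count / (bark_count + twig_count)

def days_above_threshold(daily_max_temps, threshold_c):
    """D: number of days whose daily maximum temperature exceeds the threshold."""
    return sum(1 for t in daily_max_temps if t > threshold_c)

def coefficient_of_variation(values):
    """CV = standard deviation / mean; infinite when the mean is zero."""
    m = mean(values)
    return stdev(values) / m if m else float("inf")

def best_threshold(series_list, thresholds=range(5, 10)):
    """Threshold (deg C) whose D values have the lowest CV across the series."""
    def cv_for(threshold_c):
        return coefficient_of_variation(
            [days_above_threshold(s, threshold_c) for s in series_list]
        )
    return min(thresholds, key=cv_for)

# Hypothetical daily maximum temperatures for two observation series.
series_a = [9.5] * 6 + [7.5] * 2 + [6.5] * 2
series_b = [9.5] * 4 + [7.5] * 1 + [6.5] * 5 + [5.5] * 3
print(best_threshold([series_a, series_b]))
```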

Churn Analysis for the First Successful Candidates in the Entrance Examination for K University

  • Kim, Kyu-Il;Kim, Seung-Han;Kim, Eun-Young;Kim, Hyun;Yang, Jae-Wan;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.1-10 / 2007
  • In this paper, we focus on churn analysis of the first successful candidates in the 2006 entrance examination, using the data mining tool Clementine. The goal of the study is to apply decision trees (including the C5.0 and CART algorithms), neural networks and logistic regression to predict churn among successful candidates. Using the 2006 entrance examination data of K University, we analyze churning and non-churning successful candidates, why successful candidates churn, and which successful candidates are most likely to churn in the future.

Estimating the Change of Potential Forest Distribution and Carbon Stock by Climate Changes - Focused on Forest in Yongin-City - (기후변화에 따른 임상분포 변화 및 탄소저장량 예측 - 용인시 산림을 기반으로 -)

  • Jeong, Hyeon yong;Lee, Woo-Kyun;Nam, Kijun;Kim, Moonil
    • Journal of Climate Change Research / v.4 no.2 / pp.177-188 / 2013
  • In this research, changes in forest cover distribution, forest volume and carbon stock in Yongin-city, Gyeonggi province, under climate change were estimated using the forest type map and the HyTAG model. The present forest volume of Yongin-city was estimated using data from the $5^{th}$ Forest Type Map and the Korean National Forest Inventory (NFI). Potential forest distribution over the next 100 years was estimated at 10-year intervals using the HyTAG model. Forest volume was also calculated using the algebraic difference form of the growth model. According to the $5^{th}$ Forest Type Map, needleleaf forest currently occupies 37.8% and broadleaf forest 62.2% of the forest area. After 30 years, the forest cover distribution would change to 0.13% needleleaf forest and 99.97% broadleaf forest. Finally, 60 years later, the whole forest of Yongin-city would be covered by broadleaf forest. The current forest carbon stock was 1,773,862 tC (56.79 tC/ha), and the carbon stock after 50 years was predicted by the HyTAG model to be 4,432,351 tC (141.90 tC/ha). The carbon stock after 100 years was predicted to be 6,884,063 tC (220.40 tC/ha). According to the HyTAG model prediction, Pinus koraiensis, Larix kaempferi, Pinus rigida, and Pinus densiflora would not be suitable for the climate 10, 30, 30, and 50 years from now, respectively. All Quercus spp. were predicted to be suitable for the future climate.

A Study on Weighting Cells by Survey Methods for Social Surveys: Telephone, Internet and Mobile Surveys (사회조사에서 조사방법에 따른 가중 칸 설정에 관한 연구: 전화조사, 인터넷 조사, 모바일 조사)

  • 허명회;강용수;손은진
    • Survey Research / v.5 no.1 / pp.1-26 / 2004
  • The aim of this study is to answer the question "How should weighting cells be formed to enhance sample representativeness in telephone, Internet and mobile surveys?" To this end, we explored the 2% raw data of the Year 2000 Population and Housing Census of Korea, looking for meaningful patterns in telephone ownership and in the use of the Internet and/or mobile phones. We found that telephone coverage rates vary significantly by household size: 84.6% for one-member households versus 98.5% for households with two or more members. Thus, telephone survey samples need to be weighted differently in sub-groups by household size for proportional representation of the target population. Searching for socio-demographic factors influencing Internet use with C5.0 tree models, we found that education level and occupation (or housing type, automobile ownership) are the two most important factors in addition to gender and age. Thus, a surveyor might form weighting cells by such factors at the post-stratification stage, or set quotas a priori, proportional to the size of the cells defined by such factors. For mobile surveys, we took a similar approach and found that education level and occupation (or automobile ownership, marital status) are two additional factors that may be used in forming weighting cells or in setting quotas for cells.
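
Post-stratification weighting by cell, as discussed in this abstract, can be sketched in a few lines: each cell's weight is its population share divided by its sample share. The cell labels and counts are hypothetical and are not taken from the census data used in the study.

```python
def poststratification_weights(population_counts, sample_counts):
    """Weight per cell = (population share of the cell) / (sample share of the cell)."""
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        cell: (population_counts[cell] / pop_total) / (sample_counts[cell] / sample_total)
        for cell in sample_counts
    }

# Hypothetical cells by household size: one-member households are
# under-represented in the sample, so they receive a weight above 1.
population = {"one-member": 200, "two-or-more": 800}
sample = {"one-member": 10, "two-or-more": 90}
print(poststratification_weights(population, sample))
```

After weighting, the weighted sample shares match the population shares in every cell.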

Prediction of Slope Hazard Probability around Express Way using Decision Tree Model (의사결정나무모형을 이용한 고속도로 주변 급경사지재해 발생가능성 예측)

  • Kim, Chan-Kee;Bak, Gueon Jun;Kim, Joong Chul;Song, Young-Suk;Yun, Jung-Mann
    • Journal of the Korean Geosynthetics Society / v.12 no.2 / pp.67-74 / 2013
  • In this study, slope hazard probability was predicted for a study area in Hadae-ri, Woochun-myeon, Hoengsung-gun, Gangwon Province, near the Youngdong expressway, using the computer program SHAPP ver 1.0, which was developed from a decision tree model. Soil samples were collected at a total of 10 points, and soil tests were performed to measure soil properties. Thematic maps of soil properties such as the coefficient of permeability and void ratio were made on the basis of the soil test results. The slope angle of the topography was analyzed using a digital map. As a result, 2,120 of the total 27,776 cells were predicted to experience slope hazards. Because the analyzed cell size was $5m{\times}5m$, the predicted area of slope hazard occurrence is $53,000m^2$.
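
The area figure in this abstract follows directly from the cell count and cell size. The short check below reproduces it, along with the share of cells flagged as hazardous. The function name is hypothetical.

```python
def hazard_area_m2(hazard_cells, cell_side_m):
    """Total area of the flagged cells, given square cells of the stated side length."""
    return hazard_cells * cell_side_m ** 2

area = hazard_area_m2(2120, 5)     # 2,120 cells of 5 m x 5 m
share = 2120 / 27776               # fraction of all analyzed cells
print(area, round(share * 100, 1))
```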