• Title/Summary/Keyword: Decision Tree analysis


Evaluation of Suitable REDD+ Sites Based on Multiple-Criteria Decision Analysis (MCDA): A Case Study of Myanmar

  • Park, Jeongmook;Sim, Woodam;Lee, Jungsoo
    • Journal of Forest and Environmental Science / v.34 no.6 / pp.461-471 / 2018
  • This study mapped deforestation and forest degradation areas in Myanmar using a land cover map (LCM) and a tree cover map (TCM), estimated their $CO_2$ reduction potential, and evaluated the strength of occurrence with a geostatistical technique. A multiple-criteria decision-making method was then applied to the regions with a high strength of occurrence of $CO_2$ reduction potential in order to prioritize candidate lands for a REDD+ project. From 2010 to 2015, the areas of deforestation and forest degradation were 609,690 ha and 43,515 ha, respectively. By township, Mong Kung had the largest deforestation area (3,069 ha), while Thlangtlang had the largest forest degradation area (9,213 ha). Fifteen $CO_2$ reduction potential hotspots were identified among the deforestation areas, with an average reduction potential of 192,000 tons, 6 times higher than that of all target areas. In particular, the township of Hsipaw in the Shan region had a $CO_2$ reduction potential of about 772,000 tons, the largest among the hotspot areas. Among the forest degradation areas, many hotspots lay in the eastern part of the target region, with a $CO_2$ reduction potential of 1,164,000 tons, 27 times higher than that of the total area. AHP importance analysis gave a weight of 0.41 to topographic characteristics (0.40 for height from surface, 0.29 for slope, and 0.31 for distance from water area) and 0.59 to geographical characteristics (0.56 for distance from road, 0.56 for distance from settlement area, and 0.19 for distance from the capital). Yawunghwe, Kalaw, and Hsi Hseng were selected as the preferred REDD+ candidate locations for the deforestation areas, while Einme, Tiddim, and Falam were selected for the forest degradation areas.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.99-112 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in the machine learning and artificial intelligence fields because of its remarkable performance improvement and its flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In this body of research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown improvements as remarkable as those of DT ensembles. Recently, several works have reported that the performance of an ensemble can degrade when its classifiers are highly correlated with one another, which results in a multicollinearity problem. They have also proposed differentiated learning strategies to cope with this performance degradation. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms but does not yield remarkable improvement for stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, so small changes in the training data can yield large changes in the generated classifiers. Therefore, an ensemble of unstable learners can guarantee some diversity among its classifiers. In contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which degrades the performance of the ensemble. Kim's work (2009) compared the performance of traditional prediction algorithms such as NN, DT, and SVM in bankruptcy prediction for Korean firms. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT, whereas, as ensembles, DT shows greater improvement than NN and SVM. Further analysis with the variance inflation factor (VIF) empirically shows that the performance degradation of an ensemble is due to the multicollinearity problem, and suggests that ensemble optimization is needed to cope with it. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of NN ensembles. Coverage optimization is a technique for choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of its classifiers. CO-NN uses a genetic algorithm (GA), which has been widely used for various optimization problems, to solve the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as maximization of error reduction, and a constraint on the VIF, one of the generally used measures of multicollinearity, is added to ensure the diversity of classifiers by removing high correlation among them. We use Microsoft Excel and the GA software package Evolver. Experiments on company failure prediction have shown that CO-NN stably enhances the performance of NN ensembles by choosing classifiers with the correlations within the ensemble taken into account. Classifiers with a potential multicollinearity problem are removed by the coverage optimization process, and CO-NN thereby outperforms a single NN classifier and the NN ensemble at the 1% significance level, and the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process to find the optimal combination function should be considered. Second, various learning strategies for dealing with data noise should be introduced.
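
The binary-chromosome selection described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's Excel/Evolver setup: the majority-vote combiner, the GA parameters, and all function names are hypothetical, and a maximum pairwise correlation threshold stands in for the VIF constraint to keep the sketch dependency-free.

```python
import random

def majority_vote(preds, mask):
    """Combine the 0/1 predictions of the selected classifiers by majority vote."""
    chosen = [p for p, bit in zip(preds, mask) if bit]
    return [1 if 2 * sum(col) > len(chosen) else 0 for col in zip(*chosen)]

def error_rate(pred, truth):
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

def correlation(a, b):
    """Pearson correlation of two sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def fitness(mask, preds, truth, max_corr=0.9):
    """Error reduction relative to the full ensemble.

    Assumption: a pairwise-correlation cap replaces the VIF constraint;
    masks that violate it (or select nothing) are infeasible.
    """
    chosen = [p for p, bit in zip(preds, mask) if bit]
    if not chosen:
        return float("-inf")
    for i in range(len(chosen)):
        for j in range(i + 1, len(chosen)):
            if abs(correlation(chosen[i], chosen[j])) > max_corr:
                return float("-inf")
    full = error_rate(majority_vote(preds, [1] * len(preds)), truth)
    return full - error_rate(majority_vote(preds, mask), truth)

def select_sub_ensemble(preds, truth, pop_size=20, generations=30,
                        p_mut=0.1, max_corr=0.9, seed=0):
    """Search over bit masks (one bit per classifier) with a simple GA."""
    rng = random.Random(seed)
    n = len(preds)
    score = lambda m: fitness(m, preds, truth, max_corr)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

Here `preds` is a list with one 0/1 prediction vector per classifier and `truth` holds the true labels; the returned mask marks which classifiers stay in the sub-ensemble.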

Interpreting Bounded Rationality in Business and Industrial Marketing Contexts: Executive Training Case Studies (阐述工商业背景下的有限合理性: 执行官培训案例研究)

  • Woodside, Arch G.;Lai, Wen-Hsiang;Kim, Kyung-Hoon;Jung, Deuk-Keyo
    • Journal of Global Scholars of Marketing Science / v.19 no.3 / pp.49-61 / 2009
  • This article provides training exercises for executives in interpreting subroutine maps of executives' thinking in processing business and industrial marketing problems and opportunities. This study builds on premises that Schank proposes about learning and teaching, including (1) learning occurs by experiencing, and the best instruction offers learners opportunities to distill their knowledge and skills from interactive stories in the form of goal-based scenarios, team projects, and understanding stories from experts; and (2) telling does not lead to learning because learning requires action; training environments should emphasize active engagement with stories, cases, and projects. Each training case study includes executive exposure to decision system analysis (DSA). The training case requires the executive to write a "Briefing Report" of a DSA map. Instructions to the executive trainee for writing the briefing report include coverage of (1) details of the essence of the DSA map and (2) a statement of warnings and opportunities that the executive map reader interprets within the DSA map. The maximum length of a briefing report is 500 words, an arbitrary rule that works well in executive training programs. Following this introduction, section two of the article briefly summarizes relevant literature on how humans think within contexts in response to problems and opportunities. Section three illustrates the creation and interpretation of DSA maps using a training exercise in pricing a chemical product to different OEM (original equipment manufacturer) customers. Section four presents a training exercise in pricing decisions by a petroleum manufacturing firm. Section five presents a training exercise in marketing strategies by an office furniture distributor along with buying strategies by business customers. Each of the three training exercises is based on research into information processing and decision making of executives operating in marketing contexts. Section six concludes the article with suggestions for use of this training case and for developing additional training cases for honing executives' decision-making skills. Todd and Gigerenzer propose that humans use simple heuristics because they enable adaptive behavior by exploiting the structure of information in natural decision environments: "Simplicity is a virtue, rather than a curse." Bounded rationality theorists emphasize the centrality of Simon's proposition, "Human rational behavior is shaped by a scissors whose blades are the structure of the task environments and the computational capabilities of the actor." Gigerenzer's view is relevant to Simon's environmental blade and to the environmental structures in the three cases in this article: "The term environment, here, does not refer to a description of the total physical and biological environment, but only to that part important to an organism, given its needs and goals." The present article directs attention to research that combines reports on the structure of task environments with the use of adaptive toolbox heuristics of actors. The DSA mapping approach here concerns the match between strategy and an environment, that is, the development and understanding of ecological rationality theory. Aspiration adaptation theory is central to this approach. Aspiration adaptation theory models decision making as a multi-goal problem without aggregation of the goals into a complete preference order over all decision alternatives. The three case studies in this article permit the learner to apply propositions in aspiration level rules in reaching a decision. Aspiration adaptation takes the form of a sequence of adjustment steps. An adjustment step shifts the current aspiration level to a neighboring point on an aspiration grid by a change in only one goal variable. An upward adjustment step is an increase and a downward adjustment step is a decrease of a goal variable. Creating and using aspiration adaptation levels is integral to bounded rationality theory. The present article increases understanding and expertise of both aspiration adaptation and bounded rationality theories by providing learner experiences and practice in using propositions in both theories. Practice in ranking CTSs and writing TOP gists from DSA maps serves to clarify and deepen Selten's view, "Clearly, aspiration adaptation must enter the picture as an integrated part of the search for a solution." The body of "direct research" by Mintzberg, Gladwin's ethnographic decision tree modeling, and Huff's work on mapping strategic thought are suggestions on where to look for research that considers both the structure of the environment and the computational capabilities of the actors making decisions in these environments. Such research on bounded rationality permits both further development of theory on how and why decisions are made in real life and the development of learning exercises in the use of heuristics occurring in natural environments. The exercises in the present article encourage learning skills and principles of using fast and frugal heuristics in contexts of their intended use. The exercises respond to Schank's wisdom, "In a deep sense, education isn't about knowledge or getting students to know what has happened. It is about getting them to feel what has happened. This is not easy to do. Education, as it is in schools today, is emotionless. This is a huge problem." The three cases and accompanying set of exercise questions adhere to Schank's view, "Processes are best taught by actually engaging in them, which can often mean, for mental processing, active discussion."
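
The adjustment steps described in this abstract can be pictured with a small sketch. It assumes that aspiration levels are integer grid coordinates, that a caller-supplied `is_feasible` test exists, and that goals are tried in a fixed urgency order; these names and the stopping rule are illustrative assumptions, not taken from the article.

```python
def adjust(level, goal, direction):
    """Move to a neighboring grid point by changing exactly one goal variable.

    direction = +1 is an upward adjustment step, -1 a downward one.
    """
    new = list(level)
    new[goal] += direction
    return tuple(new)

def adapt(level, is_feasible, urgency_order, max_steps=100):
    """Repeat upward steps, most urgent goal first, while the result stays feasible."""
    for _ in range(max_steps):
        for goal in urgency_order:
            candidate = adjust(level, goal, +1)
            if is_feasible(candidate):
                level = candidate
                break
        else:
            return level  # no feasible upward step remains
    return level
```

For example, with two goal variables, a feasibility test `lambda lv: lv[0] <= 2 and sum(lv) <= 4`, and urgency order `[0, 1]`, starting from `(0, 0)` the sequence of single-variable steps ends at `(2, 2)`.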


Monetary policy synchronization of Korea and United States reflected in the statements (통화정책 결정문에 나타난 한미 통화정책 동조화 현상 분석)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.115-126 / 2021
  • Central banks communicate with the market through statements on the direction of monetary policy as they implement that policy. The rapid contraction of the global economy caused by the recent COVID-19 pandemic is comparable to the situation during the 2008 global financial crisis. In this paper, we analyzed text data from the monetary policy statements of the Bank of Korea and the Fed, focusing on how the statements were affected in the face of a global crisis. For the analysis, we collected the two countries' monetary policy direction reports published from October 1999 to September 2020. We examined their semantic features using word clouds and word embedding, and analyzed the trend in the similarity between the two countries' documents with a piecewise regression tree model. The visualization results show that both the Bank of Korea and the US Fed have published statements with refined words of clear meaning for transparent and effective communication with the market. The analysis of the dissimilarity trend of the two countries' documents also shows a degree of synchronization between them, as rapid changes in the global economic environment affect monetary policy.

Bounds of PIM-based similarity measures with partially marginal proportion (부분적 주변 비율에 의한 확률적 흥미도 측도 기반 유사성 측도의 상한 및 하한의 설정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.857-864 / 2015
  • According to Wikipedia, data mining is the computational process of discovering patterns in huge data sets, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, and machine learning. Clustering, or cluster analysis, is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. The similarity measures used in clustering may be classified into various types depending on the characteristics of the data. In this paper, we computed bounds for similarity measures based on the probabilistic interestingness measure with partially marginal probability, such as the Peirce I, Peirce II, Cole I, Cole II, Loevinger, Park I, and Park II measures. We confirmed that the absolute value of the Loevinger measure was the upper limit of the absolute value of any other existing measure. The ordering of the other measures is determined by the size of the concurrence proportion, the non-simultaneous occurrence proportion, and the mismatch proportion.

A Study on the Machine Learning Model for Product Faulty Prediction in Internet of Things Environment (사물인터넷 환경에서 제품 불량 예측을 위한 기계 학습 모델에 관한 연구)

  • Ku, Jin-Hee
    • Journal of Convergence for Information Technology / v.7 no.1 / pp.55-60 / 2017
  • To provide intelligent services without human intervention in the Internet of Things (IoT) environment, it is necessary to analyze the big data generated by IoT devices, learn the normal pattern, and predict abnormal symptoms such as faults or malfunctions based on the learned normal pattern. The purpose of this study is to implement a machine learning model that can predict product failure by analyzing big data generated by various devices in the production process. The model is built with the big data analysis tool R because the analysis must be based on a large volume of existing data. The data collected in the production process include information about product faults, so a supervised learning model is used. As a result, this study classified the variables and variable conditions affecting product failure and proposed a decision-tree-based prediction model for product failure. In addition, the model showed significantly high predictive power in the conformity and performance evaluation using the ROC curve.
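
As a rough illustration of the ROC-based evaluation mentioned in this abstract, the area under the ROC curve can be computed directly from predicted scores. The study itself used R; the Python function below is a hypothetical sketch that uses the rank interpretation of AUC (the probability that a randomly chosen faulty item scores higher than a randomly chosen normal one), and its name and inputs are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.5, 0.1]`, three of the four positive/negative pairs are ordered correctly, so the function returns 0.75.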

FAFS: A Fuzzy Association Feature Selection Method for Network Malicious Traffic Detection

  • Feng, Yongxin;Kang, Yingyun;Zhang, Hao;Zhang, Wenbo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.240-259 / 2020
  • Analyzing network traffic is the basis for dealing with network security issues. Most network security systems depend on feature selection over network traffic data, and the ability to detect malicious traffic can be improved by a correct feature selection method. This paper proposes the Fuzzy Association Feature Selection (FAFS) method for network malicious traffic detection. Association rules, which reflect the relationships among different characteristic attributes of network traffic data, are mined by association analysis. The membership values of the association rules are obtained by fuzzy reasoning. The data features with the highest correlation intensity in network data sets are determined by comparing the membership values of the association rules. The FAFS method reduces the dimension of the data features and improves the detection ability of malicious traffic detection algorithms. To verify the effect of malicious traffic feature selection, FAFS is used to select data features from different datasets. The K-Nearest Neighbor, C4.5 Decision Tree, and Naïve Bayes algorithms are then tested on these datasets. Moreover, FAFS is compared with classical feature selection methods. The experimental results show that FAFS can significantly improve the precision and recall of malicious traffic detection in the network, which provides a valuable reference for the establishment of network security systems.

ISAG-recommended Microsatellite Marker Analysis Among Five Korean Native Chicken Lines

  • Choi, Nu-Ri;Hoque, Md. Rashedul;Seo, Dong-Won;Sultana, Hasina;Park, Hee-Bok;Lim, Hyun-Tae;Heo, Kang-Nyeong;Kang, Bo-Seok;Jo, Cheorun;Lee, Jun-Heon
    • Journal of Animal Science and Technology / v.54 no.6 / pp.401-409 / 2012
  • The objective of this study was to determine the genetic variation of five Korean native chicken lines using 30 microsatellite (MS) markers previously recommended by ISAG (International Society for Animal Genetics). The initial study indicated that two microsatellite markers, MCW0284 and LEI0192, were not amplified in these lines, and they were excluded from further analysis. Twenty-eight microsatellite markers were investigated in 83 birds from the five Korean native chicken lines. The mean number of alleles identified was 4.57. The expected and observed heterozygosity (He, Ho) and polymorphism information content (PIC) values estimated for these markers ranged from 0.31 to 0.868, 0.145 to 0.699, and 0.268 to 0.847, respectively. The results were used to discriminate the five chicken lines using genetic distance values, and a neighbor-joining phylogenetic tree was constructed. Based on the He and PIC values, eighteen markers are enough to discriminate these Korean native chicken lines, according to the expected probability of identity values among genotypes of random individuals (PI), random half sibs ($PI_{half-sibs}$), and random sibs ($PI_{sibs}$). Taken together, these results will help in deciding conservation strategies and establishing a traceability system for this native chicken breed. Also, the use of ISAG-recommended microsatellite markers may indicate that global comparison with other chicken breeds is possible.
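
For readers unfamiliar with the He and PIC statistics reported in this abstract, the sketch below computes both from a marker's allele frequencies using the commonly cited textbook definitions (expected heterozygosity as one minus the sum of squared frequencies, and PIC in the Botstein et al. form). The abstract does not say which estimators or software the authors used, so treat this as an assumption; the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) over the allele frequencies of one marker."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return expected_heterozygosity(freqs) - cross
```

For a marker with two equally frequent alleles, `expected_heterozygosity([0.5, 0.5])` gives 0.5 and `pic([0.5, 0.5])` gives 0.375, so PIC is never larger than He for the same marker.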

A Classifier for the association study between SNPs and quantitative traits (SNP와 양적 표현형의 연관성 분석을 위한 분류기)

  • Uhmn, Saangyong;Lee, Kwang Mo
    • Journal of the Korea Society of Computer and Information / v.17 no.11 / pp.141-148 / 2012
  • Advances in human genome technologies make it possible to analyze the association between genetic variants and diseases and to apply the results to predict risk of or susceptibility to those diseases. Many such studies are carried out as case-control studies. For quantitative traits, statistical analysis methods are applied to find single nucleotide polymorphisms (SNPs) relevant to the diseases, considering them one by one. In this study, we present methods to select informative SNPs and predict risk for quantitative traits, and compare their performance. We adopted two SNP selection methods: one considering single SNPs only and the other considering all possible pairs of SNPs.

A Study on Injury Severity Prediction for Car-to-Car Traffic Accidents (차대차 교통사고에 대한 상해 심각도 예측 연구)

  • Ko, Changwan;Kim, Hyeonmin;Jeong, Young-Seon;Kim, Jaehee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.13-29 / 2020
  • Automobiles have long been an essential part of daily life, but the social costs of car traffic accidents exceed 9% of the national budget of Korea. Hence, it is necessary to establish a prevention and response system for car traffic accidents. To present a model that can classify and predict the degree of injury in car traffic accidents, we used the big data analysis techniques of K-nearest neighbor, logistic regression, naive Bayes classifier, decision tree, and ensemble algorithms. The performance of the models was analyzed using data on nationwide traffic accidents over the past three years. In particular, considering the difference in the number of records among the injury severity levels, we down-sampled the groups with a large number of samples to enhance the classification accuracy of the models, and then verified the statistical significance of the models using ANOVA.
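
The down-sampling step mentioned in this abstract might look something like the sketch below, which randomly trims every severity class to the size of the smallest one. The abstract does not describe the exact procedure, so the balancing target, the function name, and the `label_of` accessor are assumptions made for illustration.

```python
import random
from collections import defaultdict

def down_sample(records, label_of, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))  # sample without replacement
    rng.shuffle(balanced)
    return balanced
```

Given 10 records labeled "minor" and 3 labeled "severe", the result holds 3 of each; the fixed seed only makes the random draw repeatable.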