• Title/Summary/Keyword: Decision Tree analysis

Selection of the Strategic R&D Field Satisfying SMEs' Specific Needs by Technology Relevance/Cluster Analysis (기술연관분석을 통한 중소기업형 전략적 기술개발과제의 우선순위 도출)

  • 고병열;홍정진;손종구;박영서
    • Journal of Korea Technology Innovation Society / v.6 no.3 / pp.373-390 / 2003
  • With limited resources, proper allocation of the national R&D budget is crucial for reinforcing national competence, and technology policy-makers and CTOs have increasingly emphasized the importance of selecting strategic R&D fields. This paper deals with technology relevance/cluster analysis, which measures technological dependency and relevance among technologies, and shows how it can be used to select strategic R&D fields that satisfy the specific needs of SMEs (small and medium enterprises). The study produces a technology-product tree composed of 7 major technology fields, 22 clusters, 41 groups, 335 core-need technologies, and hundreds of related business items, which can be used for designing SMEs' R&D/business portfolios as well as for R&D investment decision-making by the Ministry of Small and Medium Business Administration.

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.330-333 / 2003
  • The initial cluster size is very important to the result of partitioning-based clustering. In the K-means algorithm, the result of cluster analysis varies with the cluster size K. Usually, the initial cluster size is determined by prior, subjective information, which may not be optimal, so a more objective method is needed. We propose a hybrid genetic algorithm, a tree-induction-based evolutionary algorithm, for determining the optimal cluster size. The initial population of the algorithm is determined by the number of terminal nodes of the induced tree. From this decision-tree-based initial population, the optimal cluster size is generated. The fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.
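
The fitness idea in this abstract (an inverse of a dissimilarity measure) can be sketched as follows. The abstract does not say which dissimilarity measure is used, so this sketch assumes the mean squared Euclidean distance from each point to its cluster centroid; the function names, the `eps` guard, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def within_cluster_dissimilarity(points, labels):
    """Mean squared Euclidean distance from each point to its cluster centroid.

    Assumed dissimilarity measure; the abstract does not specify one.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [mean(dim) for dim in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total / len(points)

def fitness(points, labels, eps=1e-9):
    """Inverse of the dissimilarity; eps (hypothetical) avoids division by zero."""
    return 1.0 / (within_cluster_dissimilarity(points, labels) + eps)

# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes distant points in each cluster

print(fitness(points, good) > fitness(points, bad))
```

A raw inverse-dissimilarity score like this one tends to rise as K grows, so a search over K presumably needs some additional control; the abstract does not say how the authors handle this.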

The Difference Analysis between Maturity Stages of Venture Firms by Classification Techniques of Big Data (빅데이터 분류 기법에 따른 벤처 기업의 성장 단계별 차이 분석)

  • Jung, Byoungho
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.197-212 / 2019
  • The purpose of this study is to identify the maturity stages of venture firms through classification analysis, which is widely used as a big data technique. Venture companies should develop a competitive advantage in the market, and the maturity stage of a company can be classified into five stages. This study analyzes the difference in the growth stage of venture firms between survey responses and statistical classification methods. Firm growth was divided into five stages, spanning the periods from start-up to decline. The big data classification methods used were k-means cluster analysis, hierarchical cluster analysis, artificial neural networks, and decision tree analysis. The variables were asset increase, capital increase, sales increase, operating profit increase, R&D investment increase, operation period, and number of retirements. The results show that each big data analysis technique produced groups with very different sample sizes. In particular, the decision tree and neural network methods classified firms into three groups rather than five. Group sizes differed across all of the classification methods. Furthermore, results may differ depending on variable selection and sample size. Each classified group also showed a number of competitive differences. The research implication is that analysts need to interpret statistics through management theory in order to interpret big data classification results correctly. In addition, the choice of classification method should be determined by considering not only management theory but also practical experience. Finally, the growth of venture firms needs to be examined by time-series analysis and monitored closely at the level of individual firms. Future research will need to include significant variables of the company's maturity stages.

A study on Natural Disaster Prediction Using Multi-Class Decision Forest

  • Eom, Tae-Hyuk;Kim, Kyung-A
    • Korean Journal of Artificial Intelligence / v.10 no.1 / pp.1-7 / 2022
  • This paper presents a machine-learning study to predict natural disasters in Afghanistan. Preparation for natural disasters is needed not only in Korea but also in other vulnerable countries. Every year in Afghanistan, natural disasters (snow, earthquake, drought, flood) cause property damage and casualties. We conducted this research because we believed that the damage would be smaller with preparation. The Azure Machine Learning Studio used in the study has the advantage of being more visual and easier to use than other machine learning tools. Decision forest is a classification model built from decision trees. It enables intuitive analysis because its results are easy to interpret and it presents key variables and splitting criteria. Also, since it is a nonparametric model, it is free of the assumptions (normality, independence, equal variance) required by statistical models. Finally, linear and non-linear relationships can be explored while considering interactions between variables. For these reasons, the study used decision forest. The study found an overall accuracy of 89 percent and an average accuracy of 97 percent. Although accuracy was fairly high, disaster types with low frequency were predicted less accurately because of insufficient training data. With more training data, overall accuracy can be improved, and damage can be reduced by predicting natural disasters.
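
The abstract reports two different figures, overall accuracy and average accuracy, without defining them. The sketch below assumes one common pair of definitions for multi-class evaluation: overall accuracy is the share of correctly classified samples, and average accuracy is the mean of the per-class one-vs-rest accuracies. Whether the study used exactly these definitions is an assumption, and the labels and predictions are made-up toy data, not the paper's.

```python
def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def average_accuracy(y_true, y_pred):
    """Mean of one-vs-rest accuracies over all classes (assumed definition)."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        # A sample counts as correct for class c when truth and prediction
        # agree on the yes/no question "is this sample of class c?".
        agree = sum((t == c) == (p == c) for t, p in zip(y_true, y_pred))
        per_class.append(agree / len(y_true))
    return sum(per_class) / len(per_class)

# Hypothetical toy labels using the four disaster types named in the abstract.
y_true = ["flood", "flood", "drought", "snow", "earthquake", "flood", "snow", "drought"]
y_pred = ["flood", "flood", "drought", "snow", "flood", "flood", "snow", "snow"]

print(overall_accuracy(y_true, y_pred))  # 0.75
print(average_accuracy(y_true, y_pred))  # 0.875
```

Under these definitions each misclassification lowers the one-vs-rest accuracy of only two classes, so the average accuracy is never lower than the overall accuracy, which is consistent with the ordering of the two figures in the abstract.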

Global Big Data Analysis Exploring the Determinants of Application Ratings: Evidence from the Google Play Store

  • Seo, Min-Kyo;Yang, Oh-Suk;Yang, Yoon-Ho
    • Journal of Korea Trade / v.24 no.7 / pp.1-28 / 2020
  • Purpose - This paper empirically investigates the predictors and main determinants of consumers' ratings of mobile applications in the Google Play Store. Using a comparison of linear and nonlinear models to identify the function of users' reviews in determining application ratings across countries, this study estimates the direct effects of users' reviews on the application rating. In addition, extending our modelling into a sentiment analysis, this paper also aims to explore the effects of review polarity and subjectivity on the application rating, followed by an examination of the moderating effect of user reviews on the polarity-rating and subjectivity-rating relationships. Design/methodology - Our empirical model considers nonlinear association as well as linear causality between features and targets. This study employs competing theoretical frameworks - multiple regression, decision-tree and neural network models - to identify the predictors and main determinants of app ratings, using data from the Google Play Store. Using a cross-validation method, our analysis investigates the direct and moderating effects of predictors and main determinants of application ratings in a global app market. Findings - The main findings of this study can be summarized as follows: the number of user reviews is positively associated with the rating of a given app, and it positively moderates the polarity-rating relationship. When the review polarity measured by sentiment analysis is applied to the model, polarity is not significantly associated with the rating. This result best applies to the function of both positive and negative reviews in playing a word-of-mouth role, as well as serving as a channel for communication, leading to product innovation.
Originality/value - Applying a proxy measured by binomial figures, previous studies have predominantly focused on positive and negative sentiment in examining the determinants of app ratings, assuming that they are significantly associated. Given the constraints on the measurement of sentiment in current research, this paper employs sentiment analysis to measure the real integer for users' polarity and subjectivity. This paper also seeks to compare the suitability of three distinct models - linear regression, decision-tree and neural network models. Although a comparison between methodologies has long been considered important to the empirical approach, it has hitherto been underexplored in studies on the app market.

A Pattern Analysis on the Possibility of Near Miss Connection in Construction Sites (건설현장의 아차사고 연결가능성에 대한 패턴분석)

  • Sang Hyun Kim;Yeon Cheol Shin;Yu Mi Moon
    • Journal of the Society of Disaster Information / v.19 no.1 / pp.216-230 / 2023
  • Purpose: The purpose is to prevent accidents by predicting disasters through the analysis of near misses. Method: A literature review on near misses was conducted and data were collected at construction sites; a questionnaire survey was then analyzed with logistic regression analysis and decision tree analysis to classify the possibility of near-miss connection. Result: Analysis of the effects of near-miss types on mental, physical, and safety habit and behavior factors showed that the factors with a high influence on the physical factor were, in order, the need for near-miss management, job type (electricity·information communication), and health status; for the mental factor, the influence of construction scale was high; and the factors with the highest influence on the habit and behavior factor were, in order, experience, number of serious injuries, and occupation, and in order of illusion, inappropriate work instructions, and body parts. Through decision tree analysis, factors and patterns that affect the possibility of a near miss leading to an accident were identified. Conclusion: Construction site officials should consider the observation of near misses in both mental and physical terms. Specific management of the relevance of physical aspects to near misses should be implemented, and a work environment in which serious accidents are reduced is expected through personnel allocation, work plans, work procedures and methods, and feedback so that inappropriate work instructions do not lead to near misses.

Scenarios for Manufacturing Process Data Analysis using Data Mining (데이터 마이닝을 이용한 생산공정 데이터 분석 시나리오)

  • Lee, Hyoung-wook;Bae, Sung-min
    • Journal of Institute of Convergence Technology / v.3 no.1 / pp.41-44 / 2013
  • Process and manufacturing data accumulate in large volumes in enterprise databases across industries, but little of the data is utilized. Data mining can use this data to support managers' decisions in the process. However, it is not easy for field managers to apply, because properly adopting the various schemes is very difficult. In this paper, six scenarios are developed using data mining schemes for various field-claim situations such as yield problems, trend analysis, and prediction of yield under changes in operating conditions. Such scenarios, which act like templates for various analysis situations, are helpful to users.

Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia Pacific Journal of Information Systems / v.31 no.2 / pp.185-196 / 2021
  • The paper aims to analyze and predict sales of liquor in the state of Iowa by applying machine learning algorithms to models built for prediction. We use Azure ML and Spark ML for our predictive analysis, which are a legacy machine learning (ML) system and a Big Data ML system, respectively. We worked on the Iowa liquor sales dataset, comprising records from 2012 to 2019 in 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales using both Azure ML and Spark ML. We find that the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time with the sample data set using the legacy Azure ML system. The Decision Tree Regression model in Spark ML has the highest accuracy with the quickest computing time for the entire data set using the Big Data Spark system.

A Study on the Big Data Analysis and Predictive Models for Quality Issues in Defense C5ISR (국방 C5ISR 분야 품질문제의 빅데이터 분석 및 예측 모델에 대한 연구)

  • Hyoung Jo Huh;Sujin Ko;Seung Hyun Baek
    • Journal of Korean Society for Quality Management / v.51 no.4 / pp.551-571 / 2023
  • Purpose: The purpose of this study is to propose useful suggestions by analyzing the cause-and-effect relationship between the quality failure rate and the process variables in the C5ISR domain of the defense industry. Methods: Data collected through in-house systems were analyzed using big data analysis. Quality data and A/S history data were analyzed following the CRISP-DM (Cross-Industry Standard Process for Data Mining) analysis process. Results: After evaluating the performance of candidate models for the influence of inspection data and A/S history data, logistic regression was selected as the final model because it performed relatively well compared to the decision tree, with an accuracy of 82%/67% and an AUC of 0.66/0.57. Based on this model, we estimated the coefficients using 'R', a data analysis tool, and found that a specific variable (continuous maximum discharge current time) had a statistically significant effect on the A/S quality failure rate, and the analysis indicated that 82% of the failure rate could be predicted. Conclusion: As the first case of applying big data analysis to quality issues in the defense industry, this study confirms that it is possible to improve the market failure rates of defense products by focusing on the measured values of the main causes of failures derived through the big data analysis process, and identifies improvements, such as the number of data samples and data collection limitations, to be addressed in subsequent studies for a more reliable analysis model.
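
The model comparison above rests on two metrics, accuracy and AUC, that can disagree, which may be why the abstract reports both. A minimal sketch of how each is computed for a binary classifier follows. The threshold of 0.5, the function names, and the toy scores are illustrative assumptions; none of the numbers come from the study.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Share of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Pairwise (Mann-Whitney) form of the area under the ROC curve;
    ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced data: 6 non-failures (0) and 2 failures (1).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.35, 0.7]  # ranks failures fairly well
model_b = [0.2] * 8                                   # always predicts "no failure"

print(accuracy(y_true, model_a), auc(y_true, model_a))
print(accuracy(y_true, model_b), auc(y_true, model_b))
```

In this toy case both models reach the same accuracy, yet only the first separates failures from non-failures better than chance, so AUC is the more informative figure when failures are rare.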

A Feature Analysis of Industrial Accidents Using C4.5 Algorithm (C4.5 알고리즘을 이용한 산업 재해의 특성 분석)

  • Leem, Young-Moon;Kwag, Jun-Koo;Hwang, Young-Seob
    • Journal of the Korean Society of Safety / v.20 no.4 s.72 / pp.130-137 / 2005
  • The decision tree algorithm is a data mining technique that performs grouping or prediction by dividing a group of interest into several sub-groups. This technique can analyze the features of each type of group and can be used to detect differences in the type of industrial accidents. This paper uses the C4.5 algorithm for the feature analysis. The data set consists of 24,887 features obtained through data selection from a total of 25,159 taken from 2 years of observation of industrial accidents in Korea. For the purpose of this paper, one target value and eight independent variables are detailed by type of industrial accident. There are 222 total tree nodes and 151 leaf nodes after grouping. This paper provides an acceptable level of accuracy (%) and error rate (%) in order to measure the accuracy of the created trees. The objective of this paper is to analyze the efficiency of the C4.5 algorithm in classifying types of industrial accident data and thereby identify potential weak points in disaster risk grouping.
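
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes it for categorical attributes only; it leaves out the continuous-attribute handling and pruning that C4.5 also performs. The attribute names and records are hypothetical and are not drawn from the paper's data set.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical accident records: target is the accident type.
accident_type = ["fall", "fall", "caught", "caught"]
industry = ["construction", "construction", "manufacturing", "manufacturing"]
shift = ["day", "night", "day", "night"]

print(gain_ratio(industry, accident_type))  # 1.0: industry separates the types
print(gain_ratio(shift, accident_type))     # 0.0: shift carries no information
```

A tree builder would evaluate this score for each candidate attribute at a node and split on the attribute with the highest value.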