• Title/Summary/Keyword: Decision trees

A Feature Analysis of Industrial Accidents Using C4.5 Algorithm (C4.5 알고리즘을 이용한 산업 재해의 특성 분석)

  • Leem, Young-Moon;Kwag, Jun-Koo;Hwang, Young-Seob
    • Journal of the Korean Society of Safety / v.20 no.4 s.72 / pp.130-137 / 2005
  • The decision tree algorithm is a data mining technique that groups a population of interest into several sub-groups or predicts group membership. It can characterize each group by type and can therefore be used to detect differences among types of industrial accidents. This paper uses the C4.5 algorithm for the feature analysis. The data set consists of 24,887 records selected from a total of 25,159 collected over a 2-year observation of industrial accidents in Korea. For the purpose of this paper, one target variable and eight independent variables are detailed by type of industrial accident. The resulting tree has 222 nodes in total, 151 of which are leaf nodes. The paper provides an acceptable level of accuracy (%) and error rate (%) as measures of the quality of the generated trees. Its objective is to analyze how efficiently the C4.5 algorithm classifies industrial accident data by type and thereby to identify potential weak points in disaster risk grouping.
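
C4.5 chooses each split by gain ratio (information gain divided by split information). The abstract does not give the paper's variables or data, so the following is only a minimal sketch of that criterion; the attribute names and toy records are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio obtained by splitting `labels` on the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy before the split minus the weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that create many small branches.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (illustration only): industry sector vs. accident type.
sector = ["construction", "construction", "manufacturing", "manufacturing"]
accident = ["fall", "fall", "caught-in", "caught-in"]
shift = ["day", "night", "day", "night"]  # an attribute unrelated to the label

print(gain_ratio(sector, accident))  # sector separates the accident types perfectly
print(gain_ratio(shift, accident))   # shift carries no information about the type
```

An attribute that separates the classes perfectly scores higher than one unrelated to the label, which is why a C4.5-style tree would split on it first.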

Korean Transition-based Dependency Parsing with Recurrent Neural Network (순환 신경망을 이용한 전이 기반 한국어 의존 구문 분석)

  • Li, Jianri;Lee, Jong-Hyeok
    • KIISE Transactions on Computing Practices / v.21 no.8 / pp.567-571 / 2015
  • Transition-based dependency parsing requires much time and effort to design and select features from a very large number of possible combinations. Recent studies have successfully applied multi-layer perceptrons (MLP) to address this problem and to reduce data sparseness. However, most of these methods adopt greedy search and can consider only a limited amount of information from the context window. In this study, we use a recurrent neural network to handle long dependencies between the sub-dependency trees of the current state and the current transition action. The results indicate that our method achieves higher accuracy (UAS) than an MLP-based model.

Development of the Road Weather Detection Algorithm on CCTV Video Images using Double Decision Trees (이중결정트리를 이용한 CCTV영상에서의 도로 날씨정보검출알고리즘 개발)

  • Park, Beung-Raul;NamKoong, Sung;Lim, Joong-Tae
    • The KIPS Transactions:PartB / v.14B no.6 / pp.445-452 / 2007
  • This paper proposes a scheme for detecting weather information in CCTV video images. The scheme obtains the RGB distribution of a sunny day and uses it as a reference to divide target images into cloud, rain, snow, and fog RGB distributions. The scheme is designed to systematically detect and separate the distinguishing characteristics of images from complex weather information. Our algorithm has less time and space overhead than previous methods that rely on a weather database. It can be used in real-world systems at low implementation cost. The algorithm also uses temperature, humidity, date, and time information to detect weather information with high quality.

Decision Support System for Arrival/Departure of Ships in Port by using Enhanced Genetic Programming (개선된 유전적 프로그래밍 기법을 이용한 선박 입출항 의사결정 지원 시스템)

  • Lee, K. H.;Rhee, W.
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.06a / pp.383-389 / 2001
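  • The LG Oil product pier at Gwangyang Port, the subject of this study, has seven berths used by a wide range of ships, from 300 to 48,000 deadweight tons (DWT). It is difficult to set guidelines for controlling ship arrivals and departures according to weather conditions at sea, and the basis of the guidelines currently in use is unclear, so current pier operation can be considered inefficient or lacking in safety. Rational operating limits for the pier were therefore urgently needed. This paper develops a decision support system that quantitatively judges whether arrival at or departure from the pier is possible under given sea and weather conditions (wind, current, and waves), taking into account the characteristics of the pier, the ship, its loading condition, and the ship operator, and that can suggest ways to improve safety. The system was verified on berths 5 and 7. To judge arrival and departure quantitatively and present the result, a machine learning method based on genetic programming (GP) was used. Learning performance was improved by introducing Weighted Linear Associative Memory (WLAM) to reduce the heavy computational cost of GP and a Group of Additive Genetic Programming Trees (GAGPT) to find the global optimum more easily.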

A comparative study of machine learning methods for automated identification of radioisotopes using NaI gamma-ray spectra

  • Galib, S.M.;Bhowmik, P.K.;Avachat, A.V.;Lee, H.K.
    • Nuclear Engineering and Technology / v.53 no.12 / pp.4072-4079 / 2021
  • This article presents a study of state-of-the-art methods for automated detection and identification of radioactive material using gamma-ray spectra and modern machine learning methods. The work was inspired by recent developments in deep learning algorithms, and the proposed method performs better than current state-of-the-art models. Machine learning models, including fully connected, recurrent, and convolutional neural networks and gradient boosted decision trees, are applied under a wide variety of testing conditions, and their advantages and disadvantages are discussed. Furthermore, a hybrid model is developed by combining fully connected and convolutional neural networks, and it shows the best performance among the machine learning models. These improvements are reflected in the model's test performance metric (F1 score) of 93.33%, an improvement of 2%-12% over the state-of-the-art model under various conditions. The experimental results show that fusing classical neural networks with modern deep learning architectures is a suitable choice for interpreting gamma spectra where real-time and remote detection is necessary.
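
For reference, the F1 score cited above is the harmonic mean of precision and recall. The sketch below covers the single-class (binary) case only; the abstract does not say how scores were averaged across radioisotope classes, and the counts used here are hypothetical rather than the paper's results.

```python
def f1_score(tp, fp, fn):
    """F1 score from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one isotope class (illustration only).
print(f1_score(tp=8, fp=2, fn=2))
```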

Development and Comparison of Data Mining-based Prediction Models of Building Fire Probability

  • Hong, Sung-gwan;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.19 no.6 / pp.101-112 / 2018
  • Considerable manpower and budget are devoted to fire prevention, yet only a small portion of the data generated in the process is used for disaster prevention activities. This study develops a data mining-based prediction model of fire occurrence probability in order to use these data more actively for disaster prevention. For this purpose, variables for predicting the fire occurrence probability of various buildings were selected, and data from the construction administrative system, the national fire information system, and the Korea Fire Insurance Association were collected and integrated into a single data set. After appropriate data cleansing and preprocessing, data mining methods including artificial neural networks, decision trees, SVM, and naive Bayes were used to develop prediction models of building fire occurrence probability. The most accurate of the derived models is the linear SVM model, which scores 68.42% on the experimental data and 63.54% on the verification data, making it the best model for predicting the fire occurrence probability of buildings. Because this study developed the prediction models using only parameter settings within specific ranges, future studies may explore settings not considered here.

Correlated variable importance for random forests (랜덤포레스트를 위한 상관예측변수 중요도)

  • Shin, Seung Beom;Cho, Hyung Jun
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.177-190 / 2021
  • Random forests are a popular method that reduces the instability and improves the accuracy of decision trees through ensembling. The gain in accuracy comes at the expense of ease of interpretation; to compensate, variable importance is provided. Variable importance indicates which variables play a more important role in constructing the random forest. However, when a predictor is correlated with other predictors, the variable importance computed by the existing algorithm may be distorted. The downward bias of correlated predictors may reduce the importance of truly important predictors. We propose a new algorithm that remedies the downward bias of correlated predictors. Its performance is demonstrated on simulated data and illustrated on real data.

Prediction of ultimate shear strength and failure modes of R/C ledge beams using machine learning framework

  • Ahmed M. Yousef;Karim Abd El-Hady;Mohamed E. El-Madawy
    • Structural Monitoring and Maintenance / v.9 no.4 / pp.337-357 / 2022
  • The objective of this study is to present a data-driven machine learning (ML) framework for predicting the ultimate shear strength and failure modes of reinforced concrete ledge beams. Experimental test results for these beams, covering different loading, geometric, and material properties, were collected. The database was analyzed using ML algorithms including decision trees, discriminant analysis, support vector machines, logistic regression, nearest neighbors, naïve Bayes, ensembles, and artificial neural networks to identify the governing and critical parameters of reinforced concrete ledge beams. The results showed that the ML framework can effectively identify the failure mode of these beams as web shear failure, flexural failure, or ledge failure. The ML framework can also derive equations for predicting the ultimate shear strength for each failure mode. The ultimate shear strength for ledge failure was compared among the experimental results, the proposed equations, and the design equations used by international codes. These comparisons indicated that the proposed ML equations predict the ultimate shear strength of reinforced concrete ledge beams better than the design equations of AASHTO LRFD-2020 or PCI-2020.

New Approaches to Xerostomia with Salivary Flow Rate Based on Machine Learning Algorithm

  • Yeon-Hee Lee;Q-Schick Auh;Hee-Kyung Park
    • Journal of Korean Dental Science / v.16 no.1 / pp.47-62 / 2023
  • Purpose: We aimed to investigate objective cutoff values of unstimulated salivary flow rate (UFR) and stimulated salivary flow rate (SFR) in patients with xerostomia and to present an optimal machine learning model based on a classification and regression tree (CART) for all ages. Materials and Methods: A total of 829 patients with oral diseases were enrolled (591 females; mean age, 59.29±16.40 years; age range, 8 to 95 years), comprising 199 patients with xerostomia and 630 patients without xerostomia. Salivary and clinical characteristics were collected and analyzed. Results: Patients with xerostomia had significantly lower UFR (0.29±0.22 vs. 0.41±0.24 ml/min) and SFR (1.12±0.55 vs. 1.39±0.94 ml/min) than those without xerostomia (P<0.001). The presence of xerostomia had a significant negative correlation with UFR (r=-0.603, P=0.002) and SFR (r=-0.301, P=0.017). In the diagnosis of xerostomia based on the CART algorithm, the presence of stomatitis, candidiasis, halitosis, psychiatric disorder, and hyperlipidemia were significant predictors of xerostomia, and the cutoff ranges for xerostomia were 0.03 to 0.18 ml/min for UFR and 0.85 to 1.6 ml/min for SFR. Conclusion: Xerostomia was correlated with decreases in UFR and SFR, and their cutoff values varied depending on the patient's underlying oral and systemic conditions.
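
A CART classifier derives a cutoff on a continuous variable by scanning candidate thresholds and keeping the one with the lowest weighted Gini impurity. The sketch below illustrates that search for a single variable and a binary outcome; the flow-rate values and labels are hypothetical toy data, not the study's measurements, and the function names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_cutoff(values, labels):
    """Return (threshold, weighted_gini) of the best single split on `values`."""
    pairs = sorted(zip(values, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate identical values
        left, right = ys[:i], ys[i:]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            # Candidate threshold is the midpoint between neighbouring values.
            best_threshold, best_impurity = (xs[i - 1] + xs[i]) / 2, impurity
    return best_threshold, best_impurity


# Hypothetical unstimulated flow rates (ml/min) and xerostomia labels (1 = present).
flow = [0.05, 0.10, 0.15, 0.30, 0.40, 0.50]
xerostomia = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_cutoff(flow, xerostomia)
print(threshold, impurity)
```

A full tree repeats this search over every variable at every node, which is how cutoffs can differ between subgroups defined by earlier splits.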

The Effect of Inaccurate Quality Signaling under Information Asymmetry

  • Seung Huh
    • Asia-Pacific Journal of Business / v.14 no.1 / pp.231-246 / 2023
  • Purpose - This study attempts to provide a new theoretical perspective on quality signaling and its impact on a market under information asymmetry. It focuses on how the accuracy and cost of quality signaling affect sellers' and buyers' profit and suggests appropriate designs for quality signaling methods that mitigate information asymmetry. Design/methodology/approach - To examine the effect of quality signaling on strategic interactions within the market, we establish an analytic model in which market outcomes are determined by the seller's quality claim and price, and buyers are risk-neutral. By investigating this model through the relevant game trees, we find the subgame perfect Nash equilibria of the market and predict the related market outcomes based on sellers' quality signaling strategy. Findings - Our analytic model yields the counterintuitive result that seller profit is lowest with inaccurate quality signaling and highest with no quality signaling, mostly because of the certification cost. Consequently, sellers should proceed with caution if quality signaling is less than accurate, as it may backfire. We believe this is because inaccurate quality signaling creates confusion and uncertainty in both sellers' and buyers' profit-maximizing decisions, making it hard for sellers to predict buyers' behavior. Research implications or Originality - Although the sources and types of quality signaling errors have been investigated in the literature, there has been no satisfactory understanding of how the inaccuracy of quality certification affects specific market outcomes. We expect our theoretical model to provide important implications for how to use quality signaling to solve adverse selection issues in markets under information asymmetry.