• Title/Abstract/Keywords: Decision-trees

311 search results (processing time: 0.03 seconds)

Development and Comparison of Data Mining-based Prediction Models of Building Fire Probability

  • 홍성관;정승렬
    • 인터넷정보학회논문지 / Vol. 19, No. 6 / pp. 101-112 / 2018
  • Considerable manpower and budget are devoted to fire prevention, yet only a small portion of the data generated in the process is used for disaster prevention activities. This study develops a data mining-based prediction model of fire occurrence probability so that these data can be used more actively for disaster prevention. To this end, variables for predicting the fire occurrence probability of various buildings were selected, and data from the construction administrative system, the national fire information system, and the Korea Fire Insurance Association were collected and integrated into a single data set. After appropriate data cleansing and preprocessing, data mining methods including artificial neural networks, decision trees, SVM, and naive Bayes were used to develop prediction models of the fire occurrence probability of buildings. The most accurate of the derived models was the linear SVM model, with an accuracy of 68.42% on the experimental data and 63.54% on the verification data, making it the best model for predicting the fire occurrence probability of buildings. Because the prediction models were developed using only setting values within specific ranges, future studies may explore more opportunities to use setting values not covered in this study.

랜덤포레스트를 위한 상관예측변수 중요도 (Correlated variable importance for random forests)

  • 신승범;조형준
    • 응용통계연구 / Vol. 34, No. 2 / pp. 177-190 / 2021
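  • Random forests are widely used because they combine multiple decision tree models to improve stability and predictive power. Because this gain in predictive power comes at the cost of interpretability, random forests provide variable importance measures to compensate. Variable importance indicates how important a role each variable plays in building the random forest. However, when a predictor is correlated with other predictors, the variable importance produced by the existing algorithm can be distorted. The downward bias of correlated predictors causes their importance to be measured as lower than their true importance. We propose a new algorithm that modifies the existing one to recover the downward bias of correlated predictors. The performance of the proposed algorithm is demonstrated with simulated data and illustrated with real data.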
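
The downward bias described in this abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It computes a generic permutation importance (increase in mean squared error after shuffling one column) for two hand-written stand-in models: one that relies on a single predictor, and one that spreads its reliance across two nearly identical predictors, roughly as a forest that splits on both might. All names, the synthetic data, and the stand-in models are assumptions made for illustration.

```python
import random


def mse(model, rows, ys):
    """Mean squared error of a model over (row, target) pairs."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, ys)) / len(ys)


def permutation_importance(model, rows, ys, col, rng):
    """Increase in MSE after shuffling one column (larger = more important)."""
    baseline = mse(model, rows, ys)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, ys) - baseline


def single_model(row):
    """Stand-in model that uses only the first predictor."""
    return row[0]


def shared_model(row):
    """Stand-in model that averages two correlated predictors."""
    return 0.5 * (row[0] + row[1])


rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.1) for a in x1]  # near copy of x1 (assumed correlation)
ys = list(x1)                             # the response depends on x1 only
rows = list(zip(x1, x2))

imp_single = permutation_importance(single_model, rows, ys, 0, rng)
imp_shared = permutation_importance(shared_model, rows, ys, 0, rng)
print(f"importance of x1, single-predictor model: {imp_single:.3f}")
print(f"importance of x1, shared-predictor model: {imp_shared:.3f}")
```

In this toy setting the measured importance of the first predictor drops sharply once a correlated near copy shares the predictive work, even though the response depends on that predictor alone.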

Prediction of ultimate shear strength and failure modes of R/C ledge beams using machine learning framework

  • Ahmed M. Yousef;Karim Abd El-Hady;Mohamed E. El-Madawy
    • Structural Monitoring and Maintenance / Vol. 9, No. 4 / pp. 337-357 / 2022
  • The objective of this study is to present a data-driven machine learning (ML) framework for predicting the ultimate shear strength and failure modes of reinforced concrete ledge beams. Experimental test results on these beams, covering different loading, geometric, and material properties, were collected. The database was analyzed using different ML algorithms, including decision trees, discriminant analysis, support vector machine, logistic regression, nearest neighbors, naïve Bayes, ensemble, and artificial neural networks, to identify the governing and critical parameters of reinforced concrete ledge beams. The results showed that the ML framework can effectively identify the failure mode of these beams, whether web shear failure, flexural failure, or ledge failure. The ML framework can also derive equations for predicting the ultimate shear strength for each failure mode. The ultimate shear strength at ledge failure was compared among the experimental results, the proposed equations, and the design equations used by international codes. These comparisons indicated that the proposed ML equations predict the ultimate shear strength of reinforced concrete ledge beams better than the design equations of AASHTO LRFD-2020 or PCI-2020.

New Approaches to Xerostomia with Salivary Flow Rate Based on Machine Learning Algorithm

  • Yeon-Hee Lee;Q-Schick Auh;Hee-Kyung Park
    • Journal of Korean Dental Science / Vol. 16, No. 1 / pp. 47-62 / 2023
  • Purpose: We aimed to investigate objective cutoff values of the unstimulated salivary flow rate (UFR) and stimulated salivary flow rate (SFR) in patients with xerostomia and to present an optimal machine learning model based on a classification and regression tree (CART) for all ages. Materials and Methods: A total of 829 patients with oral diseases were enrolled (591 females; mean age, 59.29±16.40 years; range, 8 to 95 years): 199 patients with xerostomia and 630 patients without xerostomia. Salivary and clinical characteristics were collected and analyzed. Result: Patients with xerostomia had significantly lower UFR (0.29±0.22 vs. 0.41±0.24 ml/min) and SFR (1.12±0.55 vs. 1.39±0.94 ml/min) than those without xerostomia (P<0.001). The presence of xerostomia was significantly negatively correlated with UFR (r=-0.603, P=0.002) and SFR (r=-0.301, P=0.017). In the diagnosis of xerostomia based on the CART algorithm, the presence of stomatitis, candidiasis, halitosis, psychiatric disorder, and hyperlipidemia were significant predictors of xerostomia, and the cutoff ranges for xerostomia for UFR and SFR were 0.03 to 0.18 ml/min and 0.85 to 1.6 ml/min, respectively. Conclusion: Xerostomia was correlated with decreases in UFR and SFR, and their cutoff values varied depending on the patient's underlying oral and systemic conditions.
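
As a rough illustration of how a CART-style tree arrives at a cutoff value on a single continuous variable, the sketch below scans candidate thresholds and keeps the one with the lowest weighted Gini impurity. It is a minimal sketch of the general technique, not the study's model. The flow-rate values and labels are hypothetical toy numbers, not data from the paper, and the function names are made up for this example.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_cutoff(values, labels):
    """Return (cutoff, weighted_gini) for the best single split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score, best_cut = score, (lo + hi) / 2.0
    return best_cut, best_score


# Hypothetical toy flow rates (ml/min) and labels (1 = xerostomia).
# These are NOT the study's data.
flow = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60]
label = [1, 1, 1, 0, 0, 0, 0, 0]

cutoff, impurity = best_cutoff(flow, label)
print(f"cutoff = {cutoff:.3f} ml/min, weighted Gini = {impurity:.3f}")
```

A full CART model repeats this search over every predictor at every node, which is one way cutoffs for a variable can differ between branches defined by other conditions.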

The Effect of Inaccurate Quality Signaling under Information Asymmetry

  • Seung Huh
    • 아태비즈니스연구 / Vol. 14, No. 1 / pp. 231-246 / 2023
  • Purpose - This study attempts to provide a new theoretical perspective on quality signaling and its impact on a market under information asymmetry, focusing on how the accuracy and the cost of quality signaling affect sellers' and buyers' profits, and suggesting appropriate designs of quality signaling methods that mitigate information asymmetry. Design/methodology/approach - In order to examine the effect of quality signaling on strategic interactions within the market, we establish an analytic model where market outcomes are determined by the seller's quality claim and price, and buyers are risk-neutral. By investigating this analytic model through relevant game trees, we find the subgame perfect Nash equilibria of the market and predict related market outcomes based on sellers' quality signaling strategy. Findings - Our analytic model shows the counterintuitive result that seller profit will be lowest with inaccurate quality signaling and highest with no quality signaling, mostly due to the certification cost. Consequently, sellers should proceed with caution if quality signaling is less than accurate, as it may backfire. We believe this is because inaccurate quality signaling causes confusion and uncertainty in both sellers' and buyers' profit-maximizing decisions, making it hard for sellers to predict buyers' behavior. Research implications or Originality - Although the sources and types of quality signaling errors have been investigated in the literature, there has not been a satisfactory understanding of how inaccuracy of quality certification affects specific market outcomes. We expect that our theoretical model will provide important implications on how to utilize quality signaling to solve adverse selection issues in markets under information asymmetry.

Automated Phase Identification in Shingle Installation Operation Using Machine Learning

  • Dutta, Amrita;Breloff, Scott P.;Dai, Fei;Sinsel, Erik W.;Warren, Christopher M.;Wu, John Z.
    • International Conference Proceedings (국제학술발표논문집) / The 9th International Conference on Construction Engineering and Project Management / pp. 728-735 / 2022
  • Roofers are exposed to an increased risk of knee musculoskeletal disorders (MSDs) at different phases of a sloped shingle installation task. As different phases are associated with different risk levels, this study explored the application of machine learning for automated classification of seven phases in a shingle installation task using knee kinematics and roof slope information. An optical motion capture system was used to collect knee kinematics data from nine subjects who mimicked shingle installation on a slope-adjustable wooden platform. Four features were used in building a phase classification model: three knee joint rotation angles of the subjects (i.e., flexion, abduction-adduction, and internal-external rotation) and the roof slope at which they operated. Three ensemble machine learning algorithms (i.e., random forests, decision trees, and k-nearest neighbors) were used for training and prediction. The simulations indicate that the k-nearest neighbor classifier provided the best performance, with an overall accuracy of 92.62%, demonstrating the considerable potential of machine learning methods in detecting shingle installation phases from workers' knee joint rotation and roof slope information. This knowledge, with further investigation, may facilitate knee MSD risk identification among roofers and intervention development.

Assessment of wall convergence for tunnels using machine learning techniques

  • Mahmoodzadeh, Arsalan;Nejati, Hamid Reza;Mohammadi, Mokhtar;Ibrahim, Hawkar Hashim;Mohammed, Adil Hussein;Rashidi, Shima
    • Geomechanics and Engineering / Vol. 31, No. 3 / pp. 265-279 / 2022
  • Tunnel convergence prediction is essential for the safe construction and design of tunnels. This study proposes five machine learning models, namely deep neural network (DNN), K-nearest neighbors (KNN), Gaussian process regression (GPR), support vector regression (SVR), and decision trees (DT), to predict convergence during or shortly after the excavation of tunnels. To this end, a database of 650 datasets (440 for training, 110 for validation, and 100 for testing) was gathered from previously constructed tunnels. The database included 12 parameters affecting tunnel convergence and one target, the tunnel wall convergence. Both 5-fold and hold-out cross-validation methods were used to analyze the predicted outcomes of the ML models. Finally, the DNN method was proposed as the most robust model. In addition, the backward selection method was used to assess each parameter's contribution to the prediction problem. The results showed that the parameters with the highest and lowest impact on tunnel convergence are tunnel depth and tunnel width, respectively.
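
The two validation schemes named in this abstract can be sketched with index bookkeeping alone. The abstract gives the hold-out sizes (440/110/100) but does not say which records the 5-fold scheme was applied to, so running it over the 440 training records below is an assumption, as are the function names and the random seed.

```python
import random


def holdout_split(n, sizes, seed=0):
    """Shuffle indices 0..n-1 and cut them into consecutive parts of the given sizes."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts, start = [], 0
    for size in sizes:
        parts.append(idx[start:start + size])
        start += size
    return parts


def k_fold(indices, k):
    """Yield (train, validation) index lists, using each fold once for validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val


# Hold-out split with the sizes reported in the abstract.
train, val, test = holdout_split(650, (440, 110, 100))

# 5-fold scheme over the training records (assumed; not stated in the abstract).
fold_sizes = [(len(tr), len(va)) for tr, va in k_fold(train, 5)]
print(len(train), len(val), len(test))
print(fold_sizes)
```

Hold-out evaluates a model once on fixed validation and test partitions, while k-fold rotates the validation role across folds so that every record in the pool is used for validation exactly once.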

콘크리트 탄산화 및 열효과에 의한 경년열화 예측을 위한 기계학습 모델의 정확성 검토 (Accuracy Evaluation of Machine Learning Model for Concrete Aging Prediction due to Thermal Effect and Carbonation)

  • 김현수
    • 한국공간구조학회논문집 / Vol. 23, No. 4 / pp. 81-88 / 2023
  • Numerous factors contribute to the deterioration of reinforced concrete structures. Elevated temperatures significantly alter the composition of the concrete ingredients, consequently diminishing the concrete's strength properties. With the escalation of global CO2 levels, the carbonation of concrete structures has emerged as a critical challenge, substantially affecting concrete durability research. Assessing and predicting concrete degradation due to thermal effects and carbonation are crucial yet intricate tasks. To address this, multiple prediction models for concrete carbonation and compressive strength under thermal impact have been developed. This study employs seven machine learning algorithms (multiple linear regression, decision trees, random forest, support vector machines, k-nearest neighbors, artificial neural networks, and extreme gradient boosting) to formulate predictive models for concrete carbonation and thermal impact. Two distinct datasets, derived from reported experimental studies, were used to train these predictive models. Performance was evaluated using root mean square error, mean square error, mean absolute error, and the coefficient of determination. Hyperparameters were optimized through k-fold cross-validation and grid search. The results show that neural networks and extreme gradient boosting outperform the remaining five machine learning approaches in predicting concrete carbonation and thermal effects.
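
The four evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. The sample values are made-up numbers chosen so the arithmetic is easy to follow; they are not results from the study, and the function name is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and the coefficient of determination (R^2)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Made-up example values (not from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Here a single error of 2 on four samples gives MSE = 1, RMSE = 1, MAE = 0.5, and R^2 = 1 - 4/5 = 0.2, which shows how the squared-error metrics weight one large miss more heavily than MAE does.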

Multihazard capacity optimization of an NPP using a multi-objective genetic algorithm and sampling-based PSA

  • Eujeong Choi;Shinyoung Kwag;Daegi Hahm
    • Nuclear Engineering and Technology / Vol. 56, No. 2 / pp. 644-654 / 2024
  • After the Tohoku earthquake and tsunami (Japan, 2011), regulatory efforts to mitigate external hazards have increased both the safety requirements and the total capital cost of nuclear power plants (NPPs). In these circumstances, identifying not only disaster robustness but also cost-effective capacity setting of NPPs has become one of the most important tasks for the nuclear power industry. A few studies have been performed to relocate the seismic capacity of NPPs, yet the effects of multiple hazards have not been accounted for in NPP capacity optimization. The major challenges in extending this problem to the multihazard dimension are (1) the high computational costs of both multihazard risk quantification and system-level optimization and (2) the lack of capital cost databases for NPPs. To resolve these issues, this paper proposes an effective method that identifies the optimal multihazard capacity of NPPs using a multi-objective genetic algorithm and a two-stage direct quantification of fault trees using Monte Carlo simulation, called the two-stage DQFM. A capacity-based indirect capital cost measure is also proposed. The proposed method enables NPPs to achieve safety and cost-effectiveness against multiple hazards simultaneously within a computationally efficient platform. The proposed multihazard capacity optimization framework is demonstrated and tested with an earthquake-tsunami example.
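
The general idea of quantifying a fault tree by Monte Carlo sampling can be sketched as follows. This is a simplified illustration, not the paper's two-stage DQFM: it assumes fixed, independent basic-event probabilities rather than hazard-dependent component responses and capacities, and the three-event tree, its probabilities, and the function name are all hypothetical.

```python
import random


def simulate_top_event(p_a, p_b, p_c, n_samples, seed=0):
    """Estimate P(top) for the tree top = (A AND B) OR C by direct sampling."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        a = rng.random() < p_a  # basic event A occurs in this sample
        b = rng.random() < p_b  # basic event B occurs in this sample
        c = rng.random() < p_c  # basic event C occurs in this sample
        if (a and b) or c:
            failures += 1
    return failures / n_samples


# Hypothetical basic-event probabilities, assumed independent.
P_A, P_B, P_C = 0.1, 0.2, 0.05

estimate = simulate_top_event(P_A, P_B, P_C, 200_000)
exact = 1.0 - (1.0 - P_A * P_B) * (1.0 - P_C)  # closed form for this small tree
print(f"Monte Carlo estimate: {estimate:.4f}, exact: {exact:.4f}")
```

For a tree this small the closed form is available as a check. Sampling becomes useful when the tree is large or when basic events depend on shared hazard intensities, and the many samples required are one source of the computational cost the abstract mentions.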

A Comprehensive Approach for Tamil Handwritten Character Recognition with Feature Selection and Ensemble Learning

  • Manoj K;Iyapparaja M
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18, No. 6 / pp. 1540-1561 / 2024
  • This research proposes a novel approach for Tamil Handwritten Character Recognition (THCR) that combines feature selection and ensemble learning techniques. The Tamil script is complex and highly variable, requiring a robust and accurate recognition system. Feature selection is used to reduce dimensionality while preserving discriminative features, improving classification performance and reducing computational complexity. Several feature selection methods are compared, and individual classifiers (support vector machines, neural networks, and decision trees) are evaluated through extensive experiments. Ensemble learning techniques such as bagging and boosting are employed to leverage the strengths of multiple classifiers and enhance recognition accuracy. The proposed approach is evaluated on the HP Labs Dataset, achieving 95.56% accuracy using an ensemble learning framework based on support vector machines. The dataset consists of 82,928 samples with 247 distinct classes, contributed by 500 participants from Tamil Nadu. It includes 40,000 characters with 500 user variations. The results surpass or rival existing methods, demonstrating the effectiveness of the approach. The research also offers insights for developing advanced recognition systems for other complex scripts. Future investigations could explore the integration of deep learning techniques and the extension of the proposed approach to other Indic scripts and languages.