• Title/Abstract/Keywords: decision trees

305 search results (processing time: 0.025 s)

New Approaches to Xerostomia with Salivary Flow Rate Based on Machine Learning Algorithm

  • Yeon-Hee Lee;Q-Schick Auh;Hee-Kyung Park
    • Journal of Korean Dental Science / Vol. 16, No. 1 / pp.47-62 / 2023
  • Purpose: We aimed to determine objective cutoff values for the unstimulated salivary flow rate (UFR) and stimulated salivary flow rate (SFR) in patients with xerostomia and to present an optimal machine learning model based on a classification and regression tree (CART) for all ages. Materials and Methods: A total of 829 patients with oral diseases were enrolled (591 females; mean age, 59.29±16.40 years; range, 8~95 years): 199 patients with xerostomia and 630 without. Salivary and clinical characteristics were collected and analyzed. Results: Patients with xerostomia had significantly lower UFR (0.29±0.22 vs. 0.41±0.24 ml/min) and SFR (1.12±0.55 vs. 1.39±0.94 ml/min) than patients without xerostomia (P<0.001). The presence of xerostomia was significantly negatively correlated with UFR (r=-0.603, P=0.002) and SFR (r=-0.301, P=0.017). In the CART-based diagnosis of xerostomia, stomatitis, candidiasis, halitosis, psychiatric disorder, and hyperlipidemia were significant predictors, and the cutoff ranges for UFR and SFR were 0.03~0.18 ml/min and 0.85~1.6 ml/min, respectively. Conclusion: Xerostomia was correlated with decreases in UFR and SFR, and the cutoff values varied with the patient's underlying oral and systemic conditions.
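As an illustration of how a CART-style tree arrives at a cutoff value for a continuous measurement such as UFR, the following minimal sketch scans candidate thresholds and keeps the one with the lowest weighted Gini impurity. The data values and the function names (`gini`, `best_cutoff`) are hypothetical and are not taken from the study; a full CART model would repeat this search recursively over many variables.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_cutoff(values, labels):
    """Return (cutoff, weighted_gini) for the best single split 'value <= cutoff'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        cutoff = (lo + hi) / 2.0  # midpoint between neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cutoff, score)
    return best


# Hypothetical flow rates (ml/min) and labels (1 = xerostomia); NOT study data.
ufr = [0.05, 0.10, 0.15, 0.20, 0.35, 0.40, 0.50, 0.60]
dry = [1, 1, 1, 0, 0, 0, 0, 0]
cutoff, impurity = best_cutoff(ufr, dry)
print(cutoff, impurity)
```

In a full tree, different branches (for example, patients with and without candidiasis) would each run this search on their own subset, which is one way cutoffs can end up as a range rather than a single value.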

The Effect of Inaccurate Quality Signaling under Information Asymmetry

  • Seung Huh
    • 아태비즈니스연구 / Vol. 14, No. 1 / pp.231-246 / 2023
  • Purpose - This study offers a new theoretical perspective on quality signaling and its impact on a market under information asymmetry. It focuses on how the accuracy and cost of quality signaling affect sellers' and buyers' profits, and it suggests designs for quality signaling methods that mitigate information asymmetry. Design/methodology/approach - To examine the effect of quality signaling on strategic interactions within the market, we establish an analytic model in which market outcomes are determined by the seller's quality claim and price, and buyers are risk-neutral. By investigating this model through the relevant game trees, we find the subgame perfect Nash equilibria of the market and predict market outcomes based on sellers' quality signaling strategies. Findings - Our analytic model yields the counterintuitive result that seller profit is lowest with inaccurate quality signaling and highest with no quality signaling, mostly because of the certification cost. Sellers should therefore proceed with caution if quality signaling is less than accurate, as it may backfire. We believe this is because inaccurate quality signaling creates confusion and uncertainty in both sellers' and buyers' profit-maximizing decisions, making it hard for sellers to predict buyers' behavior. Research implications or Originality - Although the sources and types of quality signaling errors have been investigated in the literature, how the inaccuracy of quality certification affects specific market outcomes has not been satisfactorily understood. We expect our theoretical model to provide important implications for using quality signaling to address adverse selection in markets under information asymmetry.

Automated Phase Identification in Shingle Installation Operation Using Machine Learning

  • Dutta, Amrita;Breloff, Scott P.;Dai, Fei;Sinsel, Erik W.;Warren, Christopher M.;Wu, John Z.
    • International Conference Proceedings / The 9th International Conference on Construction Engineering and Project Management / pp.728-735 / 2022
  • Roofers are exposed to an increased risk of knee musculoskeletal disorders (MSDs) during the different phases of a sloped shingle installation task. Because different phases carry different risk levels, this study explored the use of machine learning to automatically classify seven phases of a shingle installation task from knee kinematics and roof slope information. An optical motion capture system was used to collect knee kinematics data from nine subjects who mimicked shingle installation on a slope-adjustable wooden platform. Four features were used to build the phase classification model: the subjects' three knee joint rotation angles (i.e., flexion, abduction-adduction, and internal-external rotation) and the roof slope at which they worked. Three machine learning algorithms (i.e., random forests, decision trees, and k-nearest neighbors) were used for training and prediction. The simulations indicate that the k-nearest neighbor classifier performed best, with an overall accuracy of 92.62%, demonstrating the considerable potential of machine learning methods for detecting shingle installation phases from workers' knee joint rotation and roof slope information. With further investigation, this knowledge may facilitate knee MSD risk identification among roofers and the development of interventions.

Assessment of wall convergence for tunnels using machine learning techniques

  • Mahmoodzadeh, Arsalan;Nejati, Hamid Reza;Mohammadi, Mokhtar;Ibrahim, Hawkar Hashim;Mohammed, Adil Hussein;Rashidi, Shima
    • Geomechanics and Engineering / Vol. 31, No. 3 / pp.265-279 / 2022
  • Tunnel convergence prediction is essential for the safe design and construction of tunnels. This study applies five machine learning models, namely deep neural network (DNN), K-nearest neighbors (KNN), Gaussian process regression (GPR), support vector regression (SVR), and decision trees (DT), to predict convergence during or shortly after tunnel excavation. A database of 650 datasets (440 for training, 110 for validation, and 100 for testing) was gathered from previously constructed tunnels. The database contains 12 parameters that affect tunnel convergence and one target, the tunnel wall convergence. Both 5-fold cross-validation and hold-out validation were used to evaluate the predictions of the ML models. The DNN was identified as the most robust model. To assess each parameter's contribution to the prediction, the backward selection method was used. The results showed that tunnel depth has the highest impact on tunnel convergence and tunnel width the lowest.

Accuracy Evaluation of Machine Learning Model for Concrete Aging Prediction due to Thermal Effect and Carbonation

  • 김현수
    • 한국공간구조학회논문집 / Vol. 23, No. 4 / pp.81-88 / 2023
  • Numerous factors contribute to the deterioration of reinforced concrete structures. Elevated temperatures significantly alter the composition of the concrete ingredients and consequently reduce the concrete's strength. With rising global CO2 levels, the carbonation of concrete structures has become a critical challenge for concrete durability research. Assessing and predicting concrete degradation due to thermal effects and carbonation is crucial but intricate. To address this, multiple prediction models for concrete carbonation and for compressive strength under thermal impact have been developed. This study employs seven machine learning algorithms (multiple linear regression, decision trees, random forest, support vector machines, k-nearest neighbors, artificial neural networks, and extreme gradient boosting) to build predictive models for concrete carbonation and thermal impact. Two distinct datasets, derived from reported experimental studies, were used to train the models. Performance was evaluated using root mean square error, mean square error, mean absolute error, and the coefficient of determination. Hyperparameters were optimized through k-fold cross-validation and grid search. The results show that neural networks and extreme gradient boosting outperform the other five machine learning approaches in predicting concrete carbonation and thermal effects.
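The hyperparameter tuning mentioned in this abstract (k-fold cross-validation combined with grid search) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it tunes the neighbor count of a one-dimensional k-nearest-neighbors regressor on made-up data, scores each grid value by mean RMSE across folds, and uses contiguous, unshuffled folds. All function names and data are hypothetical and do not come from the paper.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x, train_y, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    nearest = order[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_rmse(xs, ys, n_neighbors, k=5):
    """Mean RMSE of the k-NN regressor over k cross-validation folds."""
    scores = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        sq = [(knn_predict(tr_x, tr_y, xs[i], n_neighbors) - ys[i]) ** 2
              for i in fold]
        scores.append((sum(sq) / len(sq)) ** 0.5)
    return sum(scores) / len(scores)


def grid_search(xs, ys, grid, k=5):
    """Evaluate every grid value by cross-validation; return the best and all scores."""
    results = {g: cv_rmse(xs, ys, g, k) for g in grid}
    best = min(results, key=results.get)
    return best, results


# Hypothetical data: a linear trend with alternating +/-1 noise; NOT study data.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + (1.0 if i % 2 else -1.0) for i, x in enumerate(xs)]
grid = [1, 2, 3, 5]
best_k, results = grid_search(xs, ys, grid)
print(best_k, results)
```

The same loop structure applies to any model and metric: only `knn_predict` and the error formula would change.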

Multihazard capacity optimization of an NPP using a multi-objective genetic algorithm and sampling-based PSA

  • Eujeong Choi;Shinyoung Kwag;Daegi Hahm
    • Nuclear Engineering and Technology / Vol. 56, No. 2 / pp.644-654 / 2024
  • After the Tohoku earthquake and tsunami (Japan, 2011), regulatory efforts to mitigate external hazards have increased both the safety requirements and the total capital cost of nuclear power plants (NPPs). Under these circumstances, identifying capacity settings for NPPs that are not only disaster-robust but also cost-effective has become one of the most important tasks for the nuclear power industry. A few studies have been performed to relocate the seismic capacity of NPPs, yet the effects of multiple hazards have not been accounted for in NPP capacity optimization. The major challenges in extending this problem to the multihazard dimension are (1) the high computational cost of both multihazard risk quantification and system-level optimization and (2) the lack of capital cost databases for NPPs. To resolve these issues, this paper proposes a method that identifies the optimal multihazard capacity of NPPs using a multi-objective genetic algorithm and a two-stage direct quantification of fault trees using the Monte Carlo simulation method, called the two-stage DQFM. A capacity-based indirect capital cost measure is also proposed. The proposed method enables NPPs to achieve safety and cost-effectiveness against multiple hazards simultaneously on a computationally efficient platform. The proposed multihazard capacity optimization framework is demonstrated and tested with an earthquake-tsunami example.
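The general idea of quantifying a fault tree by Monte Carlo simulation can be illustrated with a very small sketch: sample the state of each basic event, evaluate the tree's Boolean logic for each sample, and take the failure fraction as the top-event probability. This is not the two-stage DQFM of the paper. The tree structure, the failure probabilities, the independence of the basic events, and the function names are all assumptions made for illustration.

```python
import random


def top_event(a, b, c):
    """Hypothetical fault tree: TOP = A OR (B AND C)."""
    return a or (b and c)


def estimate_top_probability(p_a, p_b, p_c, n_samples, seed=0):
    """Monte Carlo estimate of P(TOP), assuming independent basic events."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    failures = 0
    for _ in range(n_samples):
        a = rng.random() < p_a  # basic event A fails in this sample
        b = rng.random() < p_b
        c = rng.random() < p_c
        if top_event(a, b, c):
            failures += 1
    return failures / n_samples


# Hypothetical basic-event failure probabilities; NOT taken from the paper.
p_a, p_b, p_c = 0.05, 0.2, 0.3
estimate = estimate_top_probability(p_a, p_b, p_c, n_samples=100_000)

# For this small tree the exact value is available for comparison.
exact = 1.0 - (1.0 - p_a) * (1.0 - p_b * p_c)
print(estimate, exact)
```

Sampling becomes useful when the tree is too large, or the basic events too dependent on a shared hazard intensity, for a closed-form expression like `exact` to be practical.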

A Comprehensive Approach for Tamil Handwritten Character Recognition with Feature Selection and Ensemble Learning

  • Manoj K;Iyapparaja M
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18, No. 6 / pp.1540-1561 / 2024
  • This research proposes a novel approach to Tamil Handwritten Character Recognition (THCR) that combines feature selection and ensemble learning. The Tamil script is complex and highly variable, so it requires a robust and accurate recognition system. Feature selection is used to reduce dimensionality while preserving discriminative features, which improves classification performance and reduces computational complexity. Several feature selection methods are compared, and individual classifiers (support vector machines, neural networks, and decision trees) are evaluated through extensive experiments. Ensemble learning techniques such as bagging and boosting are employed to leverage the strengths of multiple classifiers and enhance recognition accuracy. The proposed approach is evaluated on the HP Labs Dataset and achieves 95.56% accuracy using an ensemble learning framework based on support vector machines. The dataset consists of 82,928 samples in 247 distinct classes, contributed by 500 participants from Tamil Nadu. It includes 40,000 characters with 500 user variations. The results surpass or rival existing methods, demonstrating the effectiveness of the approach. The research also offers insights for developing recognition systems for other complex scripts. Future work could explore integrating deep learning techniques and extending the approach to other Indic scripts and languages.

The guideline for choosing the right-size of tree for boosting algorithm

  • 김아현;김지현;김현중
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 5 / pp.949-959 / 2012
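  • Among data mining methods for predicting a categorical target variable, ensemble methods that combine multiple single classifiers have recently come into wide use. Among these, boosting is known to reduce error more effectively than other ensemble methods: during resampling it raises the weights of observations that are hard to classify, so that the classifier concentrates on them. In a boosted tree model, where the base classifiers are decision trees, the size of each tree must be chosen. This study assumes that the most suitable tree size for boosted trees may differ from dataset to dataset and discusses how to estimate the tree size that fits a given dataset. First, to determine whether tree size has an important effect on the accuracy of boosted trees, we ran experiments on 28 datasets; the results confirmed that the choice of tree size plays a considerable role in the performance of the overall model. Based on these results, we defined several characteristic variables judged likely to influence the optimal tree size and used them to build a model that explains the optimal tree size for boosted trees. Because the optimal tree size of each dataset may depend on the characteristics of the data, the estimation method proposed here is best used as a starting point or guideline for determining the optimal tree size. Previously, the number of categories of the target variable was used as the tree size for boosted trees; boosted trees built with the tree size estimated by the proposed model significantly improved classification accuracy over that existing approach.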

A Study on Management of Student Retention Rate Using Association Rule Mining

  • 김종만;이동철
    • 한국산업정보학회논문지 / Vol. 23, No. 6 / pp.67-77 / 2018
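  • The recent decline in the school-age population has given rise to many problems. Because Korea has the largest number of universities relative to its population, managing the minimum student retention rate that each university needs to survive is becoming increasingly important. This study therefore seeks appropriate ways to manage student retention as a survival strategy for universities amid the continuing decline in the school-age population. To this end, we analyze data on students admitted to a particular university, including gender, high school of origin, home region, grades, and graduation status, to identify basic directions for managing the retention rate so that students remain enrolled from admission through graduation. We also identify the optimal input variables and, based on the optimal input parameters, run an association analysis with the apriori algorithm so that the data best suited to retention management can be collected. These results are intended to serve as a basis for developing the most efficient deep learning module to help universities recruit and retain students. Predicting graduation status with a decision tree yielded an accuracy of 75%, lower than that of deep learning. According to the decision tree, students who graduated from a general high school, lived in an urban area, were female, and had high grades were more likely to graduate. Overall, the developed deep learning module proved to be a more efficient model than the decision tree for assessing whether students graduate.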

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems

  • 이현욱;김지훈;안현철
    • 지능정보연구 / Vol. 18, No. 1 / pp.125-141 / 2012
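  • This study proposes a new integrated model based on genetic algorithms as a way to improve the intrusion detection model of intrusion detection systems (IDS), whose importance has recently grown. The proposed model assigns appropriate weights to the predictions of mutually complementary binary classifiers, namely logistic regression (LOGIT), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM), to produce the final prediction, and it uses a genetic algorithm to search for the optimal weights. In addition, we first derive the optimal model that minimizes the misdetection rate, then apply the concept of asymmetric error costs to search for the optimal threshold that minimizes the total cost arising from misdetection, and thereby derive the most cost-effective intrusion detection model. To verify the proposed model, we conducted an empirical analysis using log data collected from the security sensors of a public institution in Korea. The results show that the proposed genetic-algorithm-based integrated model achieved better detection rates than single models consisting only of an ANN or an SVM on both the training and validation datasets. In terms of total cost under asymmetric error costs, the proposed model also showed a lower cost than the single-model baselines. With its effectiveness empirically verified, the proposed model is expected to be useful in developing more intelligent intrusion detection systems.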
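The two steps described in this abstract, combining classifier outputs with weights and then choosing a decision threshold under asymmetric error costs, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights are fixed by hand rather than found by a genetic algorithm, the threshold is found by a simple scan, and the model outputs, labels, costs, and function names are all hypothetical.

```python
def combine(probabilities, weights):
    """Weighted average of per-model intrusion probabilities for one sample."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


def total_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Total error cost when 'score >= threshold' is flagged as an intrusion."""
    cost = 0.0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if y == 1 and not flagged:
            cost += cost_fn  # missed intrusion (false negative)
        elif y == 0 and flagged:
            cost += cost_fp  # false alarm (false positive)
    return cost


def best_threshold(scores, labels, cost_fn, cost_fp, steps=100):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and return the cheapest one."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fn, cost_fp))


# Hypothetical outputs of four models (LOGIT, DT, ANN, SVM) for six samples.
model_outputs = [
    [0.90, 0.80, 0.95, 0.85],
    [0.60, 0.40, 0.70, 0.50],
    [0.30, 0.50, 0.40, 0.20],
    [0.40, 0.30, 0.50, 0.40],
    [0.20, 0.10, 0.30, 0.20],
    [0.10, 0.00, 0.10, 0.05],
]
labels = [1, 1, 1, 0, 0, 0]       # 1 = intrusion, 0 = normal traffic
weights = [0.1, 0.2, 0.4, 0.3]    # assumed; a GA would search for these

scores = [combine(row, weights) for row in model_outputs]
# Assume a missed intrusion costs ten times as much as a false alarm.
threshold = best_threshold(scores, labels, cost_fn=10.0, cost_fp=1.0)
print(threshold, total_cost(scores, labels, threshold, 10.0, 1.0))
```

With these made-up numbers, the high cost of a missed intrusion pushes the chosen threshold well below 0.5: accepting one false alarm is cheaper than missing the third intrusion.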