• Title/Abstract/Keywords: Classification and Regression tree

PM10 예측 성능 향상을 위한 이진 분류 모델 비교 분석 (Comparative Analysis of the Binary Classification Model for Improving PM10 Prediction Performance)

  • 정용진;이종성;오창헌
    • 한국정보통신학회논문지 / Vol. 25, No. 1 / pp. 56-62 / 2021
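  • As demand grows for highly accurate fine dust forecasts, various attempts are being made to improve prediction accuracy by applying machine learning algorithms. However, the characteristics of fine dust and the imbalanced rate of occurrence across concentration levels hinder both the training and the prediction of forecasting models. To address this, various studies are under way, for example splitting fine dust into low and high concentrations at a specific threshold and predicting each separately. This paper proposes a binary classification model of fine dust concentration to address the prediction performance problem caused by the imbalanced nature of fine dust concentrations. Binary classification models for PM10 were designed using logistic regression, decision tree, SVM, and MLP. When performance was compared using the confusion matrix, the MLP model showed the highest binary classification performance of the four models, with an accuracy of 89.98%.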
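The confusion-matrix comparison mentioned in the abstract above can be illustrated with a minimal sketch. The labels below are made-up toy data (1 = high concentration, 0 = low concentration), not the paper's data, and the function names are hypothetical. The example also shows why accuracy alone can look good on imbalanced data while recall on the rare class stays low.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all samples that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def recall(tp, fn):
    """Share of true positives (high-concentration cases) that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical, imbalanced toy labels: only 2 of 10 samples are "high".
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
rec = recall(tp, fn)
print(f"tp={tp} fp={fp} fn={fn} tn={tn} accuracy={acc:.2f} recall={rec:.2f}")
```

On this toy data the accuracy is high even though only half of the high-concentration cases are detected, which is one reason a full confusion matrix is more informative than accuracy alone.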

머신러닝 알고리즘 기반의 의료비 예측 모델 개발 (Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 1 / pp. 11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법 (Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms)

  • 윤형구
    • 한국지반공학회논문집 / Vol. 39, No. 6 / pp. 5-12 / 2023
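  • Identifying and designating landslide-prone areas in advance is necessary to reduce widespread damage. The purpose of this study is to present a methodology that uses classification algorithms, a family of machine learning techniques, to classify the safety factor of the target ground. Landslide risk areas were determined with the high risk area (HRA) model, using eight geotechnical properties. Four classification algorithms were used: decision tree (DT), K-Nearest Neighbor (KNN), logistic regression (LR), and random forest (RF). Classification accuracy for the eight geotechnical properties was calculated over the safety factor range of 1.2 to 2.0. Accuracy was reliable for safety factors of 1.2 to 1.7 but relatively low in the remaining range of 1.8 to 2.0. To overcome this, the synthetic minority over-sampling technique (SMOTE) algorithm was applied to augment the number of data points; when the classification algorithms were applied to the augmented data, accuracy in the safety factor range of 1.8 to 2.0 increased by about 250% on average. The results show that SMOTE improved the accuracy of the classification algorithms by increasing the amount of data, and the approach is judged applicable to improving accuracy in other fields as well.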
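The core idea of SMOTE referred to in the abstract above is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal sketch of that idea only, not the study's implementation; the function name `smote`, its parameters, and the toy points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.0), (2.0, 1.0)]
new_points = smote(minority, n_new=6, k=2, seed=42)
for point in new_points:
    print(point)
```

Because every synthetic point is an interpolation between two existing minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.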

The predictability of dentoskeletal factors for soft-tissue chin strain during lip closure

  • Yu, Yun-Hee;Kim, Yae-Jin;Lee, Dong-Yul;Lim, Yong-Kyu
    • 대한치과교정학회지 / Vol. 43, No. 6 / pp. 279-287 / 2013
  • Objective: To investigate the dentoskeletal factors which may predict soft-tissue chin strain during lip closure. Methods: The pretreatment frontal and lateral facial photographs and lateral cephalograms of 209 women (aged 18-30 years) with Angle's Class I or II malocclusion were examined. The subjects were categorized by three examiners into the no-strain and strain groups according to the soft-tissue chin tension or deformation during lip closure. Relationships of the cephalometric measurements with the group classification were analyzed by logistic regression analysis, and a classification and regression tree (CART) model was used to define the predictive variables for the group classification. Results: The lower the value of the overbite depth indicator (ODI) and the higher the values of upper incisor to Nasion-Pogonion (U1-NPog, mm), overjet, and upper incisor to upper lip (U1-upper lip, mm), the more likely was the subject to be classified into the strain group. The CART showed that U1-NPog was the most prominent predictor of soft-tissue chin strain (cut-off value of 14.2 mm), followed by overjet. Conclusions: To minimize strain of the soft-tissue chin, orthodontic treatment should be oriented toward increasing the ODI value while decreasing the U1-NPog, overjet, and U1 upper lip values.
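A CART model arrives at a cut-off value such as the one reported above by searching for the threshold that best separates the groups. The sketch below shows one common way to do this for a single numeric predictor, using the Gini impurity; it is an illustration under assumed toy measurements, not the authors' analysis, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (cut, weighted_gini) for the best threshold on one predictor.

    Candidate cuts are midpoints between consecutive distinct sorted values;
    samples with value <= cut go left, the rest go right.
    """
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical values
        cut = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [label for value, label in pairs if value <= cut]
        right = [label for value, label in pairs if value > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Made-up measurements in mm (1 = strain group, 0 = no-strain group).
measurements = [10.0, 11.0, 12.0, 13.0, 15.0, 16.0, 17.0, 18.0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

cut, score = best_split(measurements, groups)
print(f"cut-off = {cut} mm, weighted Gini = {score}")
```

A full CART procedure repeats this search over every predictor and then recursively within each resulting subgroup, which is how a ranking of predictors (the first split, then the next) emerges.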

데이터 마이닝을 활용한 장기저장탄약 상태 결정요인 분석 연구 (A Study on Determinants of Stockpile Ammunition using Data Mining)

  • 노유찬;조남욱;이동녁
    • 품질경영학회지 / Vol. 48, No. 2 / pp. 297-307 / 2020
  • Purpose: The purpose of this study is to analyze the factors that affect ammunition performance by applying data mining techniques to the Ammunition Stockpile Reliability Program (ASRP) data of the 155mm propelling charge. Methods: The ASRP data from 1999 to 2017 were used. Logistic regression and decision tree analysis were used to investigate the factors that affect the performance of ammunition. The performance of each model was evaluated by comparison with an artificial neural network (ANN) model. Results: Logistic regression and decision tree analysis both showed that the major defect rate in visual inspection is the most significant factor. Muzzle velocity by base charge and muzzle velocity by increment charge are also among the significant factors affecting the performance of the 155mm propelling charge. To validate the logistic regression and decision tree models, their classification accuracies were compared with the results of an ANN model. The results indicate that the logistic regression and decision tree models show sufficient performance, which confirms the validity of the models. Conclusion: The main contribution of this paper is that, to the best of our knowledge, it is the first attempt at identifying the significant factors in ASRP data by using data mining techniques. The approaches suggested in the paper could also be extended to other types of ammunition data.

Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안 (Effective Korean sentiment classification method using word2vec and ensemble classifier)

  • 박성수;이건창
    • 디지털콘텐츠학회 논문지 / Vol. 19, No. 1 / pp. 133-140 / 2018
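  • Accurate sentiment classification is an important research topic in sentiment analysis. This study presents a method for effectively classifying the sentiment of Korean reviews using word2vec, which has been widely studied recently, and ensemble methods. For 200,000 Korean movie review texts, the study generated part-of-speech-based BOW features and word2vec features, and then an integrated feature combining the two representations. For sentiment classification, the single classifiers Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine and the ensemble classifiers Adaptive Boost, Bagging, Gradient Boosting, and Random Forest were used. The integrated feature representation, consisting of BOW features that include adjectives and adverbs together with word2vec features, showed the highest sentiment classification accuracy. Empirically, the single classifier SVM showed the highest performance, while the ensemble classifiers performed similarly to or slightly below the single classifiers.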

엔트로피 점수를 이용한 감성분석 분류알고리즘의 수행도 평가 (Evaluation of Classification Algorithm Performance of Sentiment Analysis Using Entropy Score)

  • 박만희
    • 한국정보통신학회논문지 / Vol. 22, No. 9 / pp. 1153-1158 / 2018
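  • Online customer reviews and social media information influence customers' decisions, which makes them a very important source of information for companies. Identifying customers' diverse needs and complaints through surveys is constrained by high cost and time. Customer review data from online shopping malls provide ideal material for analyzing customers' sentiment toward products. In this study, customer review data were collected from the Amazon shopping site for sentiment analysis of Samsung and Apple smartphones. Five classification algorithms used as representative sentiment analysis techniques in prior studies were applied: support vector machines, bagging, random forest, classification or regression tree, and maximum entropy. This study proposes an entropy score that can comprehensively evaluate the performance of classification algorithms. Evaluating the five algorithms with the entropy score showed that the support vector machines algorithm had the highest entropy score.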

An Improved Text Classification Method for Sentiment Classification

  • Wang, Guangxing;Shin, Seong Yoon
    • Journal of information and communication convergence engineering / Vol. 17, No. 1 / pp. 41-48 / 2019
  • In recent years, sentiment analysis research has become popular. The research results of sentiment analysis have achieved remarkable results in practical applications, such as in Amazon's book recommendation system and the North American movie box office evaluation system. Analyzing big data based on user preferences and evaluations and recommending hot-selling books and hot-rated movies to users in a targeted manner greatly improve book sales and attendance rate in movies [1, 2]. However, traditional machine learning-based sentiment analysis methods such as the Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) had performed poorly in accuracy. In this paper, an improved kNN classification method is proposed. Through the improved method and normalizing of data, the purpose of improving accuracy is achieved. Subsequently, the three classification algorithms and the improved algorithm were compared based on experimental data. Experiments show that the improved method performs best in the kNN classification method, with an accuracy rate of 11.5% and a precision rate of 20.3%.
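The abstract above does not specify how the kNN method was improved, so the sketch below shows only plain kNN combined with min-max normalization, to illustrate why normalizing the data can change a kNN prediction. The two features (review length in words, share of positive words), the labels, and all function names are hypothetical.

```python
from collections import Counter


def min_max_normalize(rows):
    """Scale each column of rows to [0, 1]; also return the column lows and spans."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # guard constant columns
    scaled = [scale(row, lows, spans) for row in rows]
    return scaled, lows, spans


def scale(row, lows, spans):
    """Apply a previously fitted min-max scaling to a single row."""
    return tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical reviews: (length in words, share of positive words).
train_x = [(100, 0.9), (900, 0.8), (500, 0.85), (120, 0.1), (880, 0.2), (520, 0.15)]
train_y = ["pos", "pos", "pos", "neg", "neg", "neg"]
query = (510, 0.8)

# Without scaling, the large-valued length feature dominates the distance.
raw_prediction = knn_predict(train_x, train_y, query)

# With scaling, both features contribute on a comparable range.
scaled_x, lows, spans = min_max_normalize(train_x)
scaled_prediction = knn_predict(scaled_x, train_y, scale(query, lows, spans))

print("raw:", raw_prediction, "normalized:", scaled_prediction)
```

In this toy setup the unscaled distance is driven almost entirely by review length, so the query is labeled by length-matched neighbours; after scaling, the positive-word share carries comparable weight and the vote changes.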

A Study on Improving the predict accuracy rate of Hybrid Model Technique Using Error Pattern Modeling : Using Logistic Regression and Discriminant Analysis

  • Cho, Yong-Jun;Hur, Joon
    • Journal of the Korean Data and Information Science Society / Vol. 17, No. 2 / pp. 269-278 / 2006
  • This paper presents a new hybrid data mining technique that uses error pattern modeling to improve classification accuracy. The proposed method improves classification accuracy by combining two different supervised learning methods. The main algorithm generates an error pattern model between the two supervised learning methods (e.g., neural networks, decision tree, and logistic regression). The proposed modeling method was applied to a simulation of 10,000 data sets generated from normal and exponential random distributions. The simulation results show that the proposed method outperforms existing methods such as logistic regression and discriminant analysis.

데이터 마이닝을 이용한 입원 암 환자 간호 중증도 예측모델 구축 (An Analysis of Nursing Needs for Hospitalized Cancer Patients;Using Data Mining Techniques)

  • 박선아
    • 종양간호연구 / Vol. 5, No. 1 / pp. 3-10 / 2005
  • Background: Nurses now account for one third of all hospital human resources, so efficient management of nursing manpower is becoming more important. Although nursing workload requirement analysis and patient severity classification clearly need to come first for the efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purposes: This study aimed to build a prediction model for patient classification according to nursing need. We sought an easier and faster method of classifying nursing patients that can support efficient management of nursing manpower. Methods: The nursing patient classification data of hospitalized cancer patients at one of the largest cancer centers in Korea from January 1 to December 31, 2003, were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified by three different data mining techniques (logistic regression, decision tree, and neural network), and the results were assessed. Results: The data set was created using 165,073 records from the classification database of 2,228 patients. The main explanatory variables in the three data mining techniques were as follows. 1) Logistic regression: age, month, and section. 2) Decision tree: section, month, age, and tumor. 3) Neural network: section, diagnosis, age, sex, metastasis, hospital days, and month. Among these three techniques, the neural network showed the best predictive power in ROC curve verification. The patient classification prediction model developed with the neural network based on nursing needs had a prediction accuracy of 84.06%. Conclusion: The patient classification prediction model was developed and tested in this study using real patient data. The result can be used for more accurate calculation of required nursing staff and more effective use of the labor force.