• Title/Abstract/Keywords: machine learning for regression

570 search results; processing time 0.029 s

Prediction of Blast Vibration in Quarry Using Machine Learning Models

  • 정다희;최요순
    • 터널과지하공간 / Vol. 31, No. 6 / pp.508-519 / 2021
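  • This study developed models to predict blast vibration (peak particle velocity, PPV), which affects people and the surrounding environment during blasting. To predict PPV, four machine learning models were developed and compared, using the kNN (k-nearest neighbors), CART (classification and regression tree), SVR (support vector regression), and PSO (particle swarm optimization)-SVR algorithms. To train the models, Yokmangsan (욕망산) in Changwon-si, Gyeongsangnam-do, was selected as the study area, and 1048 blasting records were acquired. Each record consisted of hole length, burden, spacing, maximum charge per delay, specific charge, total number of holes, emulsion ratio, distance from the blast, and PPV. MAE (mean absolute error), MSE (mean squared error), and RMSE (root mean squared error) were used as metrics to evaluate the trained models. The PSO-SVR model showed the best predictive performance, with MAE, MSE, and RMSE of 0.0348, 0.0021, and 0.0458, respectively. Finally, a method was presented for using the developed models to predict the degree of impact on the surrounding environment.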
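The three error metrics named in this abstract are closely related: RMSE is the square root of MSE, which is consistent with the reported values (the square root of 0.0021 is about 0.0458). The following is a minimal sketch of how they can be computed. The function name and the sample PPV values are hypothetical and are not taken from the study.

```python
import math

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    return mae, mse, rmse

# Hypothetical PPV values, for illustration only (not data from the study).
observed = [0.10, 0.25, 0.40, 0.55]
predicted = [0.12, 0.20, 0.43, 0.50]
mae, mse, rmse = regression_errors(observed, predicted)
```

Because MSE squares each residual, it penalizes large misses more heavily than MAE, which is one reason abstracts like this one report both.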

Machine Learning Approach to Classifying Fatal and Non-Fatal Accidents in Industries

  • 강성식;장성록;서용윤
    • 한국안전학회지 / Vol. 36, No. 5 / pp.52-60 / 2021
  • As the prevention of fatal accidents is considered an essential part of social responsibility, both governments and individuals have devoted efforts to mitigate the unsafe conditions and behaviors that facilitate accidents. Several studies have analyzed the factors that cause fatal accidents and compared them to those of non-fatal accidents. However, studies on mathematical and systematic analysis techniques for identifying the features of fatal accidents are rare. Recently, various industrial fields have employed machine learning algorithms. This study aimed to apply machine learning algorithms for the classification of fatal and non-fatal accidents based on the features of each accident. These features were obtained by text mining literature on accidents. The classification was performed using four machine learning algorithms that are widely used in industrial fields: logistic regression, decision tree, neural network, and support vector machine. The results revealed that the machine learning algorithms exhibited high accuracy for the classification of accidents into the two categories. In addition, the importance of comparing similar cases between fatal and non-fatal accidents was discussed. This study presented a method for classifying accidents using machine learning algorithms based on reports from previous studies of accidents.

Machine-Learning Based Optimal Design of A Large-leakage High-frequency Transformer for DAB Converters

  • 노은총;김길동;이승환
    • 전력전자학회논문지 / Vol. 27, No. 6 / pp.507-514 / 2022
  • This study proposes an optimal design process for a high-frequency transformer that has a large leakage inductance for dual-active-bridge converters. Notably, conventional design processes have large errors in designing leakage transformers because mathematically modeling the leakage inductance of such transformers is difficult. In this work, the geometric parameters of a shell-type transformer are identified, and finite element analysis (FEA) simulation is performed to determine the magnetization inductance, leakage inductance, and copper loss of various shapes of shell-type transformers. Regression models for magnetization and leakage inductances and copper loss are established using the simulation results and a machine learning technique. In addition, to improve the regression models' performance, the regression models are tuned by adding featured parameters that consider the physical characteristics of the transformer. With the regression models, optimal high-frequency transformer designs and the Pareto front (in terms of volume and loss) are determined using NSGA-II. From the Pareto front, a desirable optimal design is selected and verified by FEA simulation and experimentation. The simulated and measured leakage inductances of the selected design match well, and this result shows the validity of the proposed design process.
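The Pareto front mentioned in this abstract is the set of designs that no other design beats on both volume and loss at once. The sketch below shows only that non-dominated filter, not NSGA-II itself, which additionally evolves a population over generations. The function name and the (volume, loss) pairs are hypothetical.

```python
def pareto_front(designs):
    """Return the designs not dominated by any other (both objectives minimized).

    Each design is a (volume, loss) tuple. Design a dominates design b when
    a is no worse in both objectives and strictly better in at least one.
    """
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]

# Hypothetical (volume, loss) pairs, for illustration only.
candidates = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 3.0), (5.0, 4.0)]
front = pareto_front(candidates)
# (3.0, 8.0) is dominated by (2.0, 7.0); (5.0, 4.0) is dominated by (4.0, 3.0).
```

Choosing one design from the front, as the abstract describes, is a separate trade-off decision that the filter does not make.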

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering / Vol. 25, No. 1 / pp.1-16 / 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce the computation efficiency and increase the difficulty in collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be answered: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best optimized FS-ML model is selected according to the area under the curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.
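The statistical indicators named in this abstract (sensitivity, specificity and accuracy) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch, assuming labels are coded 1 for landslide and 0 for non-landslide and that both classes are present. The function name and sample labels are hypothetical.

```python
def binary_indicators(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives found
        "specificity": tn / (tn + fp),  # share of actual negatives found
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical labels, for illustration only.
labels      = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = binary_indicators(labels, predictions)
```

Unlike these threshold-dependent indicators, the area under the ROC curve summarizes performance across all classification thresholds, which may be why the abstract uses it to select the best model.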

Machine Learning Approach to Blood Stasis Pattern Identification Based on Self-reported Symptoms

  • 김현호;양승범;강연석;박영배;김재효
    • Korean Journal of Acupuncture / Vol. 33, No. 3 / pp.102-113 / 2016
  • Objectives: This study aimed to develop and discuss a prediction model for the blood stasis pattern of traditional Korean medicine (TKM) using machine learning algorithms: multiple logistic regression and a decision tree model. Methods: First, we reviewed the Korean, Chinese, and Japanese versions of blood stasis (BS) questionnaires to build an integrated BS questionnaire of patient-reported outcomes. Through human subject research, patient-reported BS symptom data were acquired. Next, expert decisions from 5 Korean medicine doctors were also acquired, and supervised learning models were developed using multiple logistic regression and decision trees. Results: An integrated BS questionnaire with 24 items was developed. Multiple logistic regression models with accuracies of 0.92 (male) and 0.95 (female), validated by 10-fold cross-validation, were constructed. With decision tree modeling, a male model with 8 decision nodes and a female model with 6 decision nodes were built. In both models, the symptoms 'recent physical trauma', 'chest pain', 'numbness', and 'menstrual disorder (female only)' were considered important factors. Conclusions: Because machine learning, especially supervised learning, can reveal and suggest important or essential factors among the wide variety of symptoms that make up a pattern identification, it can be a very useful tool for research on TKM diagnostics. With proper patient-reported outcomes or a well-structured database, it can also be applied to pre-screening solutions for the healthcare system at the Mibyoung stage.
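The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and holds out each fold once for testing while training on the other nine. The sketch below shows only the index bookkeeping, under the assumption of a simple shuffled, non-stratified split. The function name, the seed, and the sample count of 25 are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical sample count, for illustration only.
splits = list(k_fold_indices(n_samples=25, k=10))
```

The reported accuracy would then typically be the average of the per-fold accuracies, although the abstract does not state how the folds were aggregated.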

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases / Vol. 86, No. 3 / pp.203-215 / 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM), were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R² value was 0.27 and in set II, LightGBM was the best model with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R² value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
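Set III in this abstract imputes missing values, and a step is described as being done after the dataset was split. One common reason to handle missing data after splitting is to keep test-set information out of the fill values. The sketch below assumes simple mean imputation fitted on the training split only. The abstract does not name the imputation method, so this is an illustrative assumption, and all function names and values are hypothetical. The R² helper follows the usual definition of one minus the ratio of residual to total sum of squares.

```python
def mean_impute(train_col, test_col):
    """Fill None entries using the mean of observed training values only."""
    observed = [v for v in train_col if v is not None]
    fill = sum(observed) / len(observed)

    def fill_in(col):
        return [fill if v is None else v for v in col]

    return fill_in(train_col), fill_in(test_col)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical feature column with missing entries, for illustration only.
train_filled, test_filled = mean_impute([2.0, None, 4.0], [None, 10.0])
```

A model that always predicts the mean of the observed outcomes scores an R² of 0 under this definition, which gives a reference point for the 0.27 to 0.56 range reported above.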

A Machine Learning-based Customer Classification Model for Effective Online Free Sample Promotions

  • 원하람;김무전;안현철
    • 한국정보시스템학회지:정보시스템연구 / Vol. 27, No. 3 / pp.63-80 / 2018
  • Purpose: The purpose of this study is to build a machine learning-based customer classification model that strengthens the customer expansion effect of free sample promotions. Specifically, the proposed model identifies potential target customers who are expected to purchase the products included in a free sample promotion after receiving the free samples. Design/methodology/approach: This study proposes a customer classification model for determining which customers are suitable for receiving free samples, using various machine learning techniques: logistic regression, multiple discriminant analysis, case-based reasoning, decision tree, artificial neural network, and support vector machine. To validate the usefulness of the proposed model, we apply it to a real-world free sample-based target marketing case of a major Korean cosmetics retail company. Findings: Experimental results show that the machine learning-based customer classification model achieves satisfactory accuracy, ranging from 70% to 75%. In particular, the support vector machine is found to be the most effective machine learning technique for the free sample-based target marketing model. Our study sheds light on customer relationship management strategies using free sample promotions.

A Differential Evolution based Support Vector Clustering

  • 전성해
    • 한국지능시스템학회논문지 / Vol. 17, No. 5 / pp.679-683 / 2007
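  • Vapnik's statistical learning theory includes three learning algorithms for classification, regression, and clustering: SVM (support vector machine), SVR (support vector regression), and SVC (support vector clustering). Among these, SVC provides relatively good clustering results using support vectors based on the Gaussian kernel function. Like SVM and SVR, however, SVC requires optimal values for the kernel parameter and the regularization constant. In most analyses, these are chosen from the user's subjective experience or by strategies such as grid search that demand a great deal of computing time. This paper proposes DESVC (differential evolution based SVC), which uses differential evolution to determine the kernel parameter and regularization constant of SVC efficiently. Its performance is compared with that of existing machine learning algorithms through experiments on training data from the UCI Machine Learning repository and on simulated data sets.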
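To illustrate the kind of search this abstract describes, the sketch below implements a generic DE/rand/1/bin differential evolution loop over two parameters. It is not the paper's DESVC: the objective is a hypothetical smooth stand-in for a clustering quality measure, and the parameter names, bounds, population size, and control settings (`f`, `cr`) are all assumptions chosen for illustration.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           f=0.5, cr=0.9, seed=0):
    """Minimize objective over box bounds with a DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip back into the bounds
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]

# Hypothetical stand-in objective over (log kernel parameter, log
# regularization constant); a real run would score a fitted SVC instead.
def toy_objective(params):
    log_q, log_c = params
    return (log_q - 1.0) ** 2 + (log_c + 0.5) ** 2

best_params, best_cost = differential_evolution(
    toy_objective, bounds=[(-3.0, 3.0), (-3.0, 3.0)])
```

Compared with grid search, which evaluates every point on a fixed lattice, this loop concentrates evaluations where earlier candidates scored well, which is the efficiency argument the abstract makes.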

Modeling of Flow-Accelerated Corrosion using Machine Learning: Comparison between Random Forest and Non-linear Regression

  • 이경근;이은희;김성우;김경모;김동진
    • Corrosion Science and Technology / Vol. 18, No. 2 / pp.61-71 / 2019
  • Flow-Accelerated Corrosion (FAC) is a phenomenon in which a protective coating on a metal surface is dissolved by a flow of fluid in a metal pipe, leading to continuous wall-thinning. Recently, many countries have developed computer codes to manage FAC in power plants, and the FAC prediction model in these computer codes plays an important role in predictive performance. Herein, FAC prediction models were developed by applying a machine learning method and the conventional nonlinear regression method. The random forest, a widely used machine learning technique in predictive modeling, led to easy calculation of FAC tendency for five input variables: flow rate, temperature, pH, Cr content, and dissolved oxygen (DO) concentration. However, the model showed significant errors in some input conditions, and it was difficult to obtain proper regression results without using additional data points. In contrast, nonlinear regression analysis produced robust estimates even with relatively insufficient data by assuming an empirical equation, and the model showed better predictive power when the interaction between DO and pH was considered. The comparative analysis of this study is believed to provide important insights for developing a more sophisticated FAC prediction model.

Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State

  • Kim, Nari;Lee, Yang-Won
    • 한국측량학회지 / Vol. 34, No. 4 / pp.383-390 / 2016
  • Remote sensing data has been widely used in the estimation of crop yields by employing statistical methods such as regression models. Machine learning, which is an efficient empirical method for classification and prediction, is another approach to crop yield estimation. This paper described corn yield estimation in Iowa State using four machine learning approaches: SVM (Support Vector Machine), RF (Random Forest), ERT (Extremely Randomized Trees) and DL (Deep Learning). Comparisons of the validation statistics among them were also presented. To examine the seasonal sensitivities of the corn yields, three period groups were set up: (1) MJJAS (May to September), (2) JA (July and August) and (3) OC (optimal combination of months). Overall, the DL method showed the highest accuracies in terms of the correlation coefficient for the three period groups. The accuracies were relatively favorable in the OC group, which indicates that the optimal combination of months can be significant in statistical modeling of crop yields. The differences between our predictions and USDA (United States Department of Agriculture) statistics were about 6-8%, which shows that machine learning approaches can be a viable option for crop yield modeling. In particular, the DL showed more stable results by overcoming the overfitting problem of generic machine learning methods.