• Title/Summary/Keyword: Ensemble Machine Learning Models

Search Result 138

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation (스태킹 앙상블 모델을 이용한 시간별 지상 오존 공간내삽 정확도 향상)

  • KIM, Ye-Jin;KANG, Eun-Jin;CHO, Dong-Jin;LEE, Si-Woo;IM, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.25 no.3 / pp.74-99 / 2022
  • Surface ozone is produced by photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) emitted from vehicles and industrial sites, and it adversely affects vegetation and human health. In South Korea, ozone is monitored in real time at stations (i.e., point measurements), but its continuous spatial distribution is difficult to monitor and analyze. In this study, surface ozone concentrations were interpolated hourly at a spatial resolution of 1.5 km using a stacking ensemble technique and evaluated with 5-fold cross-validation. The base models of the stacking ensemble were cokriging, multi-linear regression (MLR), random forest (RF), and support vector regression (SVR); MLR was also used as the meta model, taking the results of all base models as additional input variables. The stacking ensemble model outperformed the individual base models, yielding an average R of 0.76 and an RMSE of 0.0065 ppm over the 2020 study period. Compared with the base models, the surface ozone distribution generated by the stacking ensemble model covered a wider range of values and showed a spatial pattern similar to terrain and urbanization variables. The proposed model can produce hourly spatial distributions of ozone and should also be highly applicable to calculating daily maximum 8-hour ozone concentrations.
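The stacking scheme described above (base models whose predictions are added to the inputs of an MLR meta model, evaluated by 5-fold cross-validation) can be sketched with scikit-learn's `StackingRegressor`. This is a minimal illustration under stated assumptions, not the authors' code: cokriging has no scikit-learn implementation and is omitted, and the station predictors and ozone values are synthetic, hypothetical data. `passthrough=True` is what lets the meta model see the original inputs alongside the base-model outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical station-level predictors (e.g., elevation, urban fraction, temperature)
X = rng.normal(size=(300, 3))
# Hypothetical hourly ozone concentrations in ppm
y = 0.03 + 0.005 * X[:, 0] - 0.003 * np.sin(X[:, 1]) + 0.001 * rng.normal(size=300)

base_models = [
    ("mlr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("svr", make_pipeline(StandardScaler(), SVR(epsilon=0.0005))),
]
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),  # MLR meta model
    cv=5,                                # out-of-fold base predictions feed the meta model
    passthrough=True,                    # meta model also sees the original inputs
)

# Outer 5-fold cross-validation of the whole stack
outer = KFold(n_splits=5, shuffle=True, random_state=0)
pred = cross_val_predict(stack, X, y, cv=outer)
r = float(np.corrcoef(y, pred)[0, 1])
rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print(f"R = {r:.2f}, RMSE = {rmse:.4f} ppm")
```

Wrapping the whole stack in `cross_val_predict` keeps the evaluation honest: the inner `cv=5` builds out-of-fold base predictions for the meta model, while the outer folds score stations the stack never saw during fitting.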

An AutoML-driven Antenna Performance Prediction Model in the Autonomous Driving Radar Manufacturing Process

  • So-Hyang Bak;Kwanghoon Pio Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.12 / pp.3330-3344 / 2023
  • This paper proposes an antenna performance prediction model for the autonomous driving radar manufacturing process. The work is based on a challenge dataset, the Driving Radar Manufacturing Process Dataset, and on PyCaret, an open-source Python library that provides a typical AutoML machine learning workflow. The dataset contains 70 variables in total, 54 used as input features and 16 as output features, so it poses a multi-output regression problem. During regression analysis and preprocessing, we identified several highly correlated input features and removed some of them, because multicollinearity can seriously degrade overall model performance. In the training phase, we trained a regression model for each output feature using the AutoML approach. We then selected the top 5 models from the AutoML result reports and applied an ensemble method to improve on the selected models' performance. To evaluate the regression prediction model, we used two metrics, MAE and RMSE, which were 0.6928 and 1.2065, respectively. We also carried out a series of experiments comparing the proposed model's performance with that of other existing models. In conclusion, the AutoML (PyCaret) workflow and the ensembled machine learning model improve accuracy for safer autonomous vehicles, reduce manufacturing costs, and prevent the production of faulty radar systems, conserving resources. Ultimately, the proposed model holds significant promise not only for antenna performance prediction but also for improving manufacturing quality and advancing radar systems in autonomous vehicles.
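The per-output training and top-k ensembling workflow can be sketched with plain scikit-learn. It approximates an AutoML loop rather than reproducing PyCaret: each output gets its own ranking of candidate models by 5-fold CV RMSE, and the five best are averaged with `VotingRegressor`, after highly correlated inputs are dropped with a greedy correlation threshold. The data, the candidate list, and the 0.95 threshold are hypothetical. In PyCaret, `compare_models(n_select=5)` followed by `blend_models` performs a comparable rank-then-average step, though the abstract does not say which ensemble method the authors applied.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor


def drop_correlated(X, threshold=0.95):
    """Greedily keep columns whose |correlation| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


# Hypothetical candidate pool standing in for an AutoML model search
CANDIDATES = {
    "lr": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.01),
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "et": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "gbr": GradientBoostingRegressor(random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
}


def top_k_blend(X, y, k=5):
    """Rank candidates by 5-fold CV RMSE and return an averaging ensemble of the k best."""
    scores = {
        name: cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error").mean()
        for name, model in CANDIDATES.items()
    }
    best = sorted(scores, key=scores.get, reverse=True)[:k]  # higher neg-RMSE is better
    return VotingRegressor([(name, CANDIDATES[name]) for name in best]), best


rng = np.random.default_rng(1)
X_raw = rng.normal(size=(400, 8))
X_raw[:, 7] = X_raw[:, 0] + 0.01 * rng.normal(size=400)  # near-duplicate of column 0
Y = np.column_stack([
    X_raw[:, 0] + 0.5 * X_raw[:, 1] ** 2,
    np.sin(X_raw[:, 2]) - X_raw[:, 3],
    X_raw[:, 4] * X_raw[:, 5],
]) + 0.1 * rng.normal(size=(400, 3))

keep = drop_correlated(X_raw)
X = X_raw[:, keep]
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

preds, selected = [], []
for j in range(Y.shape[1]):  # one ensemble per output feature
    blend, best = top_k_blend(X_tr, Y_tr[:, j])
    preds.append(blend.fit(X_tr, Y_tr[:, j]).predict(X_te))
    selected.append(best)
P = np.column_stack(preds)
mae = mean_absolute_error(Y_te, P)                    # averaged over outputs
rmse = float(np.sqrt(mean_squared_error(Y_te, P)))    # sqrt of output-averaged MSE
print(f"kept {len(keep)} of {X_raw.shape[1]} inputs; MAE={mae:.3f}, RMSE={rmse:.3f}")
```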

Indoor positioning method using WiFi signal based on XGboost (XGboost 기반의 WiFi 신호를 이용한 실내 측위 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Kim, Dae-Jin
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.1 / pp.70-75 / 2022
  • Accurately measuring location is necessary to provide a variety of services. The data for indoor positioning consist of RSSI values measured from WiFi devices through a smartphone application, and these measurements serve as the raw data for machine learning. The features are the measured RSSI values, and the label is the name of the space where each measurement was taken. To this end, the study investigates an efficient classification technique that predicts the exact location from WiFi signals alone. Ensemble methods, including bagging and boosting, combine multiple models to obtain more accurate predictions than a single model. Among them, boosting adjusts the weights of the model according to the results of models trained on sampled data, and many boosting algorithms exist. This study uses XGBoost and compares its performance with that of other ensemble techniques.

Comparison of the Performance of Machine Learning Models for TOC Prediction Based on Input Variable Composition (입력변수 구성에 따른 총유기탄소(TOC) 예측 머신러닝 모형의 성능 비교)

  • Sohyun Lee;Jungsu Park
    • Journal of the Korea Organic Resources Recycling Association / v.32 no.3 / pp.19-29 / 2024
  • Total organic carbon (TOC) represents the total amount of organic carbon contained in water and is a key water quality parameter used, along with biochemical oxygen demand (BOD) and chemical oxygen demand (COD), to quantify the amount of organic matter in water. In this study, a model to predict TOC was developed using XGBoost (XGB), a representative ensemble machine learning algorithm. Independent variables for model construction included water temperature, pH, electrical conductivity, dissolved oxygen concentration, BOD, COD, suspended solids, total nitrogen, total phosphorus, and discharge. To quantify the impact of the various water quality parameters used in model construction, the feature importance of each input variable was calculated. Based on these results, variables with low importance were sequentially excluded to observe changes in model performance. As low-importance variables were sequentially excluded, the model's root mean squared error-observation standard deviation ratio (RSR) ranged from 0.53 to 0.55. The model using all input variables performed best, with an RSR of 0.53. To enhance the model's field applicability, models using only relatively easily measurable parameters were also built, and their performance changes were analyzed. A model constructed using only water temperature, electrical conductivity, pH, dissolved oxygen concentration, and suspended solids had an RSR of 0.72. This indicates that stable performance can be achieved using relatively easily measurable field water quality parameters.
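The RSR metric and the importance-based elimination loop can be sketched as follows. RSR is the RMSE divided by the standard deviation of the observations, so 0 is a perfect fit. The water quality data below are synthetic and hypothetical, and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost; its `feature_importances_` are impurity-based, which may differ from the importance measure used in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split


def rsr(obs, pred):
    """RMSE divided by the standard deviation of the observations (lower is better)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)) / np.std(obs))


rng = np.random.default_rng(0)
names = ["temp", "pH", "EC", "DO", "BOD", "COD", "SS", "TN", "TP", "flow"]
X = rng.normal(size=(500, len(names)))
# Hypothetical TOC signal driven mostly by COD and BOD, weakly by SS
toc = 2.0 * X[:, 5] + 1.0 * X[:, 4] + 0.3 * X[:, 6] + 0.5 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, toc, test_size=0.3, random_state=0)

active = list(range(len(names)))
history = []  # (variables used, test RSR) for each step
while active:
    model = GradientBoostingRegressor(random_state=0).fit(X_tr[:, active], y_tr)
    history.append(([names[i] for i in active], rsr(y_te, model.predict(X_te[:, active]))))
    # Drop the least important remaining variable before the next fit
    active.pop(int(np.argmin(model.feature_importances_)))

for used, score in history:
    print(f"{len(used):2d} variables  RSR={score:.2f}")
```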

A Study on the Prediction of Cabbage Price Using Ensemble Voting Techniques (앙상블 Voting 기법을 활용한 배추 가격 예측에 관한 연구)

  • Lee, Chang-Min;Song, Sung-Kwang;Chung, Sung-Wook
    • Journal of Convergence for Information Technology / v.12 no.3 / pp.1-10 / 2022
  • Prices of vegetables such as cabbage are strongly affected by natural disasters: heavy rain and disease increase price fluctuations, which in turn affect the farm economy. Various efforts have been made to predict agricultural product prices to address this problem, but extreme price fluctuations remain difficult to predict. In this study, cabbage prices were analyzed using the ensemble Voting technique, which combines multiple individual models and determines the final prediction from their combined outputs. The results were also compared with LSTM, a time series analysis method, and with XGBoost, a boosting technique, and RandomForest, a bagging technique. Daily price data were used, together with weather information and price indices that affect cabbage prices. The resulting RMSE between the actual and predicted values was about 236. This study is expected to help in selecting models for other time series analysis research, such as agricultural product price prediction.
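A minimal sketch of a voting ensemble for daily price prediction, assuming lagged prices and the previous day's rainfall as features. The price series is synthetic and hypothetical, the three member models are an illustrative choice, and `GradientBoostingRegressor` stands in for XGBoost. For regression, scikit-learn's `VotingRegressor` averages its members' predictions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_days = 730
rain = rng.gamma(shape=0.5, scale=10.0, size=n_days)  # hypothetical daily rainfall (mm)
price = np.empty(n_days)
price[0] = 3000.0
for t in range(1, n_days):
    # Mean-reverting hypothetical price (won/kg), pushed up by the previous day's rain
    price[t] = (3000 + 0.95 * (price[t - 1] - 3000)
                + 5.0 * (rain[t - 1] - rain.mean()) + rng.normal(0, 50))

lags = 7
# Features for day t: prices on days t-1 ... t-7 and rainfall on day t-1
X = np.column_stack(
    [price[lags - 1 - k : n_days - 1 - k] for k in range(lags)] + [rain[lags - 1 : n_days - 1]]
)
y = price[lags:]
split = int(0.8 * len(y))  # chronological split: no shuffling for time series
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

members = [
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gbr", GradientBoostingRegressor(random_state=0)),
]
voting = VotingRegressor(members).fit(X_tr, y_tr)  # averages the members' predictions
pred = voting.predict(X_te)
rmse = float(np.sqrt(np.mean((y_te - pred) ** 2)))
print(f"Voting RMSE: {rmse:.1f}")
```

The split is chronological rather than shuffled so that the test period lies entirely after the training period, as it would in a real forecast.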

Predicting the Number of People for Meals of an Institutional Foodservice by Applying Machine Learning Methods: S City Hall Case (기계학습방법을 활용한 대형 집단급식소의 식수 예측: S시청 구내직원식당의 실데이터를 기반으로)

  • Jeon, Jongshik;Park, Eunju;Kwon, Ohbyung
    • Journal of the Korean Dietetic Association / v.25 no.1 / pp.44-58 / 2019
  • Predicting the number of meals in a foodservice organization is an important decision-making process that is essential for successful food production: it helps reduce leftover food, prevent deterioration of menu quality, and prevent rising costs. Compared with other demand forecasting problems, meal-count prediction is very difficult because staff cafeterias serve diverse menus with a range of side dishes, and many factors besides the menu also influence demand. Therefore, the purpose of this study was to establish a method for predicting the number of meals, including predictive modeling, that considers various factors actually used in the field in addition to menus. For this purpose, 63 variables in eight categories, such as the daily number of people available for meals, the number of people in the time series, daily menu details, weekdays or seasons, days before or after holidays, weather and temperature, holidays or year-end, and events, were identified as decision variables. An ensemble model using six prediction models was then constructed to predict the number of meals. As a result, the prediction error rate was reduced from 10-11% to approximately 6-7%, which is expected to reduce leftover food by approximately 40%.
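The averaging ensemble and the error-rate metric can be sketched as follows. The six member models, the feature set (weekday, day before a holiday, temperature, menu popularity), and the data are hypothetical; the abstract does not state which six models were used or how they were combined, so a plain average is assumed, and the error rate is assumed to be the mean absolute percentage error.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n = 600
weekday = rng.integers(0, 5, size=n)     # Monday..Friday
pre_holiday = rng.random(n) < 0.05       # day before a holiday
temp = rng.normal(15, 8, size=n)         # daily temperature (C)
menu_score = rng.normal(0, 1, size=n)    # hypothetical menu popularity score
meals = (1200 - 60 * (weekday == 4) - 250 * pre_holiday
         - 3 * np.abs(temp - 18) + 40 * menu_score + rng.normal(0, 30, size=n))
X = np.column_stack([np.eye(5)[weekday], pre_holiday, temp, menu_score])  # one-hot weekday
X_tr, X_te, y_tr, y_te = train_test_split(X, meals, test_size=0.25, random_state=0)

models = {
    "lr": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "rf": RandomForestRegressor(n_estimators=100, random_state=0),
    "et": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "gbr": GradientBoostingRegressor(random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=10),
}


def error_rate(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs(actual - predicted) / actual) * 100)


preds = {name: m.fit(X_tr, y_tr).predict(X_te) for name, m in models.items()}
individual = {name: error_rate(y_te, p) for name, p in preds.items()}
ensemble_pred = np.mean(list(preds.values()), axis=0)  # simple average of the six models
ensemble_error = error_rate(y_te, ensemble_pred)
print({k: round(v, 2) for k, v in individual.items()}, round(ensemble_error, 2))
```

Because absolute error is convex, the averaged prediction's error rate can never exceed the mean of the individual models' error rates, which is one reason simple averaging is a robust default.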

Sintering process optimization of ZnO varistor materials by machine learning based metamodel (기계학습 기반의 메타모델을 활용한 ZnO 바리스터 소결 공정 최적화 연구)

  • Kim, Boyeol;Seo, Ga Won;Ha, Manjin;Hong, Youn-Woo;Chung, Chan-Yeup
    • Journal of the Korean Crystal Growth and Crystal Technology / v.31 no.6 / pp.258-263 / 2021
  • A ZnO varistor is a semiconductor device that protects circuits from surge voltages through its non-linear I-V characteristics, which are obtained by controlling the microstructure of grains and grain boundaries. To obtain the desired electrical properties, it is important to control microstructure evolution during the sintering process. In this research, we defined a dataset composed of sintering process conditions and the relative permittivity of the sintered body, and collected the experimental data using design of experiments (DOE). Meta-models that predict permittivity were developed by training various machine learning algorithms on the collected experimental data. Using the meta-model with the numerical HMA (Hybrid Metaheuristic Algorithm) optimization algorithm, we derived optimized sintering conditions expected to yield the maximum permittivity. Applying meta-model-based optimization to ceramic processing makes it possible to search for optimal process conditions with a minimum number of experiments.
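The meta-model-based optimization loop can be sketched in three steps: run a small full-factorial DOE, fit a surrogate (meta) model to the results, and search the surrogate for the conditions that maximize predicted permittivity. This is a sketch under stated assumptions: `run_experiment` is a hypothetical stand-in for real sintering experiments, the two process variables and their ranges are made up, a Gaussian process is just one possible meta model, and SciPy's `differential_evolution` replaces the HMA optimizer, which is not reproduced here.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

BOUNDS = np.array([[1000.0, 1300.0], [1.0, 5.0]])  # hypothetical: temperature (C), holding time (h)


def run_experiment(temp_c, hours):
    """Hypothetical stand-in for a sintering run; permittivity peaks near 1200 C and 3 h."""
    return 1000 - 0.02 * (temp_c - 1200) ** 2 - 40 * (hours - 3) ** 2


def scale(x):
    """Map process conditions onto [0, 1] so both inputs share a comparable length scale."""
    return (np.asarray(x, dtype=float) - BOUNDS[:, 0]) / (BOUNDS[:, 1] - BOUNDS[:, 0])


# Step 1: full-factorial DOE, 5 temperatures x 4 holding times = 20 experiments
doe = np.array([[t, h] for t in np.linspace(1000, 1300, 5) for h in np.linspace(1, 5, 4)])
y = np.array([run_experiment(t, h) for t, h in doe])

# Step 2: fit the meta model (a Gaussian process) to the DOE results
kernel = ConstantKernel(1.0) * RBF(length_scale=[0.5, 0.5]) + WhiteKernel(1e-3)
meta = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
meta.fit(scale(doe), y)

# Step 3: maximize predicted permittivity by minimizing its negative inside the DOE bounds
result = differential_evolution(
    lambda x: -meta.predict(scale(x).reshape(1, -1))[0],
    bounds=[tuple(b) for b in BOUNDS],
    seed=0,
)
best_temp, best_hours = result.x
print(f"suggested: {best_temp:.0f} C for {best_hours:.2f} h, "
      f"predicted permittivity {-result.fun:.1f}")
```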

AutoML and CNN-based Soft-voting Ensemble Classification Model For Road Traffic Emerging Risk Detection (도로교통 이머징 리스크 탐지를 위한 AutoML과 CNN 기반 소프트 보팅 앙상블 분류 모델)

  • Jeon, Byeong-Uk;Kang, Ji-Soo;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.11 no.7 / pp.14-20 / 2021
  • Most accidents caused by road icing in winter become major accidents, because it is difficult for drivers to detect road icing in advance. In this work, we study how to accurately detect road traffic emerging risks using an ensemble of AutoML and CNN models that uses both structured and unstructured data. We train a CNN-based road traffic emerging risk classification model on images (unstructured data) and an AutoML-based road traffic emerging risk classification model on weather data (structured data). The ensemble model then complements the CNN-based classification model by combining the probability values produced by each model. This improves road traffic emerging risk classification performance and alerts drivers more accurately and quickly, enabling safe driving.
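Soft voting itself is a small computation: average the class-probability vectors produced by each model (optionally weighted) and pick the class with the highest mean probability. The class names and probability values below are hypothetical illustrations, not outputs of the paper's models.

```python
import numpy as np

CLASSES = ["normal", "wet", "icy"]  # hypothetical risk classes


def soft_vote(prob_list, weights=None):
    """Weighted mean of per-model class probabilities and the resulting class indices."""
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (models, samples, classes)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), probs, axes=1)  # (samples, classes)
    return avg, avg.argmax(axis=1)


p_cnn = np.array([[0.6, 0.3, 0.1], [0.2, 0.35, 0.45]])     # image-based model
p_automl = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])    # weather-based model
avg, labels = soft_vote([p_cnn, p_automl])
print([CLASSES[i] for i in labels])  # ['normal', 'wet']
```

In the second sample the CNN alone would pick `icy` and the AutoML model `wet`; the averaged probabilities (0.15, 0.475, 0.375) settle on `wet`, illustrating how one model's output can complement the other's.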

An Empirical Analysis of Boosting of Neural Networks for Bankruptcy Prediction (부스팅 인공신경망학습의 기업부실예측 성과비교)

  • Kim, Myoung-Jong;Kang, Dae-Ki
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.1 / pp.63-69 / 2010
  • Ensemble learning is one of the most widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, Bagging and Boosting, have been applied with great success to various machine learning problems, mostly using decision trees as base classifiers. This paper performs an empirical comparison of boosted neural networks and traditional neural networks on bankruptcy prediction tasks. Experimental results on Korean firms indicated that the boosted neural networks showed improved performance over traditional neural networks.
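One classic way to boost neural networks, AdaBoost.M1 with weighted resampling, is sketched below; resampling lets boosting work with base learners whose training routine does not take per-sample weights. The dataset is synthetic, the network size and number of rounds are arbitrary, and this is not necessarily the boosting variant used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def boost_mlp(X, y, rounds=10, seed=0):
    """AdaBoost.M1-style boosting of small MLPs via weighted resampling; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)
    models, alphas = [], []
    for m in range(rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)  # resample according to current weights
        clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=m).fit(X[idx], y[idx])
        pred = clf.predict(X)
        err = w[pred != y].sum()  # weighted training error
        if err >= 0.5:            # no better than chance: stop boosting
            break
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)  # up-weight misclassified firms
        w /= w.sum()
        models.append(clf)
        alphas.append(alpha)
    return models, alphas


def predict_boosted(models, alphas, X):
    """Sign of the alpha-weighted vote of the boosted networks."""
    score = sum(a * clf.predict(X) for clf, a in zip(models, alphas))
    return np.where(score >= 0, 1, -1)


X, y01 = make_classification(n_samples=600, n_features=10, n_informative=5,
                             flip_y=0.05, random_state=0)
y = 2 * y01 - 1  # map {0, 1} -> {-1, +1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

single = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0).fit(X_tr, y_tr)
models, alphas = boost_mlp(X_tr, y_tr)
acc_single = float(np.mean(single.predict(X_te) == y_te))
acc_boost = float(np.mean(predict_boosted(models, alphas, X_te) == y_te))
print(f"single MLP: {acc_single:.3f}, boosted MLPs ({len(models)} rounds): {acc_boost:.3f}")
```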

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata / v.5 no.2 / pp.53-67 / 2020
  • This study employs a supervised learning prediction model to detect nonconformity at processed food manufacturing and processing businesses in advance. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing and feature engineering, and model selection and evaluation. The dependent variable was the number of nonconformities detected in supervisory inspections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The preprocessed data reflected not only basic attributes such as revenue, years in operation, and number of employees, but also inspection track records and external climate data. Feature extraction derived company risk, item risk, environmental risk, and past violation history as feature variables that affect the determination of nonconformity, and machine learning algorithms were then applied to the data. The F1-score of the decision tree model was much higher than those of the other models. Based on these results, official food control for food safety management is expected to be strengthened and shifted toward data- and evidence-based management and a scientific administrative system.
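A sketch of the F1-score comparison on an imbalanced nonconformity label. The four feature variables mirror the risk categories named in the abstract, but their values, the rule generating the labels, and the model settings are all hypothetical. F1, the harmonic mean of precision and recall, is more informative than accuracy here because nonconforming businesses are a minority.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
company_risk = rng.random(n)
item_risk = rng.random(n)
env_risk = rng.random(n)
past_violations = rng.poisson(0.5, size=n)
X = np.column_stack([company_risk, item_risk, env_risk, past_violations])
# Hypothetical rule: nonconformity is likely when past violations or item risk are high
logit = -4 + 2.5 * (past_violations >= 2) + 3 * (item_risk > 0.7) + 1.0 * company_risk
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=0),
    "logistic": LogisticRegression(class_weight="balanced", max_iter=1000),
}
# F1 of the positive (nonconforming) class on the held-out set
f1 = {name: f1_score(y_te, m.fit(X_tr, y_tr).predict(X_te)) for name, m in models.items()}
print({name: round(score, 3) for name, score in f1.items()})
```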