• Title/Abstract/Keywords: stacking regression

27 search results (processing time: 0.025 seconds)

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods / Vol. 28, No. 2 / pp. 189-204 / 2021
  • In a regression analysis, a single best model is usually selected among several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from a prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include the true model, BMA is generally expected to perform better than stacking. On the other hand, when the candidate models do not include the true model, stacking is known to outperform BMA. Since stacking and BMA have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, it is not easy to find research papers that compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods and a single best model in linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression on data with and without outliers. We also compared them when the data include errors from a non-normal distribution. The model averaging methods were applied to water pollution data, which exhibit strong multicollinearity among variables. Simulation studies showed that the stacking method tends to perform better than BMA or standard linear regression analysis (including the stepwise selection method) in terms of risk (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.

Development of an Application Model of Simple NIOSH Lifting Equation to Multi-stacking Complex Lifting Tasks

  • 박재희
    • Journal of the Korean Society of Safety / Vol. 24, No. 2 / pp. 76-82 / 2009
  • The NIOSH lifting equation has been a dominant tool for evaluating the hazard levels of lifting tasks. Although it provides separate procedures for simple and complex lifting tasks, the NIOSH simple lifting equation is used in practice for almost all tasks, complex as well as simple. However, most lifting tasks in industry are complex lifting tasks, so some errors inevitably occur in their evaluation. Among complex lifting tasks, multi-stacking is the most common. To compensate for the error that arises when multi-stacking tasks are evaluated with the NIOSH simple lifting equation, a set of calculations of LIs (Lifting Indices) was performed for systematically varied multi-stacking tasks. A regression model was then established that finds, for a multi-stacking task, the equivalent height of a simple lifting task. Using this model, multi-stacking tasks can be evaluated with less error. To validate the model, several real multi-stacking tasks were evaluated as examples.
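
For orientation, the Lifting Index mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the single-task revised NIOSH lifting equation in its metric form, not the regression model developed in the paper. The function names, the default frequency and coupling multipliers (which normally come from lookup tables), and the stack heights in the usage example are all assumptions made for illustration.

```python
def recommended_weight_limit(h_cm, v_cm, d_cm, a_deg, fm=1.0, cm=1.0):
    """Recommended Weight Limit (kg) from the revised NIOSH lifting equation.

    h_cm: horizontal distance of the hands from the ankles
    v_cm: vertical height of the hands at the origin of the lift
    d_cm: vertical travel distance of the lift
    a_deg: asymmetry angle in degrees
    fm, cm: frequency and coupling multipliers (table values; 1.0 is an
            illustrative default, not a recommendation)
    """
    load_constant = 23.0  # kg
    hm = 0.0 if h_cm > 63 else 25.0 / max(h_cm, 25.0)
    vm = 0.0 if v_cm > 175 else 1.0 - 0.003 * abs(v_cm - 75.0)
    dm = 0.0 if d_cm > 175 else 0.82 + 4.5 / max(d_cm, 25.0)
    am = 0.0 if a_deg > 135 else 1.0 - 0.0032 * a_deg
    return load_constant * hm * vm * dm * am * fm * cm


def lifting_index(load_kg, rwl_kg):
    """Lifting Index: ratio of the actual load to the recommended limit."""
    return load_kg / rwl_kg


# Hypothetical stack: the same 10 kg box lifted from three layer heights.
for layer_height_cm in (30.0, 75.0, 120.0):
    rwl = recommended_weight_limit(h_cm=40.0, v_cm=layer_height_cm, d_cm=50.0, a_deg=0.0)
    print(layer_height_cm, round(rwl, 2), round(lifting_index(10.0, rwl), 2))
```

The loop shows why a single simple-lifting evaluation is an approximation for a stack: the vertical multiplier, and therefore the index, changes with each layer height.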

Predicting movie audience with stacked generalization by combining machine learning algorithms

  • Park, Junghoon;Lim, Changwon
    • Communications for Statistical Applications and Methods / Vol. 28, No. 3 / pp. 217-232 / 2021
  • The Korean film industry has matured, and the number of movies watched per capita has reached the highest level in the world. Since then, the growth rate of the movie industry has been decreasing, and total annual movie sales even decreased slightly in 2018. The number of moviegoers is the primary driver of sales in the movie industry and also an important factor influencing additional sales. It is therefore important to predict the number of movie audiences. In this study, we predict the cumulative audience of films using stacking, an ensemble method that combines all the algorithms used in the prediction. We use box office data from the Korea Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes the collection and preprocessing of the explanatory variables and explains the regression models used in stacking. The final stacking model performs best on the test set in terms of RMSE.
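
The idea of stacking described in the abstract above can be sketched roughly as follows. This is a minimal illustration of stacked generalization, not the authors' pipeline: the two base learners (a mean predictor and a one-variable least-squares line), the synthetic data, and all function names are assumptions chosen so the example runs with the standard library alone. The meta-model is a no-intercept linear combination fitted on out-of-fold predictions.

```python
import math
import random


def fit_mean(xs, ys):
    """Baseline learner: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_stack(xs, ys, learners, k=5):
    """Stack exactly two base learners using k-fold out-of-fold predictions."""
    n = len(xs)
    oof = [[0.0, 0.0] for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        for j, learner in enumerate(learners):
            model = learner([xs[i] for i in train], [ys[i] for i in train])
            for i in test:
                oof[i][j] = model(xs[i])
    # Meta-model: solve the 2x2 normal equations for the combination weights.
    a11 = sum(r[0] * r[0] for r in oof)
    a12 = sum(r[0] * r[1] for r in oof)
    a22 = sum(r[1] * r[1] for r in oof)
    b1 = sum(r[0] * y for r, y in zip(oof, ys))
    b2 = sum(r[1] * y for r, y in zip(oof, ys))
    det = a11 * a22 - a12 * a12
    weights = ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)
    # Refit the base learners on all data for final predictions.
    models = [learner(xs, ys) for learner in learners]
    predict = lambda x: sum(w * m(x) for w, m in zip(weights, models))
    return predict, weights


# Synthetic data (hypothetical): a noisy linear trend.
random.seed(0)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]

stack, weights = fit_stack(xs, ys, [fit_mean, fit_line])
print("meta-model weights:", weights)
print("stacked RMSE:", rmse(ys, [stack(x) for x in xs]))
```

Fitting the meta-model on out-of-fold predictions rather than in-sample fits keeps it from simply rewarding whichever base learner overfits the training data most.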

Estimation of lightweight aggregate concrete characteristics using a novel stacking ensemble approach

  • Kaloop, Mosbeh R.;Bardhan, Abidhan;Hu, Jong Wan;Abd-Elrahman, Mohamed
    • Advances in nano research / Vol. 13, No. 5 / pp. 499-512 / 2022
  • This study investigates the efficiency of ensemble machine learning for predicting the characteristics of lightweight-aggregate concrete (LWC). A stacking ensemble (STEN) approach was proposed to estimate the dry density (DD) and 28-day compressive strength (Fc-28) of LWC using two meta-models, a random forest regressor (RFR) and an extra tree regressor (ETR), and two novel ensemble models, STEN-RFR and STEN-ETR, were constructed. Four standalone machine learning models (artificial neural network, gradient boosting regression, K neighbor regression, and support vector regression) were used to compare the performance of the proposed models. For this purpose, a total of 140 LWC mixtures with 21 influencing parameters for producing LWC with a density less than 1000 kg/m³ were used. Based on the experimental results with multiple performance criteria, it can be concluded that the proposed STEN-ETR model can be used to estimate the DD and Fc-28 of LWC. Moreover, the STEN-ETR approach was found to be a significant technique for predicting DD and Fc-28 of LWC with minimal prediction error. In the validation phase, the accuracy of the proposed STEN-ETR model in predicting DD and Fc-28 was found to be 96.79% and 81.50%, respectively. In addition, the cement, water-cement ratio, silica fume, and expanded-glass aggregate variables were found to be significant in modeling DD and Fc-28 of LWC.

Improved Estimation of Hourly Surface Ozone Concentrations using Stacking Ensemble-based Spatial Interpolation

  • 김예진;강은진;조동진;이시우;임정호
    • Journal of the Korean Association of Geographic Information Studies / Vol. 25, No. 3 / pp. 74-99 / 2022
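  • Surface ozone is produced through photochemical reactions of nitrogen oxides (NOx) and volatile organic compounds (VOCs) emitted from vehicles and industrial sites, and it adversely affects vegetation and human health. Real-time ozone monitoring is carried out in South Korea, but because it is station-based, analyzing the spatial distribution over unmonitored areas is difficult. In this study, a stacking ensemble technique was used to spatially interpolate hourly surface ozone concentrations over South Korea at a spatial resolution of 1.5 km, and 5-fold cross-validation was performed. Cokriging, multi-linear regression (MLR), random forest (RF), and support vector regression (SVR) were used as the base models of the stacking ensemble. In a comparison of the accuracy of each model, the stacking ensemble model showed the highest performance, with an hourly mean R of 0.76 and RMSE of 0.0065 ppm over the study period. The surface ozone concentration maps from the stacking ensemble model clearly reflected the characteristics of complex terrain and urbanization variables and showed a wider range of concentrations. The developed model can produce spatially continuous maps every hour and is also expected to be highly useful for computing 8-hour averages and for time-series analysis.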
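
As a rough illustration of how the 5-fold cross-validation and the R and RMSE scores reported above fit together, the sketch below splits station indices into folds and scores pooled held-out predictions. The interleaved split, the function names, and the observation and prediction values are all hypothetical; the abstract does not say how the folds were formed or list any data.

```python
import math


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k interleaved folds over 0..n-1."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def rmse(obs, pred):
    """Root mean squared error, in the same units as the inputs."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Hypothetical station observations and held-out predictions (ppm).
obs = [0.021, 0.034, 0.028, 0.045, 0.039, 0.030, 0.025, 0.041, 0.036, 0.027]
pred = [0.024, 0.031, 0.030, 0.040, 0.037, 0.033, 0.027, 0.038, 0.034, 0.029]

for train, test in kfold_indices(len(obs)):
    # A real run would fit the model on `train` and predict the `test` stations here.
    print("held-out stations:", test)

print("R:", pearson_r(obs, pred), "RMSE (ppm):", rmse(obs, pred))
```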

Stacking Kernel Ridge Regression Network for Smart Phone's Touch-Stroke Continuous Authentication

  • 장인호;앤드류 테오뱅진
    • Korea Information Processing Society: Conference Proceedings / Korea Information Processing Society 2018 Spring Conference / pp. 381-383 / 2018
  • This paper studies the Stacking Kernel Ridge Regression Network (SKRRN), a deep learning network that enables continuous authentication on smartphones using touch strokes. SKRRN is composed of multiple kernel ridge regressions (KRRs) arranged hierarchically, and every KRR is trained analytically and independently. Unlike other deep learning networks, SKRRN does not learn features from raw touch-stroke data; instead, it relearns from extracted data such as hand-crafted features. This relearning makes the original dataset more discriminative and richer. SKRRN achieved an equal error rate of 4.295% on the HMOG dataset.

Effect of FRP parameters in strengthening the tubular joint for offshore structures

  • Prashob, P.S.;Shashikala, A.P.;Somasundaran, T.P.
    • Ocean Systems Engineering / Vol. 8, No. 4 / pp. 409-426 / 2018
  • This paper presents the strengthening of tubular joints by wrapping them with carbon fiber reinforced polymer (CFRP) and glass fiber reinforced polymer (GFRP). The total number of layers, the stacking sequence, and the length of wrapping are the parameters involved when fiber reinforced polymer (FRP) composites are used for strengthening. These parameters were varied, and the results were compared with the reference joint. The best stacking sequence was identified as the one with the highest ultimate load and the smallest deflections. To determine the best stacking sequence, a numerical investigation was performed on CFRP composites with the length of wrapping and the number of layers fixed. The study then focused on CFRP- and GFRP-strengthened joints, varying the total number of layers and the length of wrapping. A parametric equation for CFRP-strengthened joints was proposed from multiple regression analysis. The Hashin failure criterion was used to check the failure of the composites. Results revealed that FRP had a great influence on the load-bearing capacity of the joints and in reducing the deflections and stresses of the joint under axial compressive loads. CFRP was also found to be far better than GFRP in reducing stresses and deflection.

On successive machine learning process for predicting strength and displacement of rectangular reinforced concrete columns subjected to cyclic loading

  • Bu-seog Ju;Shinyoung Kwag;Sangwoo Lee
    • Computers and Concrete / Vol. 32, No. 5 / pp. 513-525 / 2023
  • Recently, research on predicting the behavior of reinforced concrete (RC) columns using machine learning methods has been actively conducted. However, most studies have focused on predicting the ultimate strength of RC columns using a regression algorithm. Therefore, this study develops a successive machine learning process for predicting multiple nonlinear behaviors of rectangular RC columns. This process consists of three stages: single machine learning, bagging ensemble, and stacking ensemble. In the case of strength prediction, sufficient prediction accuracy is confirmed even in the first stage. In the case of displacement, although sufficient accuracy is not achieved in the first and second stages, the stacking ensemble model in the third stage performs better than the machine learning models in the first and second stages. In addition, the performance of the final prediction models is verified by comparing the backbone curves and hysteresis loops obtained from predicted outputs with actual experimental data.

Development of a High-Performance Concrete Compressive-Strength Prediction Model Using an Ensemble Machine-Learning Method Based on Bagging and Stacking

  • 곽윤지;고채연;곽신영;임승현
    • Journal of the Computational Structural Engineering Institute of Korea / Vol. 36, No. 1 / pp. 9-18 / 2023
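  • The compressive strength of high-performance concrete (HPC) is difficult to predict because of the use of supplementary cementitious materials, so the development of improved prediction models is essential. The purpose of this study is therefore to develop an HPC compressive-strength prediction model using an ensemble technique that combines bagging and stacking. The key contribution of this paper is a new ensemble technique that integrates the existing bagging and stacking ensemble techniques, addressing the problems of single machine-learning models to improve predictive performance. Nonlinear regression, support vector machine, artificial neural network, and Gaussian process regression were used as the single machine-learning methods, and bagging and stacking were used as the ensemble techniques. As a result, the model proposed in this study showed higher accuracy than the single machine-learning models and the bagging and stacking models. This was confirmed by comparing four representative performance metrics, which verified the validity of the proposed method.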

Multi-Objective Optimization of a Fan Blade Using NSGA-II

  • 이기상;김광용;압두스사마드
    • Korean Society of Mechanical Engineers: Conference Proceedings / Korean Society of Mechanical Engineers 2007 Spring Conference B / pp. 2690-2695 / 2007
  • This work presents numerical optimization of the blade stacking line of a low-speed axial flow fan using the fast and elitist Non-Dominated Sorting Genetic Algorithm (NSGA-II) for multi-objective optimization with three-dimensional Navier-Stokes analysis. The Reynolds-averaged Navier-Stokes (RANS) equations with the $\kappa$-$\varepsilon$ turbulence model are discretized with finite volume approximations and solved on unstructured grids. Regression analysis is performed to obtain second-order polynomial responses, which NSGA-II uses to generate a Pareto-optimal front; a local search strategy with a weighted-sum approach then refines the NSGA-II result to obtain a better Pareto-optimal front. Four geometric variables related to the spanwise distributions of sweep and lean of the blade stacking line are chosen as design variables to find a higher-performing fan blade. Performance is measured in terms of three objectives: total efficiency, total pressure, and torque. The aim of the optimization is to enhance total efficiency and total pressure and to reduce torque.
