• Title/Abstract/Keywords: regression analysis.

Search results: 23,697 items; processing time: 0.049 seconds

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 2
    • /
    • pp.189-204
    • /
    • 2021
  • In a regression analysis, a single best model is usually selected among several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from a prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include the true model, BMA is generally expected to perform better than stacking. On the other hand, when the candidate models do not include the true model, stacking is known to outperform BMA. Since stacking and BMA have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, it is not easy to find research papers that compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods and a single best model in linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression for data with and without outliers. We also compared them when the errors follow a non-normal distribution. The model averaging methods were applied to water pollution data, which have strong multicollinearity among variables. Simulation studies showed that the stacking method tends to perform better than BMA or standard linear regression analysis (including the stepwise selection method) in terms of risk (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.

Multivariate statistical analysis of the comparative antioxidant activity of the total phenolics and tannins in the water and ethanol extracts of dried goji berry (Lycium chinense) fruits

  • Kim, Joo-Shin;Kimm, Haklin Alex
    • 한국식품과학회지
    • /
    • Vol. 51, No. 3
    • /
    • pp.227-236
    • /
    • 2019
  • Antioxidant activity in water and ethanol extracts of dried Lycium chinense fruit, attributable to the total phenolic and tannin content, was measured using a number of chemical and biochemical assays for radical scavenging and inhibition of lipid peroxidation, and the analysis was extended by applying a bootstrapping statistical method. Previous statistical analyses mostly provided linear correlation and regression analyses between antioxidant activity and increasing concentrations of phenolics and tannins in a concentration-dependent mode. The present study showed that multivariate analysis, using multiple regression analysis or regression planes, was more informative than linear regression analysis of the relationship between the concentration of individual components and antioxidant activity. In this paper, we present a multivariate analysis of the antioxidant activities of the phenolic and tannin contents combined in the water and ethanol extracts, which revealed observations that were not evident from linear statistical analysis.

작품 가격 추정을 위한 기계 학습 기법의 응용 및 가격 결정 요인 분석 (Price Determinant Factors of Artworks and Prediction Model Based on Machine Learning)

  • 장동률;박민재
    • 품질경영학회지
    • /
    • Vol. 47, No. 4
    • /
    • pp.687-700
    • /
    • 2019
  • Purpose: The purpose of this study is to investigate the interaction effects between price determinants of artworks. We expand the methodology used in the art market by applying machine learning techniques to estimate the price of artworks, and we compare linear regression and machine learning in terms of prediction accuracy. Methods: Moderated regression analysis was performed to verify the interaction effects of artistic characteristics on price. The moderating effects were studied by checking the significance level of the interaction terms in the derived regression equation. To derive a price estimation model, we use multiple linear regression analysis, which is a parametric statistical technique, and k-nearest neighbor (kNN) regression, which is a nonparametric technique among machine learning methods. Results: For the most part, the influence of the price determinants of art differs according to the auction type and the artist's reputation. However, the auction type did not moderate the influence of the genre of the work on the price. In the analysis, kNN regression was superior to linear regression in prediction accuracy. Conclusion: The study provides a theoretical basis for the complexity that exists among the price determinants of artworks. In addition, nonparametric models and machine learning techniques, as well as existing parametric models, are implemented to estimate the price of artworks.
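
The contrast between a parametric linear fit and kNN regression drawn in the abstract above can be illustrated with a minimal sketch. The data here are a hypothetical toy curve (not artwork prices), and the function names `knn_predict` and `fit_line` are illustrative assumptions, not the authors' implementation.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical toy data with a curved relationship: y = x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x * x for x in xs]

intercept, slope = fit_line(xs, ys)
linear_pred = intercept + slope * 3.0
knn_pred = knn_predict([(x,) for x in xs], ys, (3.0,), k=3)

# Absolute prediction errors at x = 3, where the true value is 9.
linear_error = abs(linear_pred - 9.0)
knn_error = abs(knn_pred - 9.0)
```

On this curved toy data the local average tracks the target more closely than the straight line at x = 3; on data that really are linear, the ordering can reverse, so the comparison depends on the data.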

A Study on the Influence of a Sewage Treatment Plant's Operational Parameters using the Multiple Regression Analysis Model

  • Lee, Seung-Pil;Min, Sang-Yun;Kim, Jin-Sik;Park, Jong-Un;Kim, Man-Soo
    • Environmental Engineering Research
    • /
    • Vol. 19, No. 1
    • /
    • pp.31-36
    • /
    • 2014
  • In this study, the influence of the control and operational parameters within a sewage treatment plant was reviewed by performing multiple regression analysis on the effluent quality of the sewage treatment. The data used for this review are actual data from a sewage treatment plant using the media process during the year 2012. The prediction models of chemical oxygen demand ($COD_{Mn}$) and total nitrogen (T-N) in the effluent of the 2nd settling tank, based on the multiple regression analysis, yielded prediction accuracy measurements of 0.93 and 0.84, respectively, and it was concluded that the models accurately predicted the variance of the actual observed values. If data on the energy spent under each operating condition can be collected, then the operating parameter that conserves energy without violating the effluent quality standards for COD and T-N can be determined using the regression model and the standardized regression coefficients. These results can provide appropriate energy-saving operation guidelines to the operators of sewage treatment plants, which consume a lot of energy.

실험적 연구를 통한 비정형롤판재성형 예측 모델 개발 (Development of Prediction Model for Flexibly-reconfigurable Roll Forming based on Experimental Study)

  • 박지우;길민규;윤준석;강범수;이경훈
    • 소성∙가공
    • /
    • Vol. 26, No. 6
    • /
    • pp.341-347
    • /
    • 2017
  • Flexibly-reconfigurable roll forming (FRRF) is a novel sheet metal forming technology that produces multi-curvature surfaces by controlling the strain distribution along the longitudinal direction. Reconfigurable rollers can be arranged to act as a kind of punch and die set, and by using these rollers the desired curved surface can be formed. In the FRRF process, a three-dimensional surface is formed from a two-dimensional curve, so it is difficult to predict the forming result. In this study, regression analysis was used to construct a predictive model for the longitudinal curvature of the FRRF process. The input parameters affecting the longitudinal curvature were taken to be the maximum compression value, the curvature radius in the transverse direction, and the initial blank width. A three-factor, three-level full factorial experimental design was used, and 27 experiments were performed on the FRRF apparatus to obtain sample data for the regression model. Regression analysis was carried out using the experimental results as sample data. The model used for the regression analysis was a quadratic nonlinear regression model. The coefficient of determination and the root mean square error were calculated to confirm the conformity of the model. The regression predictive model was verified through a goodness-of-fit test.
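
The workflow in the abstract above (a three-factor, three-level full factorial design, a quadratic regression fit, then the coefficient of determination and RMSE) can be sketched as follows. This is only an illustration under stated assumptions: the factors are coded as -1, 0, 1, the model is assumed to be a full second-order polynomial with interaction terms, and the responses are generated noise-free from hypothetical coefficients rather than taken from the FRRF experiments.

```python
from itertools import product
import math

def quadratic_terms(a, b, c):
    """Terms of a full second-order model in three coded factors."""
    return [1.0, a, b, c, a * a, b * b, c * c, a * b, a * c, b * c]

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def least_squares(X, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

# 3 factors x 3 coded levels = 27 runs.
design = list(product([-1.0, 0.0, 1.0], repeat=3))
X = [quadratic_terms(*run) for run in design]

# Hypothetical coefficients used only to generate noise-free toy responses.
true_beta = [5.0, 1.0, -2.0, 0.5, 0.8, 0.0, -0.3, 0.4, 0.0, 0.1]
y = [sum(b * t for b, t in zip(true_beta, row)) for row in X]

beta = least_squares(X, y)
pred = [sum(b * t for b, t in zip(beta, row)) for row in X]

# Goodness-of-fit measures: coefficient of determination and RMSE.
mean_y = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1.0 - ss_res / ss_tot
rmse = math.sqrt(ss_res / len(y))
```

Because the toy responses contain no noise, the fit recovers the generating coefficients and the coefficient of determination is essentially 1; measured data would give a lower value and a nonzero RMSE.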

유역 토지이용과 저수지 수질의 상관관계 분석 (Correlation Analysis of Water Quality According to Land Use Types of Reservoir Watershed)

  • 윤동균;정상옥
    • 한국농공학회:학술대회논문집
    • /
    • 한국농공학회 2005년도 학술발표논문집
    • /
    • pp.614-619
    • /
    • 2005
  • The objective of this study was to present regression equations for simply and quickly estimating the water quality items BOD, COD, T-N, and T-P. The regression equations were obtained by analyzing the relationships between water quality items and land use types in agricultural reservoir watersheds. To derive the equations, multiple linear regression analysis was applied to the study reservoirs. In this regression analysis, the independent variables were land use types and the dependent variables were the BOD, COD, T-N, and T-P values. The results showed that no regression equation had a multiple correlation coefficient (MCC) above 0.90; 6 equations had an MCC from 0.70 to 0.90, 20 had an MCC from 0.40 to 0.70, and 4 had an MCC from 0.20 to 0.40. The results of this study can be used as basic information for simply and quickly evaluating water quality in the proposal and design stages of water quality policy.

단침보강 세라믹 공구를 이용한 플라스틱 금형강(STAVAX)의 선삭가공 (Turning of Plastic Mold Steel(STAVAX) using Whisker Reinforced Ceramic)

  • 배명일;이이선
    • 한국기계가공학회지
    • /
    • Vol. 11, No. 6
    • /
    • pp.36-41
    • /
    • 2012
  • In this study, we turned plastic mold steel (STAVAX) with a whisker reinforced ceramic tool (WA1) while varying the cutting speed, depth of cut, and feed rate. To predict cutting force, we analyzed the principal, radial, and feed forces with multiple regression analysis. The results are as follows. From the analysis of variance, the factors affecting cutting force were, in order, feed rate, depth of cut, and cutting speed, and cutting speed had a very small effect on cutting force. From the multiple regression analysis, we extracted regression equations, and the coefficient of determination ($R^2$) was 0.9, 0.88, and 0.856 for the principal, radial, and feed forces, respectively. This indicates that the regression equations are significant. Experimental verification confirmed that the principal, radial, and feed forces can be predicted by the regression equations.

FORECASTING THE COST AND DURATION OF SCHOOL RECONSTRUCTION PROJECTS USING REGRESSION ANALYSIS

  • Wei Tong Chen;Ying-Hua Huang;Shen-Li Liao
    • 국제학술발표논문집
    • /
    • The 1st International Conference on Construction Engineering and Project Management
    • /
    • pp.892-896
    • /
    • 2005
  • This paper collected data on 132 school reconstruction projects in central Taiwan, which suffered the most serious damage from the Chi-Chi Earthquake. Regression analysis was used to build prediction models of the cost and duration of the collected projects. It is found that cubic regression models are capable of predicting the cost and duration of the projects contracted by the central agency, for which contracts were awarded by the most advantageous tendering (MAT) approach. On the other hand, power regression models are capable of predicting the cost and duration of the projects contracted through the low bid tendering (LBT) approach. It is also found that the performance of the regression prediction model differs according to the organization that contracted the reconstruction projects.
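
A power regression model of the kind named in the abstract above has the form y = a * x^b. One common way to fit it, sketched below, is ordinary least squares on log-transformed data. Whether the authors fitted their models this way is not stated, and the predictor (`floor_area`), the response (`cost`), and all values are hypothetical.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); needs x, y > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                       # exponent = slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # scale = exp(intercept)
    return a, b

# Hypothetical noise-free toy data generated from cost = 3 * area^0.9.
floor_area = [100.0, 200.0, 400.0, 800.0]
cost = [3.0 * s ** 0.9 for s in floor_area]

a, b = fit_power(floor_area, cost)
predicted_cost = a * 500.0 ** b   # prediction for an unseen area of 500
```

Fitting in log space minimizes relative rather than absolute errors, so the estimates can differ from a direct nonlinear least-squares fit when the data are noisy.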

건설협력업체 핵심역량의 퍼지회귀분석 (Fuzzy Regression Analysis for Core Competency of Construction Subcontractors)

  • 김성일;황승국
    • 한국지능시스템학회논문지
    • /
    • Vol. 25, No. 3
    • /
    • pp.203-209
    • /
    • 2015
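  • This paper applies ordinary regression analysis and fuzzy regression analysis to the core competencies of construction subcontractors. The aim is to use the two types of regression analysis to determine how much the core competencies affect a subcontractor's grade. The ordinary regression results for the subcontractor evaluation grade show that the core competencies influencing the grade are management and contribution to the company. In the fuzzy regression analysis of the evaluation grade, among the Min, Max, and Conjunction problems, the Min and Conjunction problems showed 100% reliability and can therefore be used. From these results, ordinary regression analysis can identify the core competencies that affect the dependent variable, the subcontractor evaluation grade, while fuzzy regression analysis provides estimates of the evaluation grade that fully contain the given fuzzy output data, overlap with them, or coincide with them at a point.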

로지스틱회귀분석 모델을 활용한 도시철도 사상사고 사고예측모형 개발에 대한 연구 (Study on Accident Prediction Models in Urban Railway Casualty Accidents Using Logistic Regression Analysis Model)

  • 진수봉;이종우
    • 한국철도학회논문집
    • /
    • Vol. 20, No. 4
    • /
    • pp.482-490
    • /
    • 2017
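  • This study concerns statistical techniques for railway accident investigation, aimed at classifying and predicting accident severity. Linear regression analysis has had difficulty analyzing accident severity, and logistic regression analysis was able to compensate for this. Using logistic regression analysis, a data mining technique, the study found that the prediction model variables affecting the occurrence of escalator fall accidents, among fall accidents inside Seoul subway stations (Lines 5-8), were the victim's age, whether alcohol had been consumed, the situation and behavior at the time of the accident, and whether the handrail was being held. The accuracy of the analysis was 76.7%, and according to the results, logistic regression analysis is judged to be a useful data mining technique for developing prediction models for urban railway casualty accidents in terms of accuracy and significance level.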
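
As a rough illustration of the logistic regression approach named in the abstract above, the sketch below fits a one-predictor logistic model by batch gradient descent and reports classification accuracy. The predictor (`risk_score`), the binary outcomes, the learning rate, and the step count are all hypothetical choices for this toy example and are unrelated to the Seoul subway data.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(b + w*x) by batch gradient descent on log loss."""
    b, w = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log loss: mean(p - y) and mean((p - y) * x).
        errors = [sigmoid(b + w * x) - y for x, y in zip(xs, ys)]
        b -= lr * sum(errors) / n
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b, w

# Hypothetical toy data: a single risk score and a binary outcome.
risk_score = [0, 0, 1, 1, 2, 2, 3, 3]
outcome = [0, 0, 0, 1, 0, 1, 1, 1]

b, w = fit_logistic(risk_score, outcome)

# Classify as 1 when the predicted probability is at least 0.5.
predicted = [1 if sigmoid(b + w * x) >= 0.5 else 0 for x in risk_score]
accuracy = sum(p == y for p, y in zip(predicted, outcome)) / len(outcome)
```

Accuracy here is measured on the same data used for fitting, which tends to be optimistic; a held-out set would normally be used to assess a prediction model.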