• Title/Abstract/Keywords: Non-linear regression model

An Additive Sparse Penalty for Variable Selection in High-Dimensional Linear Regression Model

  • Lee, Sangin
    • Communications for Statistical Applications and Methods / Vol. 22, No. 2 / pp. 147-157 / 2015
  • We consider a sparse high-dimensional linear regression model. Penalized methods using LASSO or non-convex penalties have been widely used for variable selection and estimation in high-dimensional regression models. In penalized regression, selection and prediction performance depend on which penalty function is used. For example, LASSO is known to have good prediction performance but tends to select more variables than necessary. In this paper, we propose an additive sparse penalty for variable selection that combines LASSO and the minimax concave penalty (MCP). The proposed penalty is designed to retain the good properties of both LASSO and MCP. We develop an efficient algorithm to compute the proposed estimator by combining the concave-convex procedure (CCCP) with a coordinate descent algorithm. Numerical studies show that the proposed method has better selection and prediction performance than other penalized methods.
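
The abstract does not give the penalty formula or algorithm details. This Python sketch assumes the additive penalty is $\lambda_1|t| + \mathrm{MCP}_{\lambda_2,\gamma}(t)$ and splits it, CCCP-style, into a convex part $(\lambda_1+\lambda_2)|t|$ plus the concave remainder $\mathrm{MCP}_{\lambda_2,\gamma}(t) - \lambda_2|t|$. Each outer step linearizes the concave part at the current estimate and solves the resulting weighted-LASSO problem by coordinate descent. The function names, tuning values, and simulated data are illustrative assumptions, not the author's implementation.

```python
import random

def mcp(t, lam, gamma):
    """Minimax concave penalty (MCP) evaluated at t."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0

def additive_penalty(t, lam1, lam2, gamma):
    """Assumed additive form: LASSO term plus MCP term."""
    return lam1 * abs(t) + mcp(t, lam2, gamma)

def concave_slope(t, lam2, gamma):
    """Derivative of the concave part h(t) = MCP(t) - lam2*|t| (used by CCCP)."""
    if abs(t) <= gamma * lam2:
        return -t / gamma
    return -lam2 if t > 0 else lam2

def soft_threshold(z, thr):
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0

def fit(X, y, lam1, lam2, gamma, outer=20, inner=200, tol=1e-8):
    """CCCP outer loop with coordinate descent inside.

    Assumes each column of X is centered with sum(x_ij^2)/n == 1 and y is centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)  # current residual y - X beta
    for _ in range(outer):
        # CCCP: linearize the concave part at the current estimate
        w = [concave_slope(b, lam2, gamma) for b in beta]
        for _ in range(inner):
            max_step = 0.0
            for j in range(p):
                z = sum(X[i][j] * r[i] for i in range(n)) / n + beta[j]
                new = soft_threshold(z - w[j], lam1 + lam2)
                step = new - beta[j]
                if step != 0.0:
                    for i in range(n):
                        r[i] -= X[i][j] * step
                    beta[j] = new
                max_step = max(max_step, abs(step))
            if max_step < tol:
                break
    return beta

def standardize(X, y):
    """Center and scale columns so that sum(x^2)/n == 1; center y."""
    n, p = len(X), len(X[0])
    Xs = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        s = (sum((c - m) ** 2 for c in col) / n) ** 0.5
        for i in range(n):
            Xs[i][j] = (col[i] - m) / s
    ym = sum(y) / n
    return Xs, [v - ym for v in y]

# Simulated example (made-up data): only the first two coefficients are non-zero.
random.seed(0)
n, p = 200, 10
true_beta = [3.0, -2.0] + [0.0] * (p - 2)
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 0.5) for row in X]
Xs, yc = standardize(X, y)
beta_hat = fit(Xs, yc, lam1=0.05, lam2=0.3, gamma=3.0)
print([round(b, 3) for b in beta_hat])
```

With standardized columns, each coordinate update is a closed-form soft-thresholding step, which is why CCCP pairs naturally with coordinate descent here. For large coefficients the linearized MCP slope cancels its own shrinkage, leaving only the small LASSO bias $\lambda_1$.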

A System Dynamics Model for Basic Material Price and Fare Analysis and Forecasting

  • 정재헌
    • 한국시스템다이내믹스연구 / Vol. 10, No. 1 / pp. 61-76 / 2009
  • We use system dynamics to forecast the demand/supply, price, and transportation fare of iron ore, a very important mineral resource for industrial production. The structure of this system shows non-linear patterns, and we anticipated that the system dynamics method would capture this non-linear reality better than regression analysis. Our model was calibrated and tested on 6 years of past monthly data (2003-2008) and used to forecast the following 6 years of monthly data (2008-2013). The test results show that our system dynamics approach fits the real data with higher accuracy than the regression approach. We also ran simulations for scenarios built from possible future changes in demand, supply, and fare-related variables. These simulations suggest some meaningful patterns of price and fare change.
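
As a rough illustration of what a system dynamics model does differently from a regression equation, this Python sketch integrates a toy stock-and-flow loop (inventory, price, production capacity) month by month with Euler steps. The structure, parameter values, and the function name `simulate` are hypothetical and not taken from the paper; the point is only that feedback between stocks and a bounded non-linear price response can produce non-linear trajectories.

```python
def simulate(months=72, dt=1.0):
    """Euler integration of a toy inventory-price-capacity loop (hypothetical parameters)."""
    demand_ref, price_ref = 100.0, 60.0   # reference demand and price
    elasticity = 0.3                      # price elasticity of demand
    coverage_target = 2.0                 # desired inventory, in months of demand
    price_adj_time = 6.0                  # months for price to respond
    capacity_adj_time = 24.0              # months to build or retire capacity
    inventory, price, capacity = 200.0, 60.0, 90.0
    history = []
    for step in range(int(months / dt)):
        history.append((step * dt, inventory, price, capacity))
        # All rates are computed from the current state before any stock is updated.
        demand = demand_ref * (price / price_ref) ** (-elasticity)
        coverage = inventory / demand
        # Non-linear, bounded effect of scarcity (low coverage) on price
        pressure = min(max(coverage_target / max(coverage, 1e-9), 0.5), 2.0)
        price_rate = price * (pressure - 1.0) / price_adj_time
        desired_capacity = demand_ref * (price / price_ref) ** 0.5
        capacity_rate = (desired_capacity - capacity) / capacity_adj_time
        inventory_rate = capacity - demand
        # Euler step; inventory cannot go negative
        inventory = max(inventory + dt * inventory_rate, 0.0)
        price += dt * price_rate
        capacity += dt * capacity_rate
    return history

for t, inv, price, cap in simulate()[::12]:
    print(f"month {t:4.0f}  inventory {inv:7.1f}  price {price:6.1f}  capacity {cap:6.1f}")
```

Calibrating such a model means adjusting parameters like the adjustment times and elasticity until the simulated price path tracks historical data, after which scenario runs change the exogenous inputs.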

Prediction of Solvent Effects on Rate Constant of [2+2] Cycloaddition Reaction of Diethyl Azodicarboxylate with Ethyl Vinyl Ether Using Artificial Neural Networks

  • Habibi-Yangjeh, Aziz;Nooshyar, Mahdi
    • Bulletin of the Korean Chemical Society / Vol. 26, No. 1 / pp. 139-145 / 2005
  • Artificial neural networks (ANNs) were, for the first time, successfully developed for modeling and predicting solvent effects on the rate constant of the [2+2] cycloaddition reaction of diethyl azodicarboxylate with ethyl vinyl ether in various solvents with diverse chemical structures, using a quantitative structure-activity relationship (QSAR) approach. The most positive charge of a hydrogen atom ($q^+$), the dipole moment ($\mu$), the Hildebrand solubility parameter ($\delta_H^2$), and the total charge in the molecule ($q_t$) are the inputs, and the output of the ANN is $\log k_2$. To evaluate the predictive power of the generated ANN, the optimized network, trained on a set of 68 solvents, was used to predict $\log k_2$ of the reaction in the 16 solvents of the prediction set. The ANN results were compared with the experimental values and with those of a multi-parameter linear regression (MLR) model, showing the superiority of the ANN model over the regression model: the mean squared error (MSE) on the prediction set is 0.0806 for the MLR model versus 0.0275 for the ANN model. These improvements are attributed to the non-linear correlations between the reaction rate constant and the descriptors.
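
Assuming the usual definition of MSE over the 16 prediction-set solvents, the reported values correspond to roughly a two-thirds reduction in error, with the MLR error about 2.93 times the ANN error:

```latex
\mathrm{MSE} = \frac{1}{16}\sum_{i=1}^{16}\left(\log k_{2,i} - \widehat{\log k_{2,i}}\right)^2,
\qquad
\frac{\mathrm{MSE}_{\mathrm{MLR}} - \mathrm{MSE}_{\mathrm{ANN}}}{\mathrm{MSE}_{\mathrm{MLR}}}
= \frac{0.0806 - 0.0275}{0.0806} = \frac{0.0531}{0.0806} \approx 0.659,
\qquad
\frac{\mathrm{MSE}_{\mathrm{MLR}}}{\mathrm{MSE}_{\mathrm{ANN}}} = \frac{0.0806}{0.0275} \approx 2.93
```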

Software Cost Estimation Model Based on Use Case Points by using Regression Model

  • 박주석;양해술
    • 한국콘텐츠학회논문지 / Vol. 9, No. 8 / pp. 147-157 / 2009
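  • Research on Use Case Points (UCP) as a technique for estimating development effort in software projects that apply object-oriented development methodologies has continued in recent years. Existing studies propose a linear model that estimates development effort by multiplying the AUCP (Adjusted Use Case Points), which incorporates technical and environmental factors, by a constant. However, because development time grows exponentially as software size increases, a non-linear regression model is more appropriate; furthermore, applying the TCF (Technical Complexity Factor) and EF (Environmental Factor) in the UCP calculation introduces FP (Function Point)-style errors, so estimating size with AUCP is unrealistic. This paper identifies the problems of existing UCP-based research and, avoiding the TCF and EF that cause them, derives and evaluates models (linear, logarithmic, polynomial, power, and exponential) that estimate development effort directly from the UUCP (Unadjusted Use Case Points). As a result, the exponential model, a non-linear model, performed better than the existing linear model. Therefore, the direct cost of development can be estimated by computing the UUCP of the software system to be developed and applying the proposed model to estimate development effort.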
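
A minimal sketch of how four of the candidate forms can be fitted directly to UUCP by least squares on transformed variables (the polynomial form is omitted for brevity). The data are invented and generated from an exponential curve, and MMRE (mean magnitude of relative error) is an assumed comparison criterion; the abstract states neither the paper's data nor its evaluation measure.

```python
import math

def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_models(uucp, effort):
    """Fit four candidate forms by least squares on transformed variables."""
    log_u = [math.log(u) for u in uucp]
    log_e = [math.log(e) for e in effort]
    a1, b1 = ols(uucp, effort)     # linear:      E = a + b*U
    a2, b2 = ols(log_u, effort)    # logarithmic: E = a + b*ln(U)
    a3, b3 = ols(log_u, log_e)     # power:       E = exp(a) * U**b
    a4, b4 = ols(uucp, log_e)      # exponential: E = exp(a) * exp(b*U)
    return {
        "linear": lambda u: a1 + b1 * u,
        "logarithmic": lambda u: a2 + b2 * math.log(u),
        "power": lambda u: math.exp(a3) * u ** b3,
        "exponential": lambda u: math.exp(a4 + b4 * u),
    }

def mmre(model, uucp, effort):
    """Mean magnitude of relative error (MMRE)."""
    return sum(abs(model(u) - e) / e for u, e in zip(uucp, effort)) / len(effort)

# Hypothetical projects: effort generated exactly from an exponential curve.
uucp = [40.0, 70.0, 100.0, 130.0, 160.0, 190.0]
effort = [500.0 * math.exp(0.012 * u) for u in uucp]
models = fit_models(uucp, effort)
scores = {name: mmre(m, uucp, effort) for name, m in models.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 6))
```

The power and exponential forms are fitted on the log scale, so they minimize squared log error rather than squared error in effort; a direct non-linear least-squares fit would weight large projects differently.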

Improvement of Rating Curve Fitting Considering Variance Function with Pseudo-likelihood Estimation

  • 이우석;김상욱;정은성;이길성
    • 한국수자원학회논문집 / Vol. 41, No. 8 / pp. 807-823 / 2008
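  • Log-linear regression, widely used to estimate the parameters of the equation describing a stage-discharge rating curve, cannot account for heteroscedasticity of the residuals. This study therefore presents a method that estimates a variance function by pseudo-likelihood estimation (P-LE) jointly with the regression coefficients. In this process, the global optimization algorithm simulated annealing (SA) was applied to minimize the regression residuals. In addition, because a rating curve must be constructed separately for different stage ranges owing to effects such as the channel cross-section, a Heaviside function was included in the pseudo-likelihood function so that the segments could be determined more objectively and the breakpoint location estimated. The soundness of the proposed method was tested statistically using discharge data with two segments. These statistical experiments identified the advantages of the proposed methods over existing methods, and their efficiency was verified by applying them at five sites in the Geum River basin.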
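
The abstract does not give the variance function. As a sketch, assuming the common power-law rating curve and a power-of-the-mean variance function (both assumptions), the pseudo-log-likelihood for the variance parameters (up to an additive constant), its profiled form, and a Heaviside-joined two-segment curve could be written as:

```latex
% Assumed single-segment rating curve and variance model
\mu_i = a\,(h_i - h_0)^{b}, \qquad Q_i = \mu_i + \sigma\,\mu_i^{\theta}\,e_i,
\quad \mathrm{E}[e_i]=0,\ \mathrm{Var}[e_i]=1

% Pseudo-log-likelihood for (sigma, theta), mean parameters held fixed
\ell(\sigma,\theta) = -\sum_{i=1}^{n}\left[\ln\!\left(\sigma\,\mu_i^{\theta}\right)
  + \frac{(Q_i-\mu_i)^2}{2\sigma^2\mu_i^{2\theta}}\right]

% Profiling out sigma
\hat\sigma^2(\theta) = \frac{1}{n}\sum_{i=1}^{n}\frac{(Q_i-\mu_i)^2}{\mu_i^{2\theta}},
\qquad
\ell_p(\theta) = -\frac{n}{2}\ln\hat\sigma^2(\theta) - \theta\sum_{i=1}^{n}\ln\mu_i - \frac{n}{2}

% Two segments joined through the Heaviside step H at breakpoint h_b
\mu(h) = \bigl[1-H(h-h_b)\bigr]\,a_1(h-h_{0,1})^{b_1} + H(h-h_b)\,a_2(h-h_{0,2})^{b_2}
```

One common way to use this is to alternate between maximizing $\ell_p(\theta)$ and re-estimating the curve parameters by weighted least squares with weights $\hat\mu_i^{-2\theta}$. Because $H$ makes the objective non-smooth in $h_b$, a global search such as simulated annealing is a natural fit for that step.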

Efficient Prediction in the Semi-parametric Non-linear Mixed effect Model

  • So, Beong-Soo
    • Journal of the Korean Statistical Society / Vol. 28, No. 2 / pp. 225-234 / 1999
  • We consider the following semi-parametric non-linear mixed effect regression model: $y_i = f(\chi_i;\beta) + \sigma\mu(\chi_i) + \sigma\varepsilon_i$, $i = 1,\dots,n$, and $y^* = f(\chi;\beta) + \sigma\mu(\chi)$, where $y' = (y_1,\dots,y_n)$ is a vector of $n$ observations, $y^*$ is an unobserved new random variable of interest, $f(\chi;\beta)$ represents the fixed effect of known functional form containing an unknown parameter vector $\beta' = (\beta_1,\dots,\beta_p)$, $\mu(\chi)$ is a random function with mean zero and known covariance function $r(\cdot,\cdot)$, $\varepsilon' = (\varepsilon_1,\dots,\varepsilon_n)$ is the set of uncorrelated measurement errors with zero mean and unit variance, and $\sigma$ is an unknown dispersion (scale) parameter. Within a finite-sample, small-dispersion asymptotic framework, we derive an absolute lower bound for the asymptotic mean squared error of prediction (AMSEP) of regular consistent non-linear predictors of the new random variable of interest $y^*$. We then construct an optimal predictor of $y^*$ that attains the lower bound irrespective of the distributions of the random effect $\mu(\cdot)$ and the measurement errors $\varepsilon$.
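
For orientation only: if $\beta$ were known and $\mu(\cdot)$ and $\varepsilon$ were uncorrelated (an assumption), the model implies $\mathrm{Cov}(y_i,y_j)=\sigma^2[r(\chi_i,\chi_j)+\delta_{ij}]$ and $\mathrm{Cov}(y^*,y_i)=\sigma^2 r(\chi,\chi_i)$, so the best linear predictor of $y^*$ has the kriging-type form below, in which $\sigma^2$ cancels. The abstract does not say whether the paper's optimal predictor is exactly this expression with $\beta$ replaced by an estimate.

```latex
\hat y^* = f(\chi;\beta) + \mathbf{r}_*^{\prime}\,(\Gamma + I_n)^{-1}\bigl(y - \mathbf{f}\bigr),
\qquad \Gamma_{ij} = r(\chi_i,\chi_j),

\mathbf{r}_* = \bigl(r(\chi,\chi_1),\dots,r(\chi,\chi_n)\bigr)^{\prime},
\qquad
\mathbf{f} = \bigl(f(\chi_1;\beta),\dots,f(\chi_n;\beta)\bigr)^{\prime}
```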

Model selection algorithm in Gaussian process regression for computer experiments

  • Lee, Youngsaeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods / Vol. 24, No. 4 / pp. 383-396 / 2017
  • Our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). In classical regression, selecting a subset of variables, or building a good reduced model, is an important step for identifying the variables that influence the response and for further analysis such as prediction or classification. From the prediction standpoint, one reason to select variables is to prevent over-fitting or under-fitting of the data. The same reasoning and approach apply to GPRM. However, only a few studies have addressed variable selection in GPRM. In this paper, we propose a new algorithm for building a good prediction model among candidate GPRMs. It is applied as a post-processing step to an algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select non-zero regression coefficients ($\beta$'s) using forward and backward methods along with a Lasso-guided approach. During this process, the covariance parameters ($\theta$'s) pre-selected by the Welch algorithm are held fixed. We illustrate the superiority of our proposed models over the Welch method and over models without selection, using four test functions and one real data example. Future extensions are also discussed.
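
A minimal Python sketch of the GPRM predictor with the covariance parameters $\theta$ held fixed, as in the selection step described above. It assumes a Gaussian correlation function $\exp(-\sum_k \theta_k (x_k - x'_k)^2)$, estimates $\beta$ by generalized least squares, and predicts with the standard kriging formula $\hat y(x) = f(x)'\hat\beta + r(x)'R^{-1}(y - F\hat\beta)$. The names `fit_gprm` and `basis`, the small nugget, and the toy design are assumptions; the forward/backward and Lasso-guided search itself is not shown.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def corr(a, b, theta):
    """Assumed Gaussian correlation function."""
    return math.exp(-sum(t * (ai - bi) ** 2 for t, ai, bi in zip(theta, a, b)))

def fit_gprm(X, y, basis, theta, nugget=1e-10):
    """GLS estimate of beta and a kriging predictor, with theta held fixed."""
    n, q = len(X), len(basis)
    R = [[corr(X[i], X[j], theta) + (nugget if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    F = [[g(x) for g in basis] for x in X]
    Rinv_y = solve(R, y)
    Rinv_F = [solve(R, [F[i][k] for i in range(n)]) for k in range(q)]  # columns of R^-1 F
    A = [[sum(F[i][a] * Rinv_F[b][i] for i in range(n)) for b in range(q)] for a in range(q)]
    rhs = [sum(F[i][a] * Rinv_y[i] for i in range(n)) for a in range(q)]
    beta = solve(A, rhs)                      # (F' R^-1 F)^-1 F' R^-1 y
    resid = [y[i] - sum(F[i][k] * beta[k] for k in range(q)) for i in range(n)]
    w = solve(R, resid)                       # R^-1 (y - F beta)

    def predict(x):
        trend = sum(g(x) * bk for g, bk in zip(basis, beta))
        return trend + sum(corr(x, X[i], theta) * w[i] for i in range(n))

    return beta, predict

# Toy one-dimensional design (hypothetical) with a constant-plus-linear trend.
X = [[0.0], [0.25], [0.5], [0.75], [1.0]]
y = [math.sin(2 * math.pi * x[0]) + x[0] for x in X]
basis = [lambda x: 1.0, lambda x: x[0]]
beta, predict = fit_gprm(X, y, basis, theta=[10.0])
print([round(v, 3) for v in beta], round(predict([0.6]), 3))
```

A selection procedure would call `fit_gprm` with different subsets of `basis` and compare a prediction criterion such as cross-validated error; the abstract does not state which criterion the paper uses.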

Resource Demand/Supply and Price Forecasting -A Case of Nickel-

  • 정재헌
    • 한국시스템다이내믹스연구 / Vol. 9, No. 1 / pp. 125-141 / 2008
  • It is very difficult to predict the future demand/supply and prices of resources with acceptable accuracy using regression analysis. We use system dynamics to forecast the demand/supply and price of nickel, a very expensive mineral resource used in stainless steel production and in other industrial production such as batteries and alloys. Recent nickel prices have shown a non-linear pattern, and we anticipated that the system dynamics method would capture this non-linear pattern better than regression analysis. Our model was calibrated on 6 years of past quarterly data (2002-2007) and tested on the following 5 years of quarterly data (2008-2012). The results were acceptable and more accurate than those obtained from regression analysis. We also ran simulations for scenarios built from possible future changes in demand- or supply-related variables. These simulations suggested some meaningful price change patterns.

A Comparison Study on Compression Index of Marine Clay with High-Plasticity

  • 정길수;박병수;홍영길;유남재
    • 산업기술연구 / Vol. 25, No. A / pp. 57-65 / 2005
  • In this paper, for the highly plastic soft marine clay distributed along the west and south coasts of the Korean peninsula, in the Kwangyang and Busan New Port areas, the correlations between the compression index and other indices representing geotechnical engineering properties, such as the liquid limit, void ratio, and natural water content, were analyzed. Empirical equations for estimating the compressibility of the clays in these specific areas were proposed and compared with existing empirical equations. The analyses used data for marine clays from the South Container Port of the Busan New Port, the East Breakwater, the Passenger Quay, the Jungma Reclamation, and the Reclamation Containment of the 3rd stage in Kwangyang. To find the best regression model, results of simple linear regression analysis performed with the commercially available software MS EXCEL 2000, using the liquid limit, initial void ratio, and natural water content as independent variables, were compared with the existing empirical equations. Multiple linear regression combining these independent variables was also performed to find the best-fit regression curves for the compression index. In addition, SPSS was used for non-linear regression analysis of the correlations between the compression index and other soil properties.
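
A sketch of the simple-regression comparison, using the well-known Terzaghi and Peck equation $C_c = 0.009(LL - 10)$ as one example of an existing empirical formula. The liquid-limit and compression-index pairs are made-up numbers for illustration, not the paper's data.

```python
def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmse(pred, obs):
    """Root mean squared error."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5

def terzaghi_peck(ll):
    """Existing empirical equation Cc = 0.009 (LL - 10), LL in percent."""
    return 0.009 * (ll - 10.0)

# Made-up illustrative pairs (liquid limit %, compression index); not the paper's data.
ll = [45.0, 55.0, 65.0, 75.0, 85.0]
cc = [0.40, 0.52, 0.61, 0.74, 0.83]
a, b = ols(ll, cc)
fit_rmse = rmse([a + b * x for x in ll], cc)
tp_rmse = rmse([terzaghi_peck(x) for x in ll], cc)
print(f"Cc = {b:.4f} * LL + ({a:.4f}); RMSE fit {fit_rmse:.3f} vs Terzaghi-Peck {tp_rmse:.3f}")
```

Because least squares minimizes squared error over all straight lines, the fitted line can never have a larger in-sample RMSE than a fixed linear formula on the same data; a fair comparison of site-specific and published equations would use held-out data.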

Modeling of Flow-Accelerated Corrosion using Machine Learning: Comparison between Random Forest and Non-linear Regression

  • 이경근;이은희;김성우;김경모;김동진
    • Corrosion Science and Technology / Vol. 18, No. 2 / pp. 61-71 / 2019
  • Flow-accelerated corrosion (FAC) is a phenomenon in which a protective coating on a metal surface is dissolved by fluid flowing in a metal pipe, leading to continuous wall thinning. Recently, many countries have developed computer codes to manage FAC in power plants, and the FAC prediction model in these codes plays an important role in their predictive performance. Here, FAC prediction models were developed by applying a machine learning method and the conventional nonlinear regression method. The random forest, a machine learning technique widely used in predictive modeling, made it easy to calculate the FAC tendency for five input variables: flow rate, temperature, pH, Cr content, and dissolved oxygen (DO) concentration. However, the model showed significant errors under some input conditions, and it was difficult to obtain proper regression results without additional data points. In contrast, nonlinear regression analysis, which assumes an empirical equation, produced robust estimates even with relatively limited data, and the model showed better predictive power when the interaction between DO and pH was considered. This comparative analysis is believed to provide important insights for developing more sophisticated FAC prediction models.
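
The abstract does not give the empirical equation. This Python sketch assumes a hypothetical multiplicative form that becomes linear after taking logarithms, so that a pH·ln(DO) interaction can be added as one extra column. This log-linearized least-squares fit is a simpler stand-in for the paper's non-linear regression, it uses only three of the five inputs, and all names, coefficients, and data points are invented for illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def design_row(flow, ph, do):
    """Hypothetical form: ln(rate) = c0 + c1 ln(flow) + c2 pH + c3 ln(DO) + c4 pH*ln(DO)."""
    ln_do = math.log(do)
    return [1.0, math.log(flow), ph, ln_do, ph * ln_do]

def least_squares(rows, y):
    """Solve the normal equations X'X c = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Invented, noise-free check: generate ln(rate) from known coefficients on a small grid.
true_coef = [-1.0, 0.8, -0.3, 0.5, -0.05]
grid = [(f, ph, do) for f in (1.0, 2.0, 4.0) for ph in (7.0, 9.5) for do in (1.0, 10.0, 100.0)]
rows = [design_row(*g) for g in grid]
ln_rate = [sum(c * v for c, v in zip(true_coef, r)) for r in rows]
coef = least_squares(rows, ln_rate)
print([round(c, 4) for c in coef])
```

The interaction column lets the effect of DO depend on pH (here the slope on ln(DO) is $c_3 + c_4\,\mathrm{pH}$), which is the kind of coupling the abstract reports as improving predictive power.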