• Title/Summary/Keyword: least squares linear regression

Bayesian quantile regression analysis of private education expenses for high school students in Korea (일반계 고등학생 사교육비 지출에 대한 베이지안 분위회귀모형 분석)

  • Oh, Hyun Sook
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1457-1469 / 2017
  • Private education expenses are one of the key issues in Korea, and there have been many discussions about them. Academically, most previous research on private education expenses has used multiple linear regression models based on the ordinary least squares (OLS) method. However, if the data do not satisfy the basic assumptions of the OLS method, such as normality and homoscedasticity, the parameter estimates are unreliable. In this case, a quantile regression model is preferred to an OLS model because it does not depend on the assumptions of normality and homoscedasticity. In the present study, data from a survey on private education expenses conducted by Statistics Korea in 2015 were analyzed to investigate the factors affecting private education expenses. Since the data do not satisfy the OLS assumptions, a quantile regression model was fitted with a Bayesian approach using Gibbs sampling. The results show that the gender of the student, parent's age, and the time and cost of participating in after-school programs are not significant. Household income is positively significant, with an effect of similar size at all levels (quantiles) of private education expenses. Spending on private education in Seoul is higher than in other regions, and the regional difference grows as private education expenditure increases. Total time for private education and student achievement have a larger positive effect at the lower quantiles than at the higher quantiles. The father's education level is positively significant for medium-high quantiles only, whereas the mother's education level is positively significant for all but the low quantiles. Participation in after-school programs is positively significant for the lower quantiles, whereas EBS textbook cost is positively significant for the higher quantiles.
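
The contrast between OLS and quantile regression drawn in this abstract can be illustrated with the check (pinball) loss that quantile regression minimizes. The following is a minimal sketch only: it fits a single constant rather than a regression model, it does not implement the Bayesian Gibbs sampler used in the paper, and the expense values and function names are hypothetical.

```python
import numpy as np

def check_loss(residuals, tau):
    """Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.where(r >= 0, tau * r, (tau - 1) * r)))

def best_constant(y, tau):
    """Constant that minimizes the check loss, searched over the observed values."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

# Hypothetical right-skewed expenses with one very large value.
y = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0, 30.0, 40.0, 200.0])

mean_fit = float(y.mean())          # least squares fit of a constant
median_fit = best_constant(y, 0.5)  # tau = 0.5 gives the sample median
upper_fit = best_constant(y, 0.9)   # a high quantile follows the upper tail
```

In this toy data the least squares constant (the mean) is pulled toward the single large value, while the median fit is not. That is the kind of robustness to non-normal, heteroscedastic data the abstract refers to.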

Procedure for the Selection of Principal Components in Principal Components Regression (주성분회귀분석에서 주성분선정을 위한 새로운 방법)

  • Kim, Bu-Yong;Shin, Myung-Hee
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.967-975 / 2010
  • Since least squares estimation is not appropriate when multicollinearity exists among the regressors of the linear regression model, principal components regression is used to deal with the multicollinearity problem. This article suggests a new procedure for the selection of suitable principal components. The procedure is based on the condition index instead of the eigenvalue. Principal components whose condition indices are larger than the upper limit of the cutoff value are removed from the model. On the other hand, principal components whose condition indices are smaller than the lower limit are included. The forward inclusion method is employed to select proper principal components among those whose condition indices lie between the upper limit and the lower limit. The limits are obtained from a linear model constructed on the basis of conjoint analysis. The procedure is evaluated by Monte Carlo simulation in terms of the mean square error of the estimator. The simulation results indicate that the proposed procedure is superior to the existing methods.
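
The idea of screening principal components by condition index can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: the condition index is taken as sqrt(largest eigenvalue / eigenvalue) of the correlation matrix, only a single upper cutoff is applied (the value 30 is an assumed rule of thumb, not the paper's conjoint-analysis limit), and the forward inclusion step for the intermediate range is omitted. The function names and the simulated data are hypothetical.

```python
import numpy as np

def condition_indices(X):
    """Condition indices sqrt(lambda_max / lambda_j) of the correlation matrix of X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized regressors
    corr = Z.T @ Z / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                  # reorder to descending
    eigvals = np.clip(eigvals[order], 1e-12, None)     # guard against tiny negatives
    eigvecs = eigvecs[:, order]
    return np.sqrt(eigvals[0] / eigvals), eigvecs, Z

def pcr_fit(X, y, upper=30.0):
    """Principal components regression keeping components with index <= upper."""
    idx, V, Z = condition_indices(X)
    keep = idx <= upper                                # assumed single cutoff
    scores = Z @ V[:, keep]                            # retained component scores
    gamma, *_ = np.linalg.lstsq(scores, y - np.mean(y), rcond=None)
    beta_std = V[:, keep] @ gamma                      # coefficients on standardized X
    return beta_std, keep, idx

# Simulated data: x2 is almost a copy of x1, so one component is near-degenerate.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.001 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 1.0 + x1 + x2 + 0.5 * x3 + 0.1 * rng.normal(size=200)

beta_std, keep, idx = pcr_fit(X, y)
```

Dropping the near-degenerate component forces the two collinear regressors to share nearly equal coefficients, which is how principal components regression stabilizes the estimate.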

Selecting Significant Wavelengths to Predict Chlorophyll Content of Grafted Cucumber Seedlings Using Hyperspectral Images

  • Jang, Sung Hyuk;Hwang, Yong Kee;Lee, Ho Jun;Lee, Jae Su;Kim, Yong Hyeon
    • Korean Journal of Remote Sensing / v.34 no.4 / pp.681-692 / 2018
  • This study was performed to select the significant wavelengths for predicting the chlorophyll content of grafted cucumber seedlings using hyperspectral images. The visible and near-infrared (VNIR) images and the short-wave infrared images of cucumber cotyledon samples were acquired with two hyperspectral cameras. A correlation coefficient spectrum (CCS), stepwise multiple linear regression (SMLR), and partial least squares (PLS) regression were used to determine significant wavelengths. Wavelengths of 501, 505, 510, 543, 548, 619, 718, 723, and 727 nm were selected by CCS, SMLR, and PLS as significant for estimating chlorophyll content. The calibration models built by SMLR and PLS showed a fair relationship between measured and predicted chlorophyll concentration. It was concluded that hyperspectral imaging in the VNIR region is effective for non-destructively estimating the chlorophyll content of grafted cucumber leaves.

Measurement of Soil Organic Matter Using Near Infra-Red Reflectance (근적외선 반사도를 이용한 토양 유기물 함량 측정)

  • 조성인;배영민;양희성;최상현
    • Journal of Biosystems Engineering / v.26 no.5 / pp.475-480 / 2001
  • Sensing soil organic matter is crucial for precision farming and environmentally friendly agriculture. Near-infrared (NIR) reflectance was used to measure soil organic matter. Multivariate calibration methods, including stepwise multiple linear regression (MLR), principal components regression (PCR), and partial least squares regression (PLS), were applied to soil spectral reflectance data to predict the organic matter content. The effects of soil particle size and water content were studied. Soil organic matter content ranged from 0.5 to 11%. The NIR region from 700 to 2,500 nm was used. For uniform soil particle size, the results showed good correlation ($R^2$ = 0.984, standard error of prediction = 0.596). The effect of soil particle size could be eliminated with the first-order derivative of the NIR signal. However, moist soil had a slightly lower correlation: $R^2$ was 0.95 and the standard error of prediction was 0.94% using the PLS method. The results showed the possibility of measuring soil organic matter in the field using NIR reflectance.

Influential observations on variable selection in linear regression model (선형회귀모형에서 변수 선택에 영향을 미치는 관측점에 관한 연구)

  • 최지훈;구자흥;이재준;전홍석
    • The Korean Journal of Applied Statistics / v.6 no.2 / pp.421-433 / 1993
  • A few observations can influence the model building procedure and can dominate the least squares fit of a selected model. An observation, however, may not have the same impact on all aspects of regression analysis. We introduce a statistic which measures the impact of individual cases on the overall goodness-of-fit statistics. We also propose an influence measure for the variable selection problem. The property of uncorrelatedness between fitted values and residuals has been used to develop the influence measure. The performance of the measures is compared with other widely used influence measures by the analysis of real data.

Bigdata Analysis of Fine Dust Theme Stock Price Volatility According to PM10 Concentration Change (PM10 농도변화에 따른 미세먼지 테마주 주가변동 빅데이터 분석)

  • Kim, Mu Jeong;Lim, Gyoo Gun
    • Journal of Service Research and Studies / v.10 no.1 / pp.55-67 / 2020
  • Fine dust has recently become one of the greatest concerns of Korean people and has been a target of considerable efforts by the central and local governments. In the academic world, much research has been carried out on fine dust, but research in the economic field has been relatively scarce. We therefore examined how fine dust affects the economy. Big data on PM10 concentration and on fine dust theme stock prices were collected for the five years from 2013 to 2017. Regression analysis was performed using a linear regression model estimated by the generalized least squares method. As a result, the change in fine dust concentration was found to have an effect on the prices of the related theme stocks. When the fine dust concentration increased compared to the previous day, the prices of fine dust theme stocks also tended to increase. In addition, according to the analysis of stock price changes from 2013 to 2017, the companies with large regression coefficients changed every year. Among them, the regression coefficients of Monalisa were repeatedly high in 2014, 2015, and 2017, those of Samil Pharmaceutical in 2015, 2016, and 2017, and those of Welcron in 2016 and 2017, and these companies were judged to be sensitive to the concentration of fine dust. The companies that responded the most over the five years were Wokong, Welcron, Dongsung Pharmaceutical, Samil Pharmaceutical, and Monalisa. If enough PM2.5 measurement data are accumulated, it would be meaningful to compare and analyze PM2.5 concentration with independent variables. In this study, only the fine dust concentration is used as an independent variable. However, it is expected that a clearer and better-explained result can be found by adding appropriate variables to increase the explanatory power.

Effects of Well Parameters Analysis Techniques on Evaluation of Well Efficiency in Step-Drawdown Test (단계양수시험 해석시 우물상수 산정 방법이 우물효율에 미치는 영향)

  • Chung, Sang-Yong;Kim, Byung-Woo;Kim, Gyoo-Bum;Kweon, Hae-Woo
    • The Journal of Engineering Geology / v.19 no.1 / pp.71-79 / 2009
  • Step-drawdown tests were conducted at four pumping wells, two in porous media and two in fractured rocks. In general, P = 2.0, as suggested by Jacob (1947), is applied to both porous media and fractured rocks when interpreting the drawdowns of a step-drawdown test. To review the problems of the linear model (Jacob's graphic method) in interpreting the step-drawdown test, the well parameters (aquifer loss coefficient (B), well loss coefficient (C), and well loss exponent (P)) calculated from the linear model and the nonlinear model (Labadie and Helweg's least-squares method) were compared and analyzed. The values of C and P calculated from the linear and nonlinear models differed according to the permeability of the aquifer and the conditions of the pumping well. The value of C obtained from the nonlinear models in porous media and fractured rocks is about $10^0{\sim}10^{-2}$ and $10^{-3}{\sim}10^{-6}$ times lower than in their linear models, respectively. The value of P obtained from the nonlinear model ranged from 2.123 to 2.775 for porous media, while it ranged from 3.459 to 5.635 for fractured rocks. In the nonlinear model, well loss depends strongly on the value of P. Well efficiencies calculated from the linear and nonlinear models were $1.56{\sim}14.89\%$ for porous media and $8.73{\sim}24.71\%$ for fractured rocks, showing a significant error depending on the chosen model. For the nonlinear model, regression analysis using the least squares method was found to be very useful for interpreting step-drawdown tests in all aquifers.
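
The linear interpretation discussed in this abstract can be sketched in a few lines. The sketch assumes the standard drawdown form s = B·Q + C·Q^P with P fixed at 2, so that s/Q = B + C·Q is a straight line in Q, and it assumes well efficiency is defined as the aquifer-loss share of total drawdown. The pumping rates, the coefficient values, and the function names are hypothetical; the nonlinear Labadie and Helweg fit with free P is not implemented here.

```python
import numpy as np

def jacob_linear_fit(Q, s):
    """Fit s/Q = B + C*Q (drawdown model with P = 2) by ordinary least squares."""
    Q = np.asarray(Q, dtype=float)
    s = np.asarray(s, dtype=float)
    C, B = np.polyfit(Q, s / Q, 1)   # polyfit returns slope first, then intercept
    return float(B), float(C)

def well_efficiency(Q, B, C, P=2.0):
    """Percent of total drawdown due to aquifer loss: 100 * B*Q / (B*Q + C*Q**P)."""
    Q = np.asarray(Q, dtype=float)
    return 100.0 * B * Q / (B * Q + C * Q**P)

# Hypothetical step-drawdown data generated from known coefficients.
Q = np.array([100.0, 200.0, 300.0, 400.0])   # pumping rate per step
B_true, C_true = 0.01, 2e-5
s = B_true * Q + C_true * Q**2               # drawdown at the end of each step

B_hat, C_hat = jacob_linear_fit(Q, s)
eff = well_efficiency(Q, B_hat, C_hat)
```

Because the well loss term grows as Q^P, the computed efficiency falls as the pumping rate rises, and it is sensitive to the exponent chosen, which is the source of the model-dependent differences the abstract reports.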

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.447-461 / 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channel for movies, and the number of theater audiences affects additional selling rights. Therefore, the number of theater audiences is an important factor directly linked to movie industry sales. In this paper we consider a hybrid model that combines a multiple linear regression model and the Bass model to predict the audience numbers for a specific day. By combining the two models, the predicted value of the regression analysis was corrected using that of the Bass model. In the analysis, three films with different release dates were used. The all-subsets regression method is used to generate all possible combinations of variables, and 5-fold cross validation is used to estimate each model 5 times. The predicted value is obtained from the model with the smallest root mean square error and then combined with the predicted value of the Bass model to obtain the final predicted value. It was confirmed that, as past data become available, the weight of the Bass model increases and a correction is added to the predicted value.

Predicting Organic Matter content in Korean Soils Using Regression rules on Visible-Near Infrared Diffuse Reflectance Spectra

  • Chun, Hyen-Chung;Hong, Suk-Young;Song, Kwan-Cheol;Kim, Yi-Hyun;Hyun, Byung-Keun;Minasny, Budiman
    • Korean Journal of Soil Science and Fertilizer / v.45 no.4 / pp.497-502 / 2012
  • This study investigates the prediction of soil organic matter (OM) in Korean soils using visible-near infrared (Vis-NIR) spectroscopy. The ASD Field Spec Pro was used to acquire the reflectance of soil samples in the visible to near-infrared range (350 to 2500 nm). A total of 503 soil samples from 61 Korean soil series were scanned using the instrument, and OM was measured using the Walkley and Black method. For data analysis, the spectra were resampled from 500 to 2450 nm with 4 nm spacing and converted to the first derivative of absorbance (log (1/R)). Partial least squares regression (PLSR) and a regression rules model (Cubist) were applied to predict soil OM. The regression rules model estimates the target value by building conditional rules, and each rule contains a linear expression predicting OM from selected absorbance values. The regression rules model was shown to give a better prediction than PLSR. Although the prediction for Andisols had a larger error, soil order was not found to be useful in stratifying the prediction model. The stratification used by Cubist was mainly based on absorbance at wavelengths of 850 and 2320 nm, which correspond to the organic absorption bands. These results showed that there could be more information on soil properties useful for classifying or grouping OM data from Korean soils. In conclusion, this study shows that it is possible to develop a good prediction model of OM for Korean soils and provides data for reexamining the existing prediction models to obtain more accurate predictions.

Improvement of Rating Curve Fitting Considering Variance Function with Pseudo-likelihood Estimation (의사우도추정법에 의한 분산함수를 고려한 수위-유량 관계 곡선 산정법 개선)

  • Lee, Woo-Seok;Kim, Sang-Ug;Chung, Eun-Sung;Lee, Kil-Seong
    • Journal of Korea Water Resources Association / v.41 no.8 / pp.807-823 / 2008
  • This paper presents a technique for estimating discharge rating curve parameters. In typical practical applications, the original non-linear rating curve is transformed into a simple linear regression model by log-transforming the measurements, without examining the effect of the log transformation. A pseudo-likelihood estimation model is developed in this study to deal with the heteroscedasticity of residuals in the original non-linear model. The parameters of the rating curves and the variance functions of the errors are estimated simultaneously by the pseudo-likelihood estimation (P-LE) method. Simulated annealing, a global optimization technique, is adapted to minimize the log likelihood of the weighted residuals. The P-LE model was then applied to a hypothetical site where stage-discharge data were generated by incorporating various errors. The P-LE model shows smaller errors and narrower confidence intervals than the common log-transformed linear least squares (LT-LR) model. In addition, the water level limits for segmenting the discharge rating curve are estimated within the P-LE process using the Heaviside function. Finally, the performance of the conventional log-transformed linear regression and of the developed P-LE model is computed and compared. After the statistical simulation, the developed method is applied to real data sets from 5 gauge stations in the Geum River basin. The results suggest that the developed strategy can be applied to real sites to determine weights that take into account the error distributions of the observed discharge data.
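
The conventional log-transformed fit that this abstract uses as its baseline can be sketched as follows. The sketch assumes the common power-law rating curve Q = a·(h − h0)^b with a known cease-to-flow stage h0, so that ln Q = ln a + b·ln(h − h0) is linear. The stage values, the parameter values, and the function name are hypothetical, and the paper's pseudo-likelihood estimation with a variance function is not implemented.

```python
import numpy as np

def fit_log_rating_curve(h, Q, h0=0.0):
    """Fit ln Q = ln a + b * ln(h - h0) by ordinary least squares; return (a, b)."""
    x = np.log(np.asarray(h, dtype=float) - h0)
    y = np.log(np.asarray(Q, dtype=float))
    b, ln_a = np.polyfit(x, y, 1)    # slope is b, intercept is ln a
    return float(np.exp(ln_a)), float(b)

# Hypothetical stage-discharge pairs generated without measurement error.
h = np.array([0.8, 1.0, 1.5, 2.0, 3.0, 4.0])   # stage
a_true, b_true, h0 = 12.0, 1.8, 0.5
Q = a_true * (h - h0) ** b_true                # discharge

a_hat, b_hat = fit_log_rating_curve(h, Q, h0=h0)
```

With noisy data, least squares on the log scale implicitly assumes a constant relative error in discharge. That assumption is what the abstract questions when it models the variance of the residuals explicitly.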