• Title/Summary/Keyword: Regression Study


Influence Comparison of Customer Satisfaction Factor using Quantile Regression Model (분위회귀모형을 이용한 고객만족도 요인의 영향력 비교)

  • Kim, Seong-Yoon;Kim, Yong-Tae;Lee, Sang-Jun
    • Journal of Digital Convergence, v.13 no.6, pp.125-132, 2015
  • Many questions are currently being raised about how weights are calculated from customer satisfaction surveys. This study investigated how the weight of each satisfaction factor differs across quantiles by comparing an ordinary least squares regression model with a quantile regression model, and carried out bootstrap verification to test whether the influence of the regression coefficients differs between quantiles. The analysis, performed with the open-source software R (quantreg package), showed that the influence of each satisfaction factor varied across quantiles and that the regression coefficients differed significantly between quantiles. Therefore, using a quantile regression model, which shows the influence of each satisfaction factor for customer groups at different satisfaction levels, would contribute to planning quantitative convergence policies for customer satisfaction.
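
To make the OLS-versus-quantile comparison concrete, here is a minimal Python sketch for a single satisfaction factor. It is not the authors' R/quantreg analysis: it fits a one-predictor quantile regression exactly by brute force over pairs of observations (practical only for small samples), fits OLS for comparison, and uses a pairs bootstrap to form a percentile interval for the difference between two quantile slopes. The function names, the 95% percentile interval, and the default number of replicates are illustrative assumptions.

```python
import itertools
import random


def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_fit(xs, ys, tau):
    """Exact tau-th quantile regression of y on x (intercept + slope).

    A linear-programming basic solution interpolates two observations, so
    scanning every pair with distinct x finds a minimiser. O(n^3): small n only.
    """
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = sum(check_loss(y - a - b * x, tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def ols_fit(xs, ys):
    """Ordinary least squares intercept and slope, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_slope_diff(xs, ys, tau_lo, tau_hi, reps=200, seed=0):
    """Pairs-bootstrap 95% percentile interval for slope(tau_hi) - slope(tau_lo)."""
    rng = random.Random(seed)
    n = len(xs)
    diffs = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[k] for k in idx]
        by = [ys[k] for k in idx]
        if len(set(bx)) < 2:  # slope undefined when all resampled x coincide
            continue
        diffs.append(quantile_fit(bx, by, tau_hi)[1] - quantile_fit(bx, by, tau_lo)[1])
    diffs.sort()
    m = len(diffs)
    return diffs[int(0.025 * m)], diffs[min(m - 1, int(0.975 * m))]
```

An interval that excludes zero would suggest that the factor's effect differs between the two quantiles. Real analyses would use an LP-based solver such as R's `quantreg::rq` rather than the brute-force search.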

Comparisons of Kruglyak and Lander's Nonparametric Linkage Test and Weighted Regression Incorporating Replications (KRUGLYAK과 LANDER의 유전연관성 비모수 방법과 반복 자료를 고려한 가중 회귀분석법의 비교)

  • Choi, Eun-Kyeong;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics, v.21 no.1, pp.1-17, 2008
  • The ordinary least squares regression method of Haseman and Elston (1972) is the most widely used method in genetic linkage studies of continuous traits in sib pairs. Kruglyak and Lander (1995) suggested a statistic that appears to be a nonparametric counterpart of the Haseman and Elston (1972) regression method, but in fact the two methods are quite different. In this paper, the relationship between the two methods is described and the methods are compared by simulation studies. One characteristic of the sib-pair linkage study is that the explanatory variable takes only three distinct values, so the dependent variable is heavily replicated at each value of the explanatory variable. We propose a weighted least squares regression method that is more appropriate to this situation, and we explore the efficiency of the weighted regression in genetic linkage studies with simulated normal and non-normal continuous trait data. The simulation studies demonstrated that the weighted regression is more powerful than the other tests.
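
A minimal sketch of the replication-aware weighting idea, assuming the Haseman and Elston setup in which the response is the squared sib-pair trait difference and the explanatory variable is the estimated proportion of alleles shared identical by descent (0, 0.5, or 1). The weight n_k / s_k^2 for each distinct value is an illustrative choice, not necessarily the weighting proposed in the paper, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def he_weighted_regression(pi_hat, sq_diff):
    """Weighted least squares of squared sib-pair trait differences on IBD sharing.

    Observations are grouped by their (few) distinct explanatory values; each
    group mean is weighted by n_k / s_k^2 (group size over within-group sample
    variance). Requires at least two observations, not all equal, per group.
    """
    groups = defaultdict(list)
    for x, y in zip(pi_hat, sq_diff):
        groups[x].append(y)
    xs, ybars, ws = [], [], []
    for x, ys in sorted(groups.items()):
        xs.append(x)
        ybars.append(mean(ys))
        ws.append(len(ys) / variance(ys))
    sw = sum(ws)
    xw = sum(w * x for w, x in zip(ws, xs)) / sw
    yw = sum(w * y for w, y in zip(ws, ybars)) / sw
    sxy = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ybars))
    sxx = sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return yw - slope * xw, slope
```

Under the Haseman and Elston model, a significantly negative slope is evidence of linkage, because sibs sharing more alleles IBD at a linked locus tend to have more similar trait values.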

A comparison study of various robust regression estimators using simulation (시뮬레이션을 통한 다양한 로버스트 회귀추정량의 비교 연구)

  • Jang, Soohee;Yoon, Jungyeon;Chun, Heuiju
    • The Korean Journal of Applied Statistics, v.29 no.3, pp.471-485, 2016
  • Least squares (LS) regression is a classic regression method that is optimal when the regression assumptions hold and the data contain no unusual observations. However, unusual data can seriously distort LS estimates. Various robust estimation methods have therefore been proposed to overcome the limitations of traditional LS regression. Among these are M-estimators, based on maximum likelihood estimation (MLE); L-estimators, based on linear combinations of order statistics; and R-estimators, based on the ranks of the residuals. In this paper, robust regression estimators with a high breakdown point and/or high efficiency are compared under several simulated situations. The simulation study analyzes and compares the distributions of the estimates as well as relative efficiencies calculated from mean squared errors (MSE). We conclude that MM-estimators or GR-estimators are a good choice for real data applications.
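
As an illustration of the M-estimator family mentioned above, the following sketch fits a straight line with Huber's weight function by iteratively reweighted least squares, using a MAD-based scale and the common tuning constant c = 1.345. This is only the basic Huber M-estimator started from OLS; it does not have the high breakdown point of the MM- or GR-estimators the paper recommends, and the function names are hypothetical.

```python
from statistics import median


def wls_line(xs, ys, ws):
    """Weighted least-squares intercept and slope."""
    sw = sum(ws)
    xw = sum(w * x for w, x in zip(ws, xs)) / sw
    yw = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xw) * (y - yw) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xw) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return yw - b * xw, b


def huber_irls(xs, ys, c=1.345, iters=100, tol=1e-10):
    """Huber M-estimate of a straight line by iteratively reweighted least squares."""
    a, b = wls_line(xs, ys, [1.0] * len(xs))  # start from the OLS fit
    for _ in range(iters):
        r = [y - a - b * x for x, y in zip(xs, ys)]
        med = median(r)
        s = median([abs(ri - med) for ri in r]) / 0.6745  # MAD-based robust scale
        s = max(s, 1e-12)  # guard against a zero scale
        # Huber weights: 1 inside [-c, c], c / |u| outside
        ws = [1.0 if abs(ri / s) <= c else c / abs(ri / s) for ri in r]
        a_new, b_new = wls_line(xs, ys, ws)
        if abs(a_new - a) + abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b
```

With one gross outlier in otherwise linear data, the Huber slope stays close to the slope of the clean points, while the OLS slope is pulled toward the outlier.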

A Flexible Statistical Growth Model for Describing Plant Disease Progress (식물병(植物病) 진전(進展)의 한 유연적(柔軟的)인 통계적(統計的) 생장(生長) 모델)

  • Kim, Choong-Hoe
    • Korean Journal of Applied Entomology, v.26 no.1 s.70, pp.31-36, 1987
  • A piecewise linear regression model that describes disease progress curves simply and flexibly was developed in this study. The model divides the whole epidemic into several pieces of simple linear regression based on changes in the pattern of disease progress, and then combines the pieces into a single mathematical function using indicator variables. When twelve epidemic datasets obtained from field experiments were fitted to the piecewise linear regression model, the logistic model, and the Gompertz model to compare statistical fit, the goodness of fit was greatly improved with the piecewise linear regression compared with the other two models. The simplicity, flexibility, accuracy, and ease of parameter estimation of the piecewise linear regression model are illustrated with examples of real epidemic data. The results of this study suggest that the piecewise linear regression model is a useful technique for modeling plant disease epidemics.
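
The indicator-variable construction can be sketched as follows, assuming known breakpoints and a continuous fit in which each breakpoint k adds a hinge term (t - k) * I(t > k); the paper may instead allow jumps between pieces. Coefficients are obtained by ordinary least squares through the normal equations. All names are hypothetical.

```python
def design_row(t, knots):
    """Intercept, time, and one hinge term (t - k) * I(t > k) per breakpoint k."""
    return [1.0, t] + [(t - k) if t > k else 0.0 for k in knots]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_piecewise(ts, ys, knots):
    """Least-squares fit of the continuous piecewise linear model via the normal equations."""
    rows = [design_row(t, knots) for t in ts]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)
```

The coefficient on each hinge term is the change in slope (the disease progress rate) after that breakpoint, so each piece of the epidemic keeps a simple linear interpretation.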

Comparisons of Imputation Methods for Wave Nonresponse in Panel Surveys (패널조사 웨이브 무응답의 대체방법 비교)

  • Kim, Kyu-Seong;Park, In-Ho
    • Survey Research, v.11 no.1, pp.1-18, 2010
  • We compare various imputation methods for compensating for wave nonresponse that are commonly adopted in panel surveys. Unlike in a cross-sectional survey, nonresponse in a panel survey involves a time effect, in the sense that a unit may fail to respond in some waves but not in all. Thus, responses in neighboring waves can be used as powerful predictors for imputing wave nonresponse, as in longitudinal regression imputation, carry-over imputation, nearest neighbor regression imputation, and row-column imputation. For comparison, we carry out a simulation study on several income variables from the Korean Welfare Panel Study, based on two performance criteria: predictive accuracy and estimation accuracy. Our simulation shows that the ratio and row-column imputation methods are much more effective on both criteria. Regression, longitudinal regression, and carry-over imputation performed better in predictive accuracy but worse in estimation accuracy. On the other hand, nearest neighbor, nearest neighbor regression, and hot-deck imputation show higher estimation accuracy but lower predictive accuracy. Finally, mean imputation performs much worse on both criteria.
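
Two of the simpler methods compared above can be written in a few lines. The sketch assumes that the previous wave is fully observed and that a missing current-wave value is coded as `None`; the ratio estimator shown (ratio of respondent totals across the two waves) is a common textbook form and may differ in detail from the one used in the study. Function names are hypothetical.

```python
def carry_over_impute(prev_wave, curr_wave):
    """Fill a missing current-wave value with the same unit's previous-wave value."""
    return [p if c is None else c for p, c in zip(prev_wave, curr_wave)]


def ratio_impute(prev_wave, curr_wave):
    """Scale the previous-wave value by the respondents' wave-to-wave ratio.

    The ratio is sum(current) / sum(previous) over units that responded in
    the current wave; missing current values are marked with None.
    """
    pairs = [(p, c) for p, c in zip(prev_wave, curr_wave) if c is not None]
    ratio = sum(c for _, c in pairs) / sum(p for p, _ in pairs)
    return [p * ratio if c is None else c for p, c in zip(prev_wave, curr_wave)]
```

Carry-over imputation ignores any change between waves, whereas ratio imputation adjusts the carried-forward value by the average growth observed among respondents.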

A study on bias effect of LASSO regression for model selection criteria (모형 선택 기준들에 대한 LASSO 회귀 모형 편의의 영향 연구)

  • Yu, Donghyeon
    • The Korean Journal of Applied Statistics, v.29 no.4, pp.643-656, 2016
  • High-dimensional data, in which the number of variables is greater than the number of samples, are frequently encountered in various fields. In high-dimensional data it is usually necessary to select variables in order to estimate regression coefficients and avoid overfitting. A penalized regression model performs variable selection and coefficient estimation simultaneously, which is why such models are frequently used for high-dimensional data. However, a penalized regression model still requires selecting the optimal model by choosing a tuning parameter based on a model selection criterion. This study deals with the effect of the bias of LASSO regression on model selection criteria. We numerically describe this bias effect on the model selection criteria and apply the proposed correction to the identification of biomarkers for lung cancer based on gene expression data.
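
The source of the bias can be seen in a small coordinate-descent sketch of the LASSO objective (1/2n)||y - Xb||^2 + lambda * ||b||_1. When the columns are orthogonal with (1/n)||x_j||^2 = 1, each LASSO coefficient is the soft-thresholded least squares coefficient, so every selected coefficient is pulled toward zero by lambda, and the inflated residual sum of squares then feeds into criteria such as Cp, AIC, or BIC. This is a generic illustration (no intercept, fixed number of sweeps, hypothetical names), not the correction proposed in the paper.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # partial residual with predictor j's contribution removed
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j) for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


def rss(X, y, b):
    """Residual sum of squares of the linear predictor X b."""
    return sum((y[i] - sum(X[i][k] * b[k] for k in range(len(b)))) ** 2 for i in range(len(y)))
```

For example, with two orthogonal columns of +1/-1 entries and y = 3 * x_1, lambda = 0.5 gives b = (2.5, 0) and an RSS of 1.0, whereas an unpenalized refit on the same support gives 3 and an RSS of 0.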

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis (다중회귀분석에 의한 하천 월 유출량의 추계학적 추정에 관한 연구)

  • 김태철;정하우
    • Magazine of the Korean Society of Agricultural Engineers, v.22 no.3, pp.75-87, 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydro-geological factors. A significant correlation with the runoff in a river basin can be expected in advance, and the effect of each causal and associated factor (independent variables: present-month rainfall, previous-month runoff, evapotranspiration, relative humidity, etc.) on present-month runoff (the dependent variable) can be determined by multiple regression analysis. The functions relating the independent and dependent variables should be fitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. The reliability of the estimated function should be tested with statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients before the first estimated multiple regression model in historical sequence is determined. However, some error between observed and estimated runoff remains. The error arises because the model is an inadequate description of the system and because the data constituting the record represent only a sample from a population of monthly discharge observations, so estimates of the model parameters are subject to sampling error. Since this error, which is a deviation from the multiple regression plane, cannot be explained by the first estimated multiple regression equation, it can be considered a random error governed by chance. This variance left unexplained by the multiple regression equation can be handled by a stochastic approach: the random error can be simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly runoff in nonhistorical sequence can be determined by combining the deterministic component (the multiple regression equation) with the stochastic component of random errors.
Monthly runoff at Naju station in the Yong-San river basin is estimated with the multiple regression model and the hybrid model. The observed and estimated runoff are compared, and the multiple regression model is compared with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly runoff in historical sequence is a multiple linear regression equation fitted over all months: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$, where $P_n$ is present-month rainfall, $Q_{n-1}$ previous-month runoff, and $E_n$ evapotranspiration. About 85% of the total variance of monthly runoff is explained by this equation; its coefficient of determination ($R^2$) is 0.843. This means that monthly runoff in historical sequence can be estimated with high significance from a short observation record using this equation. (2) The optimal function for estimating monthly runoff in nonhistorical sequence is the hybrid model, which combines the multiple linear regression equation fitted over all months with a stochastic component: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y \cdot t$, where $S_y$ is the standard error of estimate and $t$ is a random normal variate. The remaining 15% of the variance of monthly runoff is accounted for by adding the stochastic process, and slightly more reliable statistical characteristics of monthly runoff in nonhistorical sequence are obtained. Because of the random component, the runoff estimated in nonhistorical sequence shows extreme values (maxima and minima) that do not appear in the observed runoff. (3) The "frequency best fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that both the multiple linear regression equation and the Gajiyama formula are theoretically reasonable functions.
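
The hybrid model in result (2) can be sketched as a simulation loop that feeds each generated month back in as Q_{n-1}. The regression coefficients are those reported above. The standard error of estimate S_y is not given in the abstract, so it is left as a parameter; the units of the inputs are likewise unspecified, and leaving negative generated values unclipped is an assumption of this sketch. Function names are hypothetical.

```python
import random


def deterministic_runoff(p_n, q_prev, e_n):
    """Fitted multiple regression: Q_n = 0.788 P_n + 0.130 Q_{n-1} - 0.273 E_n - 0.10."""
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def hybrid_runoff(rain, evap, q0, s_y, seed=0):
    """Generate monthly runoff: regression component plus S_y * t, with t ~ N(0, 1).

    rain and evap are equal-length monthly sequences; q0 is the runoff of the
    month before the first one; s_y is the standard error of estimate (assumed known).
    """
    rng = random.Random(seed)
    q_prev, out = q0, []
    for p_n, e_n in zip(rain, evap):
        q_n = deterministic_runoff(p_n, q_prev, e_n) + s_y * rng.gauss(0.0, 1.0)
        out.append(q_n)
        q_prev = q_n  # generated value becomes next month's Q_{n-1}
    return out
```

Setting `s_y = 0` reduces the loop to the deterministic regression in result (1); a positive `s_y` adds the random component that lets generated sequences reach extremes absent from the observed record.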

Relationship between Urbanization and Cancer Incidence in Iran Using Quantile Regression

  • Momenyan, Somayeh;Sadeghifar, Majid;Sarvi, Fatemeh;Khodadost, Mahmoud;Mosavi-Jarrahi, Alireza;Ghaffari, Mohammad Ebrahim;Sekhavati, Eghbal
    • Asian Pacific Journal of Cancer Prevention, v.17 no.sup3, pp.113-117, 2016
  • Quantile regression is an efficient method for predicting and estimating the relationship between explanatory variables and percentile points of the response distribution, particularly for extreme percentiles of the distribution. To study the relationship between urbanization and cancer morbidity, we applied quantile regression. This cross-sectional study was conducted for 9 cancers in 345 cities in Iran in 2007. Data were obtained from the Ministry of Health and Medical Education, and the relationship between urbanization and cancer morbidity was investigated using quantile regression and least squares regression. The fitted models were compared using the Akaike information criterion (AIC). R (3.0.1) and the quantreg package were used for statistical analysis. With the quantile regression model, all percentiles for breast, colorectal, prostate, lung, and pancreatic cancers showed increasing incidence rates with urbanization. The maximum increase was in the 90th percentile for breast cancer ($\beta$=0.13, p-value<0.001), the 75th percentile for colorectal cancer ($\beta$=0.048, p-value<0.001), the 95th percentile for prostate cancer ($\beta$=0.55, p-value<0.001), the 95th percentile for lung cancer ($\beta$=0.52, p-value=0.006), and the 10th percentile for pancreatic cancer ($\beta$=0.011, p-value<0.001). For gastric, esophageal, and skin cancers, the incidence rate decreased with increasing urbanization. The maximum decrease was in the 90th percentile for gastric cancer ($\beta$=0.003, p-value<0.001), the 95th for esophageal cancer ($\beta$=0.04, p-value=0.4), and also the 95th for skin cancer ($\beta$=0.145, p-value=0.071). The AIC showed that for upper percentiles, quantile regression fitted better than least squares regression.
According to the results of this study, the significant impact of urbanization on cancer morbidity requires more effort and planning by policymakers and administrators to reduce risk factors such as pollution in urban areas and to ensure that proper nutrition recommendations are made.
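
One way to put quantile and least squares fits on a common AIC scale is to use an asymmetric Laplace working likelihood for the quantile fit and a Gaussian likelihood for least squares, each with its scale estimated by maximum likelihood. The sketch below assumes that approach; the exact likelihood and parameter count used in the study are not stated, so the number of parameters `k` is left to the caller and the function names are hypothetical.

```python
import math


def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def aic_quantile(residuals, tau, k):
    """AIC of a tau-quantile fit under an asymmetric Laplace working likelihood.

    With density tau(1-tau)/sigma * exp(-rho_tau(u)/sigma), the MLE of sigma is the
    mean check loss, giving logLik = n * (log(tau(1-tau)) - log(sigma_hat) - 1).
    """
    n = len(residuals)
    sigma = sum(check_loss(u, tau) for u in residuals) / n
    loglik = n * (math.log(tau * (1 - tau)) - math.log(sigma) - 1)
    return -2 * loglik + 2 * k


def aic_least_squares(residuals, k):
    """AIC of a least-squares fit under a Gaussian likelihood with sigma^2 = RSS / n."""
    n = len(residuals)
    sigma2 = sum(u * u for u in residuals) / n
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * loglik + 2 * k
```

Both criteria penalize the same parameter count, so the comparison rests entirely on which working likelihood fits the residuals better; at extreme percentiles of a skewed response the asymmetric Laplace form can fit noticeably better.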

Unmanned Aerial Vehicle Image-Based Tidal Flat Surface Sedimentary Facies Mapping Using Regression Kriging (회귀 크리깅을 이용한 무인기 영상 기반의 갯벌 표층 퇴적상 분포도 작성)

  • Geun-Ho Kwak;Keunyong Kim;Jingyo Lee;Joo-Hyung Ryu
    • Korean Journal of Remote Sensing, v.39 no.5_1, pp.537-549, 2023
  • The distribution characteristics of tidal flat sediment components are essential data for coastal environment analysis and environmental impact assessment, so a reliable classification map of surface sedimentary facies is essential. This study evaluated the applicability of regression kriging for generating a classification map of the sedimentary facies of tidal flats. To this end, several factors were investigated: the amount of field survey data and remote sensing-based auxiliary data, the effect of the regression model on regression kriging, and a comparison with other prediction methods (univariate kriging and regression analysis) for surface sedimentary facies classification. To evaluate the applicability of regression kriging, a case study using unmanned aerial vehicle (UAV) data was conducted on the Hwang-do tidal flat at Anmyeon-do, Taean-gun, Korea. The case study showed that securing an appropriate amount of field survey data and using topographic elevation and channel density as auxiliary data were most important for producing a reliable tidal flat surface sedimentary facies classification map. In addition, regression kriging, which can capture detailed characteristics of the sediment distribution using ultra-high-resolution UAV data, had the best prediction performance of the methods compared. This result is expected to serve as a guideline for producing tidal flat surface sedimentary facies classification maps.
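
Regression kriging combines a regression trend on auxiliary variables with kriging of the regression residuals. The sketch below is deliberately minimal: one auxiliary variable (for example, elevation), ordinary least squares for the trend, and simple kriging of the residuals with an exponential covariance whose sill and range are fixed by the caller rather than fitted from an empirical variogram. It predicts a continuous quantity, not facies classes directly, and the names and default parameters are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def exp_cov(h, sill, range_):
    """Exponential covariance model C(h) = sill * exp(-h / range_), no nugget."""
    return sill * math.exp(-h / range_)


def regression_kriging(coords, aux, values, coord0, aux0, sill=1.0, range_=10.0):
    """Predict at coord0: linear trend on one auxiliary variable + simple kriging of residuals."""
    n = len(values)
    ma, mv = sum(aux) / n, sum(values) / n
    b = sum((a - ma) * (v - mv) for a, v in zip(aux, values)) / sum((a - ma) ** 2 for a in aux)
    a0 = mv - b * ma
    resid = [v - (a0 + b * a) for a, v in zip(aux, values)]
    cov = [[exp_cov(math.dist(p, q), sill, range_) for q in coords] for p in coords]
    c0 = [exp_cov(math.dist(p, coord0), sill, range_) for p in coords]
    weights = solve(cov, c0)  # simple-kriging weights; residual mean assumed zero
    return a0 + b * aux0 + sum(w * r for w, r in zip(weights, resid))
```

With no nugget term the predictor is an exact interpolator: at a sampled location it returns the observed value, while away from the samples the kriged residual decays toward the regression trend.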

A Study on the Socio-economic Characteristics of the Angler Population and the Estimation of A Fishing Frequency Function (유어낚시인구의 사회경제학적 특성과 출조빈도함수의 추정에 관한 연구)

  • Park Cheol-Hyung
    • The Journal of Fisheries Business Administration, v.36 no.1 s.67, pp.81-101, 2005
  • This article estimates the fishing frequency function of Korean recreational fishing with respect to the socio-economic characteristics of anglers. First, the study described the characteristics of the entire angler population in terms of 9 socio-economic variables. The study then divided the total angler population into three groups (in-land, sea, and mixed angler populations) in order to investigate differences in their characteristics. By testing heterogeneity in the frequency table, the study confirmed differences in region, size of region, and educational level between the in-land and sea angler populations. The fishing frequency function is estimated with a Poisson regression model to accommodate the count-data nature (a non-negative discrete random variable) of fishing frequency. However, a model specification error is found due to overdispersion of the data, and the model shows a lack of goodness of fit. The negative binomial regression model is adopted as an alternative estimation method to handle the overdispersion. With it, the study confirms that overdispersion no longer exists in the model and that the goodness of fit improves significantly to a reasonable level. The negative binomial regression estimates of fishing frequency are as follows. Region, sex, and education affect the decision on fishing frequency for in-land recreational fishing. On the other hand, sex, age, and marital status play the same role for the sea angler population. Of the remaining variables, income and Internet use affect fishing frequency in the mixed angler population.
Finally, for the whole angler population, all of the previous variables are statistically significant, because the data from all three subgroups are pooled.
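
The Poisson-then-negative-binomial workflow can be illustrated with one covariate. The sketch fits a Poisson log-linear model by iteratively reweighted least squares, then computes the Pearson dispersion statistic (values well above 1 suggest overdispersion) and a moment estimate of the NB2 parameter alpha from the auxiliary regression of ((y - mu)^2 - y) / mu on mu without intercept, as in the Cameron and Trivedi overdispersion test. It is not the study's model specification, which used several socio-economic regressors, and the function names are hypothetical.

```python
import math


def poisson_irls(xs, ys, iters=100, tol=1e-12):
    """Poisson regression log(mu) = b0 + b1 * x fitted by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only MLE
    for _ in range(iters):
        eta = [b0 + b1 * x for x in xs]
        mu = [math.exp(e) for e in eta]
        z = [e + (y - m) / m for e, y, m in zip(eta, ys, mu)]  # working response
        sw = sum(mu)  # working weights equal mu under the canonical log link
        xw = sum(m * x for m, x in zip(mu, xs)) / sw
        zw = sum(m * zi for m, zi in zip(mu, z)) / sw
        new_b1 = (sum(m * (x - xw) * (zi - zw) for m, x, zi in zip(mu, xs, z))
                  / sum(m * (x - xw) ** 2 for m, x in zip(mu, xs)))
        new_b0 = zw - new_b1 * xw
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def overdispersion(xs, ys, b0, b1):
    """Pearson dispersion statistic and a moment estimate of the NB2 alpha."""
    mu = [math.exp(b0 + b1 * x) for x in xs]
    n, p = len(ys), 2
    pearson = sum((y - m) ** 2 / m for y, m in zip(ys, mu)) / (n - p)
    # Auxiliary regression through the origin: ((y - mu)^2 - y) / mu = alpha * mu + error
    aux = [((y - m) ** 2 - y) / m for y, m in zip(ys, mu)]
    alpha = sum(a * m for a, m in zip(aux, mu)) / sum(m * m for m in mu)
    return pearson, alpha
```

Under NB2, Var(y) = mu + alpha * mu^2, so alpha > 0 indicates variance beyond the Poisson assumption; a full negative binomial fit (for example R's `MASS::glm.nb`) would then replace the Poisson model.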
