Search | Korea Science

Residuals Plots for Repeated Measures Data

PARK TAESUNG
- Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.187-191 / 2000
In the analysis of repeated measurements, multivariate regression models that account for the correlations among the observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need model diagnostic procedures. In this paper, we propose a simple graphical method to detect outliers and to investigate the goodness of model fit in repeated measures data. The graphical method is based on quantile-quantile (Q-Q) plots of the $\chi^2$ distribution and the standard normal distribution. We also propose diagnostic measures to detect influential observations. The proposed method is illustrated using two examples.

Value at Risk Forecasting Based on Quantile Regression for GARCH Models

Lee, Sang-Yeol;Noh, Jung-Sik
- The Korean Journal of Applied Statistics / v.23 no.4 / pp.669-681 / 2010
Value-at-Risk (VaR) is an important part of risk management in the financial industry. This paper presents a VaR forecasting method for financial time series based on the quantile regression for GARCH models recently developed by Lee and Noh (2009). The proposed VaR forecasting method features direct conditional quantile estimation for GARCH models that is well connected with the model parameters. Empirical performance is measured by several backtesting procedures and is reported in comparison with existing methods that use sample quantiles.
https://doi.org/10.5351/KJAS.2010.23.4.669

Wage Determinants Analysis by Quantile Regression Tree

Chang, Young-Jae
- Communications for Statistical Applications and Methods / v.19 no.2 / pp.293-301 / 2012
Quantile regression, proposed by Koenker and Bassett (1978), is a statistical technique that estimates conditional quantiles. Its advantage over ordinary least squares (OLS) regression is its robustness to large outliers. A regression tree approach has been applied to OLS problems to fit flexible models. Loh (2002) proposed the GUIDE algorithm, which has negligible selection bias and relatively low computational cost. Quantile regression can be regarded as an analogue of OLS; therefore, it can also be applied to the GUIDE regression tree method. Chaudhuri and Loh (2002) proposed a nonparametric quantile regression method that blends key features of piecewise polynomial quantile regression and tree-structured regression based on adaptive recursive partitioning. Lee and Lee (2006) investigated wage determinants in the Korean labor market using the Korean Labor and Income Panel Study (KLIPS). Following Lee and Lee, we fit three kinds of quantile regression tree models to KLIPS data at the quantiles 0.05, 0.2, 0.5, 0.8, and 0.95. Among the three models, the multiple linear piecewise quantile regression model forms the shortest tree structure, while the piecewise constant quantile regression model generally has a deeper tree structure with more terminal nodes. Age, gender, marital status, and education seem to be determinants of the wage level throughout the quantiles; in addition, education experience appears as an important determinant of the wage level in the highly paid group.
https://doi.org/10.5351/CKSS.2012.19.2.293
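
As a minimal illustration of the quantile regression idea in the abstract above (not code from the paper): conditional quantiles are estimated by minimizing the check (pinball) loss rather than the squared error. The sketch below fits only a constant to a few hypothetical wage values, and the function names are illustrative assumptions. It shows that the 0.5-quantile fit is the sample median, which the single large outlier does not move, whereas the least-squares fit (the mean) is pulled toward it.

```python
def pinball_loss(y, q, tau):
    """Average check (pinball) loss when the constant q predicts every value in y."""
    total = 0.0
    for v in y:
        u = v - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(y)


def best_constant(y, tau):
    """Return the data point with the smallest pinball loss.

    For a constant fit, a minimizer of the check loss can always be
    found among the observed values, so searching them is enough.
    """
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


wages = [1.0, 2.0, 3.0, 4.0, 100.0]    # hypothetical data with one large outlier
median_fit = best_constant(wages, 0.5)  # quantile fit at tau = 0.5
mean_fit = sum(wages) / len(wages)      # least-squares fit of a constant
print(median_fit, mean_fit)
```

A regression tree variant would apply the same loss within each terminal node instead of over the whole sample.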

Comparison of Regression Models for Estimating Ventilation Rate of Mechanically Ventilated Swine Farm (강제환기식 돈사의 환기량 추정을 위한 회귀모델의 비교)

Jo, Gwanggon;Ha, Taehwan;Yoon, Sanghoo;Jang, Yuna;Jung, Minwoong
- Journal of The Korean Society of Agricultural Engineers / v.62 no.1 / pp.61-70 / 2020
To estimate the ventilation volume of mechanically ventilated swine farms, various regression models were applied, and their errors were compared to select the regression model that best simulates actual data. Linear regression, linear spline, polynomial regression (degrees 2 and 3), logistic curve, generalized additive model (GAM), and Gompertz curve were compared. Overfitted models were excluded even when their error rate was small. The evaluation criteria were root mean square error (RMSE) and mean absolute percentage error (MAPE). The evaluation results indicated that the degree-3 polynomial exhibited the lowest error rate; however, it overestimated the ventilation volume in a certain section. The logistic curve was the most stable and was superior to all the other models. Across all of the models, the logistic curve gave the smallest estimated ventilation volume, apart from the model with a large error rate and the model that overestimated.
https://doi.org/10.5389/KSAE.2020.62.1.061
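
The two evaluation criteria named in the abstract above have standard definitions, sketched here with hypothetical numbers rather than the paper's data. The three-parameter logistic curve is one common parameterization and is an assumption; the paper may use a different form.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def logistic_curve(x, upper, growth, midpoint):
    """Logistic curve rising from 0 toward `upper` (assumed parameterization)."""
    return upper / (1.0 + math.exp(-growth * (x - midpoint)))


# Hypothetical measured and estimated ventilation volumes.
measured = [100.0, 200.0, 400.0]
estimated = [110.0, 190.0, 400.0]
print(rmse(measured, estimated), mape(measured, estimated))
```

RMSE penalizes large absolute misses, while MAPE weights each miss relative to the measured value, so the two criteria can rank models differently.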

A Comparative Study of Estimation by Analogy using Data Mining Techniques

Nagpal, Geeta;Uddin, Moin;Kaur, Arvinder
- Journal of Information Processing Systems / v.8 no.4 / pp.621-652 / 2012
Software estimations provide an inclusive set of directives for software project developers, project managers, and the management in order to produce more realistic estimates based on deficient, uncertain, and noisy data. A range of estimation models are being explored in the industry, as well as in academia for research purposes, but choosing the best model is quite intricate. Estimation by Analogy (EbA) is a form of case-based reasoning that uses fuzzy logic, grey system theory, or machine-learning techniques for optimization. This research compares the estimation accuracy of some conventional data mining models with a hybrid model. Different data mining models are under consideration, including linear regression models such as ordinary least squares and ridge regression, and nonlinear models such as neural networks, support vector machines, and multivariate adaptive regression splines. A precise and comprehensible predictive model based on the integration of grey relational analysis (GRA) and regression has been introduced and compared. Empirical results have shown that regression used with GRA gives outstanding results, indicating that the methodology has great potential and can be used as a candidate approach for software effort estimation.
https://doi.org/10.3745/JIPS.2012.8.4.621

Prediction on Busan's Gross Product and Employment of Major Industry with Logistic Regression and Machine Learning Model (로지스틱 회귀모형과 머신러닝 모형을 활용한 주요산업의 부산 지역총생산 및 고용 효과 예측)

Chae-Deug Yi
- Korea Trade Review / v.47 no.2 / pp.69-88 / 2022
This paper aims to predict Busan's regional product and employment using logistic regression models and machine learning models. The main findings of the empirical analysis are as follows. First, the OLS regression model shows that main industries such as electricity and electronics, machine and transport, and finance and insurance affect Busan's income positively. Second, the binomial logistic regression models show that Busan's strategic industries contribute to Busan's income in the following descending order: future transport machinery, life-care, and smart marine industries. Third, the multinomial logistic regression models show that Korea's main industries such as precision machinery, transport equipment, and machinery influence Busan's economy positively. In addition, Korea's exports and currency depreciation can affect Busan's economy more positively at higher employment levels. Fourth, the voting ensemble model shows higher predictive power than the artificial neural network and support vector machine models. Furthermore, the gradient boosting model and the random forest, in that order, show higher predictive power than the voting model.
https://doi.org/10.22659/KTRA.2022.47.2.69

Development and Evaluation of Electronic Health Record Data-Driven Predictive Models for Pressure Ulcers (전자건강기록 데이터 기반 욕창 발생 예측모델의 개발 및 평가)

Park, Seul Ki;Park, Hyeoun-Ae;Hwang, Hee
- Journal of Korean Academy of Nursing / v.49 no.5 / pp.575-585 / 2019
Purpose: The purpose of this study was to develop predictive models for pressure ulcer incidence using electronic health record (EHR) data and to compare their predictive validity performance indicators with those of the Braden Scale used in the study hospital. Methods: A retrospective case-control study was conducted in a tertiary teaching hospital in Korea. Data of 202 pressure ulcer patients and 14,705 non-pressure ulcer patients admitted between January 2015 and May 2016 were extracted from the EHRs. Three predictive models for pressure ulcer incidence were developed using logistic regression, Cox proportional hazards regression, and decision tree modeling. The predictive validity performance indicators of the three models were compared with those of the Braden Scale. Results: The logistic regression model was most efficient, with a high area under the receiver operating characteristic curve (AUC) estimate of 0.97, followed by the decision tree model (AUC 0.95), the Cox proportional hazards regression model (AUC 0.95), and the Braden Scale (AUC 0.82). Decreased mobility was the most significant factor in the logistic regression and Cox proportional hazards models, and the endotracheal tube was the most important factor in the decision tree model. Conclusion: Predictive validity performance indicators of the Braden Scale were lower than those of the logistic regression, Cox proportional hazards regression, and decision tree models. The models developed in this study can be used to develop a clinical decision support system that automatically assesses risk for pressure ulcers to aid nurses.
https://doi.org/10.4040/jkan.2019.49.5.575
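
For readers unfamiliar with the AUC values compared in the abstract above, the following is a minimal sketch of one standard way to compute AUC: the fraction of (case, non-case) pairs in which the case receives the higher risk score, with ties counted as one half. The labels and scores are hypothetical, and this is not the study's code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a case (e.g. pressure ulcer), 0 for a non-case.
    scores: predicted risk, higher meaning more likely to be a case.
    Assumes at least one case and one non-case are present.
    """
    cases = [s for l, s in zip(labels, scores) if l == 1]
    non_cases = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0   # case ranked above non-case
            elif c == n:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(non_cases))


# Hypothetical patients: two cases and two non-cases.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(example_labels, example_scores)
print(example_auc)
```

The pairwise loop is quadratic in the number of patients; for a cohort as large as the one in the study, a rank-based computation would be the more practical choice.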

FUZZY REGRESSION ANALYSIS WITH NON-SYMMETRIC FUZZY COEFFICIENTS BASED ON QUADRATIC PROGRAMMING APPROACH

Lee, Haekwan;Hideo Tanaka
- Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.06a / pp.63-68 / 1998
This paper proposes fuzzy regression analysis with non-symmetric fuzzy coefficients. By assuming non-symmetric triangular fuzzy coefficients and applying the quadratic programming formulation, the center of the obtained fuzzy regression model attains more central tendency than the one with symmetric triangular fuzzy coefficients. For a data set composed of crisp inputs and fuzzy outputs, two approximation models, called an upper approximation model and a lower approximation model, are considered as the regression models. We also propose an integrated quadratic programming problem in which the upper approximation model always includes the lower approximation model at any threshold level, under the assumption that the two approximation models have the same centers. Sensitivities of weight coefficients in the proposed quadratic programming approaches are investigated through real data.

A Comparative Study on the Performance of Bayesian Partially Linear Models

Woo, Yoonsung;Choi, Taeryon;Kim, Wooseok
- Communications for Statistical Applications and Methods / v.19 no.6 / pp.885-898 / 2012
In this paper, we consider Bayesian approaches to partially linear models, in which a regression function is represented by a semiparametric additive form of a parametric linear regression function and a nonparametric regression function. We make a comparative study of the performance of widely used Bayesian partially linear models in terms of empirical analysis. Specifically, we deal with three Bayesian methods to estimate the nonparametric regression function: one using a Fourier series representation, another based on the Gaussian process regression approach, and a third based on the smoothness of the function and differencing. We compare the numerical performance of the three methods by the root mean squared error (RMSE). For the empirical analysis, we consider synthetic data in simulation studies and a real data application, fitting each with the three Bayesian methods and comparing the RMSEs.
https://doi.org/10.5351/CKSS.2012.19.6.885

Forecasting Energy Consumption of Steel Industry Using Regression Model (회귀 모델을 활용한 철강 기업의 에너지 소비 예측)

Sung-Ho KANG;Hyun-Ki KIM
- Journal of Korea Artificial Intelligence Association / v.1 no.2 / pp.21-25 / 2023
The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Specific independent variables were selected in consideration of the correlation among various attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address the multicollinearity problem. In data preprocessing, we evaluated linear and nonlinear relationships between the attributes through correlation analysis. In particular, we selected variables with high correlation and included appropriate variables in the final model to prevent multicollinearity problems. Among the many regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, ensemble learning in this model was able to learn complex patterns effectively while preventing overfitting. Consequently, these predictive models are expected to provide important information for improving energy efficiency and management decision-making in the steel industry. In the future, we plan to improve the performance of the model by collecting more data and extending the variables, and we will also consider applying the model with interactions with external factors taken into account.
https://doi.org/10.24225/jkaia.2023.1.2.21
