• Title/Summary/Keyword: Multivariate Regression

Matrix Formation in Univariate and Multivariate General Linear Models

  • Arwa A. Alkhalaf
    • International Journal of Computer Science & Network Security / v.24 no.4 / pp.44-50 / 2024
  • This paper offers an overview of matrix formation and calculation techniques within the framework of General Linear Models (GLMs). It takes a sequential approach, beginning with a detailed exploration of matrix formation and calculation methods in regression analysis and univariate analysis of variance (ANOVA), and then extends the discussion to multivariate analysis of variance (MANOVA). The primary objective of the study is to provide a clear and accessible explanation of the underlying matrices that play a crucial role in GLMs, linking essentially different statistical methods through the fundamental principles and algebraic foundations that underpin GLM estimation. The insights presented here aim to help researchers, statisticians, and data analysts deepen their understanding of GLMs and their practical implementation in diverse research domains. The paper thereby contributes to a better comprehension of the matrix-based techniques that can be extended to GLMs.
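As an illustration of the kind of matrix calculation this abstract refers to, the following is a minimal sketch of the standard least-squares estimate B = (X'X)^(-1) X'Y for a linear model with several response columns. It is not taken from the paper: the helper names (`transpose`, `matmul`, `solve`, `fit_glm`) and the small noise-free data set are assumptions made for the example.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment A with the columns of B so both are reduced together.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: the design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_glm(x, y):
    """Least-squares coefficients via the normal equations (X'X) B = X'Y."""
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


# Hypothetical design matrix: an intercept column plus one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
# Two hypothetical, noise-free responses: y1 = 1 + 2x and y2 = -1 + 0.5x.
Y = [[1.0, -1.0], [3.0, -0.5], [5.0, 0.0], [7.0, 0.5]]

B_hat = fit_glm(X, Y)  # one column of coefficients per response
print(B_hat)
```

With a single response column the same function gives the univariate regression estimate, and with dummy-coded group columns in X it gives ANOVA-style cell effects, which is the unifying view the abstract describes.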

Predicting Landslide Damaged Area According to Climate Change Scenarios (기후변화 시나리오를 적용한 산사태 피해면적 변화 예측)

  • Song Eu
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.4 / pp.376-386 / 2023
  • Due to climate change, landslide hazards in the Republic of Korea (hereafter South Korea) are continuously increasing. To establish effective landslide mitigation strategies, such as erosion control works, long-term landslide hazard estimation should account for the influence of climate change. In this study, we examined the change in landslide-damaged areas in South Korea in response to climate change scenarios using the multivariate regression method. Data on landslide-damaged areas and rainfall from 1981-2010 were used as a training dataset. Seven indices were derived from rainfall data as the model's input data, corresponding to rainfall indices provided by two SSP scenarios for South Korea: SSP1-2.6 and SSP5-8.5. Prior to the multivariate regression analysis, we conducted the VIF test and a dimension analysis of the regression model using PCA. Based on the PCA result, we developed a regression model for landslide-damaged area estimation with two principal components, which covered about 93% of the total variance. With the climate change scenarios, we simulated landslide-damaged areas in 2030-2100 using the regression model. As a result, the landslide-damaged area is projected to grow to more than double the annual mean landslide-damaged area of 1981-2010, which implies that landslide mitigation strategies should be reinforced in light of future climate conditions.
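The step of choosing how many principal components to keep can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the eigenvalues are approximated with plain power iteration and deflation, the function names are hypothetical, and the four-row data set is made up (the study's rainfall indices are not reproduced here).

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of observations stored as rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpair(m, iters=500):
    """Approximate the largest eigenvalue and eigenvector by power iteration."""
    p = len(m)
    # Assumption: this start vector is not orthogonal to the leading eigenvector.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v gives the eigenvalue estimate.
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def explained_variance_ratios(rows):
    """Share of total variance carried by each principal component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    ratios = []
    for _ in range(p):
        lam, v = top_eigenpair(cov)
        ratios.append(max(lam, 0.0) / total)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios


def components_needed(ratios, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = 0.0
    for k, r in enumerate(ratios, start=1):
        cumulative += r
        if cumulative >= threshold:
            return k
    return len(ratios)


# Made-up data with two variables; the first has the larger spread.
data = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, -0.5]]
ratios = explained_variance_ratios(data)
print(ratios, components_needed(ratios, 0.93))
```

In practice a library eigen-decomposition would replace the power iteration, which converges slowly when eigenvalues are close together; the retained component scores would then serve as regressors.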

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.383-392 / 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only in regression and related methods but also in various multivariate methods. If the results of a multivariate analysis depend heavily on a small number of observations, conclusions should be drawn with great care. Similar phenomena may also occur with incomplete data. In this research we study such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is studied with a numerical example.

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.42 no.5 / pp.314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA yields the lowest error rates, while lasso most significantly reduces the number of variables. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
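To show why lasso tends to drop variables, here is a minimal coordinate-descent sketch. It assumes the common objective (1/(2n))·||y − Xb||² + alpha·||b||₁ with no intercept and already centered data; the function names, the penalty value, and the toy data are assumptions for illustration and are not drawn from the paper's experiments.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(x, y, alpha, sweeps=100):
    """Minimise (1/(2n)) * ||y - x b||^2 + alpha * ||b||_1 (no intercept)."""
    n, p = len(x), len(x[0])
    coef = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            scale = sum(x[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                coef[j] = 0.0  # an all-zero column carries no information
                continue
            # Correlation of predictor j with the residual left by the others.
            rho = 0.0
            for i in range(n):
                fitted_others = sum(x[i][k] * coef[k] for k in range(p) if k != j)
                rho += x[i][j] * (y[i] - fitted_others)
            rho /= n
            coef[j] = soft_threshold(rho, alpha) / scale
    return coef


# Toy data: two orthogonal predictors; y depends strongly on the first
# (weight 3) and only weakly on the second (weight 0.1).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.1, 2.9, -2.9, -3.1]

coef = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print(coef, selected)
```

The soft-thresholding step sets a coefficient to exactly zero whenever its correlation with the residual falls below the penalty, which is what distinguishes lasso from ridge regression, where coefficients shrink but stay nonzero.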

An estimator of the mean of the squared functions for a nonparametric regression

  • Park, Chun-Gun
    • Journal of the Korean Data and Information Science Society / v.20 no.3 / pp.577-585 / 2009
  • In nonparametric regression models, one of the interesting problems so far has been estimating the error variance. In this paper we propose an estimator of the mean of the squared functions, which is the numerator of the SNR (signal-to-noise ratio). To estimate the SNR, the mean of the squared functions must first be estimated. Our focus is on estimating the amplitude, that is, the mean of the squared functions, in a nonparametric regression using a simple linear regression model with the quadratic form of observations as the dependent variable and a function of the lag as the regressor. Our method can be extended to nonparametric regression models with multivariate functions on unequally spaced design points or clustered design points.

Estimation of Water Quality of Fish Farms using Multivariate Statistical Analysis

  • Ceong, Hee-Taek;Kim, Hae-Ran
    • Journal of information and communication convergence engineering / v.9 no.4 / pp.475-482 / 2011
  • In this research, we attempted to estimate the water quality of fish farms in terms of parameters such as water temperature, dissolved oxygen, pH, and salinity, using observational data from a coastal ocean observatory of a national institution located close to the fish farm. We requested and received marine data comprising nine factors, including water temperature, from the Korea Hydrographic and Oceanographic Administration. To verify our results, we also established an experimental fish farm in which we placed the sensor module of an optical mode, YSI-6920V2, used for self-cleaning, directly inside the fish tanks, and used the data measured and recorded by an environment monitoring system that communicated serially with the sensor module. We investigated the differences in water temperature and salinity among three areas: Goheung Balpo, Yeosu Odongdo, and the experimental fish farm, Keumho. Water temperature did not exhibit significant differences, but there was a difference in salinity (significance <5%). Further, multiple regression analysis was performed to estimate the water quality of the fish farm at Keumho based on the data of Goheung Balpo. The water temperature and dissolved-oxygen estimations had multiple linear regression relationships with coefficients of determination of 98% and 89%, respectively. However, for the pH and salinity estimated using the oceanic environment with nine factors, the adjusted coefficient of determination was very low, at less than 10%, and it was therefore difficult to predict the values. We plotted the predicted and measured values using the estimated regression equation and found them to fit very well; the values were close to the regression line. We have demonstrated that if well-fitting statistical model equations are used, the expense of installing, maintaining, and repairing fish-farm sensors and systems, which is a major issue with existing environmental information monitoring systems for marine farming areas, can be reduced, thereby making it easier for fish farmers to monitor aquaculture and mariculture environments.
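The coefficient of determination and its adjusted form, which this abstract reports, can be computed as in the short sketch below. The formulas are the standard ones; the function names and the example numbers (100 observations, an R² of 0.15) are hypothetical and are not results from the study. Only the count of nine predictors echoes the abstract.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R^2 for the number of predictors used in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical weak model: nine predictors, 100 observations, R^2 = 0.15.
weak_fit = adjusted_r_squared(0.15, n_obs=100, n_predictors=9)
print(weak_fit)
```

The adjustment matters most when many predictors explain little variance, because the adjusted value then falls noticeably below the raw R².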

On Bivariate-t Significance Tests of Linear Regression Coefficients (線型回歸係數의 二變量 t 有意性 檢定)

  • Kim, Kang Kyun
    • Journal of the Korean Statistical Society / v.5 no.1 / pp.3-18 / 1976
  • To test the simultaneous significance of more than two linear regression coefficients, we can consider multivariate-t tests with critical regions in t-space instead of F-tests, where the t-values are the t-statistics of the significance tests of individual coefficients. In this paper, bivariate-t distributions and bivariate-t tests of two coefficients, such as the maxmod, minmod, one-tailed maxmod, and one-tailed minmod tests, are studied. Calculation of the powers of the tests shows that in some cases bivariate-t tests are more powerful than F-tests.

Canonical Correlation: Permutation Tests and Regression

  • Yoo, Jae-Keun;Kim, Hee-Youn;Um, Hye-Yeon
    • Communications for Statistical Applications and Methods / v.19 no.3 / pp.471-478 / 2012
  • In this paper, we present a permutation test to select the number of pairs of canonical variates in canonical correlation analysis. The existing chi-squared test is known to be limited in use because it relies on normality. We compare the existing test with the proposed permutation test and study their asymptotic behaviors through numerical studies. In addition, we connect canonical correlation analysis to regression and show that certain inferences in regression can be done through canonical correlation analysis. A regression analysis of real data through canonical correlation analysis is illustrated.
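The general idea of a permutation test can be sketched in the simplest special case, where each set holds a single variable and the first canonical correlation reduces to the absolute Pearson correlation. This is an assumption-laden illustration rather than the paper's procedure for selecting the number of canonical pairs; the function names, the seed, the number of permutations, and the data are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def permutation_p_value(a, b, n_perm=999, seed=0):
    """P-value for |corr(a, b)| under random re-pairing of the observations."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson(a, b))
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling b breaks any link with a while keeping both marginals.
        rng.shuffle(shuffled)
        if abs(pearson(a, shuffled)) >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data with an exact linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0 * v + 1.0 for v in x]

p_value = permutation_p_value(x, y)
print(p_value)
```

Because the reference distribution is built from the data by re-pairing, no normality assumption is needed, which is the motivation the abstract gives for preferring a permutation test over the chi-squared test.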

A Short Note on Empirical Penalty Term Study of BIC in K-means Clustering Inverse Regression

  • Ahn, Ji-Hyun;Yoo, Jae-Keun
    • Communications for Statistical Applications and Methods / v.18 no.3 / pp.267-275 / 2011
  • According to recent studies, the Bayesian information criterion (BIC) has been proposed to determine the structural dimension of the central subspace through sliced inverse regression (SIR) with high-dimensional predictors. The BIC may also be useful in K-means clustering inverse regression (KIR) with high-dimensional predictors. However, the direct application of the BIC to KIR may be problematic, because the slicing scheme in SIR is not the same as that of KIR. In this paper, we present empirical penalty term studies of the BIC in KIR to identify the most appropriate one. Numerical studies and real data analysis are presented.

Prediction of Length of ICU Stay Using Data-mining Techniques: an Example of Old Critically Ill Postoperative Gastric Cancer Patients

  • Zhang, Xiao-Chun;Zhang, Zhi-Dan;Huang, De-Sheng
    • Asian Pacific Journal of Cancer Prevention / v.13 no.1 / pp.97-101 / 2012
  • Objective: Against the background of an aging population in China and advances in clinical medicine, the number of operations on old patients is increasing correspondingly, which imposes growing challenges on critical care medicine and geriatrics. The study was designed to describe information on the length of ICU stay from a single institution's experience of old critically ill gastric cancer patients after surgery, and the framework for incorporating data-mining techniques into the prediction. Methods: A retrospective design was adopted to collect consecutive data on patients aged 60 or over with a gastric cancer diagnosis after surgery in an adult intensive care unit in a medical university hospital in Shenyang, China, from January 2010 to March 2011. Characteristics of patients and the length of their ICU stay were gathered for analysis by univariate and multivariate Cox regression to examine the relationship with potential candidate factors. A regression tree was constructed to predict the length of ICU stay and explore the important indicators. Results: Multivariate Cox analysis found that shock and the need for nutrition support were statistically significant risk factors for prolonged length of ICU stay. Altogether, eight variables entered the regression model: age, APACHE II score, SOFA score, shock, respiratory system dysfunction, circulation system dysfunction, diabetes, and nutrition support need. The regression tree indicated comorbidity of two or more kinds of shock as the most important factor for prolonged length of ICU stay in the studied sample. Conclusions: Comorbidity of two or more kinds of shock is the most important factor for length of ICU stay in the studied sample. Since ICU patient characteristics differ between wards and hospitals, intensivists should consider the data-mining technique as a tool for predicting length of ICU stay.