• Title/Abstract/Keywords: Multivariate Regression

Matrix Formation in Univariate and Multivariate General Linear Models

  • Arwa A. Alkhalaf
    • International Journal of Computer Science & Network Security / Vol. 24, No. 4 / pp. 44-50 / 2024
  • This paper offers an overview of matrix formation and calculation techniques within the framework of General Linear Models (GLMs). It takes a sequential approach, beginning with a detailed exploration of matrix formation and calculation methods in regression analysis and univariate analysis of variance (ANOVA), and then extending the discussion to multivariate analysis of variance (MANOVA). The primary objective of this study was to provide a clear and accessible explanation of the underlying matrices that play a crucial role in GLMs, linking essentially different statistical methods through the fundamental principles and algebraic foundations that underpin GLM estimation. The insights presented here aim to help researchers, statisticians, and data analysts deepen their understanding of GLMs and their practical implementation in diverse research domains. This paper contributes to a better understanding of matrix-based techniques that can be extended to GLMs.
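
The matrix formulation summarized in this abstract can be illustrated with a short sketch. Assuming NumPy and a small made-up data set (not taken from the paper), it builds a reference-coded design matrix for a one-way layout, solves the normal equations (X'X)B = X'Y for a univariate and a bivariate response with the same algebra, and forms the error and hypothesis SSCP matrices used in MANOVA together with Wilks' lambda. The function name `glm_fit` is hypothetical, and the paper's own notation and coding scheme may differ.

```python
import numpy as np

def glm_fit(X, Y):
    """Least-squares GLM coefficients from the normal equations (X'X) B = X'Y.

    Y may be a length-n vector (univariate model) or an n x p matrix
    (multivariate model); the same matrix algebra covers both cases.
    """
    return np.linalg.solve(X.T @ X, X.T @ Y)

# One-way layout with three groups written as a regression: an intercept
# column plus indicator columns for groups 1 and 2 (group 0 is the reference).
groups = np.array([0, 0, 1, 1, 2, 2])
X = np.column_stack([
    np.ones(len(groups)),
    (groups == 1).astype(float),
    (groups == 2).astype(float),
])

# Made-up responses: column 0 alone is a univariate ANOVA,
# both columns together form a MANOVA.
Y = np.array([
    [1.0, 1.0],
    [3.0, 2.0],
    [4.0, 2.0],
    [6.0, 4.0],
    [7.0, 5.0],
    [9.0, 5.0],
])

b = glm_fit(X, Y[:, 0])   # univariate coefficients
B = glm_fit(X, Y)         # one coefficient column per response

fitted = X @ B
E = (Y - fitted).T @ (Y - fitted)                              # error (within) SSCP
H = (fitted - Y.mean(axis=0)).T @ (fitted - Y.mean(axis=0))    # hypothesis (between) SSCP
wilks = np.linalg.det(E) / np.linalg.det(E + H)                # Wilks' lambda

print(b)       # intercept = mean of group 0, then differences from group 0
print(wilks)
```

With reference-cell coding, the intercept is the reference-group mean and each indicator coefficient is a difference in group means; the multivariate case simply stacks one such column per response.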

Predicting Landslide Damaged Area According to Climate Change Scenarios

  • 유송
    • Korean Journal of Agricultural and Forest Meteorology / Vol. 25, No. 4 / pp. 376-386 / 2023
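  • Landslide damage in Korea has been increasing steadily because of climate change. To plan effective landslide damage-reduction measures such as erosion-control projects, long-term landslide risk needs to be estimated with the effects of climate change taken into account. In this study, changes in landslide-damaged area under climate change were predicted by multivariate regression analysis. A multivariate regression model was built using landslide-damaged area and rainfall observation data for 1980-2010 as training data. From the rainfall observations, the seven rainfall indices provided in the SSP scenarios were extracted. Multicollinearity was then tested with the variance inflation factor (VIF), the dimensionality was reduced by principal component analysis, and a model for estimating landslide-damaged area was derived with two principal components as the independent variables. Estimating the change in landslide-damaged area for 2030-2100 under the climate change scenarios showed that the damaged area would increase, at most, to more than twice the average annual landslide area of 1981-2010. The results of this study can serve as baseline data indicating the need to establish and strengthen landslide damage-reduction measures that account for future climate change.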
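
The modelling pipeline described in this abstract (variance inflation factors to check multicollinearity, then principal component analysis and a regression on two components) can be sketched as follows. This is a minimal illustration assuming NumPy and synthetic data standing in for the seven rainfall indices; the function names `vif` and `pc_regression` and the "scenario" inputs are hypothetical, and the paper's exact preprocessing and model form are not reproduced.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on all other columns plus an intercept."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def pc_regression(X, y, n_components=2):
    """Standardize X, keep the leading principal components, fit OLS on the scores."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T
    design = np.column_stack([np.ones(len(y)), Z @ W])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(X_new):
        Z_new = (X_new - mu) / sd
        return np.column_stack([np.ones(len(X_new)), Z_new @ W]) @ beta

    return beta, predict

# Synthetic stand-in: 31 "years" of 7 correlated rainfall indices driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(31, 2))
X = latent @ rng.normal(size=(2, 7)) + 0.1 * rng.normal(size=(31, 7))
y = 3.0 + latent @ np.array([2.0, -1.0]) + 0.2 * rng.normal(size=31)

print(vif(X).round(1))            # large values flag multicollinearity
beta, predict = pc_regression(X, y, n_components=2)
future_X = X[-5:] * 1.2           # hypothetical "scenario" inputs
print(predict(future_X))
```

Future inputs are standardized with the training means and standard deviations and projected onto the same loadings, so the fitted coefficients can be reused unchanged for scenario projections.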

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA

  • 김현정; 문승호; 신재경
    • The Korean Journal of Applied Statistics / Vol. 13, No. 2 / pp. 383-392 / 2000
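  • Since the late 1970s, influence analysis and sensitivity analysis have been studied for various multivariate methods, including regression analysis, in order to detect influential observations. Such research is also needed for incomplete data containing missing values. In this regard, Kim et al. (1998) studied methods for sensitivity analysis in multivariate analysis of incomplete data, focusing on the maximum likelihood estimates of the mean vector and the variance-covariance matrix. Whereas Kim et al. (1998) used Cook's D statistic, this paper examines a method for detecting influential observations in multivariate data with missing values by using principal components. The missing values are imputed with the EM algorithm, and the PCA statistics are then derived.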
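
The general idea of imputing missing values with the EM algorithm and then screening observations through principal components can be sketched as below. This is a minimal illustration assuming NumPy, multivariate normal data, and at least one observed value per row. The influence measure used here (the drop in the largest eigenvalue of the covariance matrix when a row is deleted, computed on the EM-completed data without re-running EM) is a simple stand-in chosen for illustration, not the PCA statistic derived in the paper; `em_mvn` and `pc_influence` are hypothetical names.

```python
import numpy as np

def em_mvn(X, n_iter=500, tol=1e-8):
    """EM estimates of the mean vector and covariance matrix for multivariate
    normal data with NaN-coded missing values (each row needs >= 1 observed value)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        X_hat = np.where(miss, 0.0, X)
        C = np.zeros((p, p))  # sum of conditional covariances of the missing parts
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            # E-step: regression of the missing entries on the observed ones.
            K = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
            X_hat[i, m] = mu[m] + K @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - K @ sigma[np.ix_(o, m)]
        # M-step: moments of the completed data plus the covariance correction.
        mu_new = X_hat.mean(axis=0)
        D = X_hat - mu_new
        sigma_new = (D.T @ D + C) / n
        done = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max()) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma, X_hat

def pc_influence(X_full):
    """Drop in the largest covariance eigenvalue when each row is deleted."""
    def top(A):
        return np.linalg.eigvalsh(np.cov(A, rowvar=False))[-1]
    base = top(X_full)
    return np.array([base - top(np.delete(X_full, i, axis=0)) for i in range(len(X_full))])

rng = np.random.default_rng(1)
cov = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]]
X = rng.multivariate_normal(np.zeros(3), cov, size=40)
mask = rng.random(X.shape) < 0.1
mask[:, 0] = False                # keep column 0 observed in every row
X[mask] = np.nan
X[5] = [6.0, 6.0, 6.0]            # planted outlying observation

mu, sigma, X_hat = em_mvn(X)
influence = pc_influence(X_hat)
print(np.argsort(influence)[::-1][:3])   # rows with the largest influence
```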

Evaluating Variable Selection Techniques for Multivariate Linear Regression

  • 류나현; 김형석; 강필성
    • Journal of the Korean Institute of Industrial Engineers / Vol. 42, No. 5 / pp. 314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA yields the lowest error rates, while lasso most substantially reduces the number of variables. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
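
Of the seven techniques compared, lasso is the one this abstract singles out for reducing the number of variables. A minimal sketch of how it zeroes out coefficients is shown below, assuming NumPy, standardized predictors, a centered response, and the common objective (1/(2n))·||y - Xb||² + lam·||b||₁ solved by cyclic coordinate descent. The function names, the penalty value, and the synthetic data are hypothetical; the paper's experimental setup (49 data sets, tuning procedure) is not reproduced.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)          # current residual y - X b (a copy)
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the partial residual that excludes b[j].
            rho = X[:, j] @ r / n + col_sq[j] * old
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_delta = max(max_delta, abs(b[j] - old))
        if max_delta < tol:
            break
    return b

rng = np.random.default_rng(42)
n, p = 200, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 relevant variables
y = X @ beta_true + rng.normal(size=n)
y = y - y.mean()

b = lasso_cd(X, y, lam=0.1)
print(np.count_nonzero(b), b.round(2))
```

Larger values of `lam` set more coefficients exactly to zero; once `lam` exceeds max|X'y|/n, every coefficient is zero.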

An estimator of the mean of the squared functions for a nonparametric regression

  • Park, Chun-Gun
    • Journal of the Korean Data and Information Science Society / Vol. 20, No. 3 / pp. 577-585 / 2009
  • Estimating the error variance has been one of the problems of interest in nonparametric regression. In this paper, we propose an estimator of the mean of the squared functions, which is the numerator of the SNR (signal-to-noise ratio). To estimate the SNR, the mean of the squared functions must first be estimated. Our focus is on estimating this amplitude, that is, the mean of the squared functions, in a nonparametric regression by using a simple linear regression model with a quadratic form of the observations as the dependent variable and a function of the lag as the regressor. Our method can be extended to nonparametric regression models with multivariate functions on unequally spaced or clustered design points.
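
This abstract describes a simple linear regression with a quadratic form of the observations as the response and a function of the lag as the regressor. One well-known construction of that kind, which may or may not match the paper's exact estimator, regresses the lag-k half mean squared differences s_k on d_k = (k/n)² so that the intercept estimates the error variance σ²; because E[y_i²] = f(x_i)² + σ², subtracting that estimate from the sample mean of y² then gives an estimate of the mean of the squared function and of the SNR. The sketch assumes NumPy, an equally spaced design on [0, 1], a smooth mean function, and a hypothetical number of lags m; the function names are also hypothetical.

```python
import numpy as np

def lag_regression_variance(y, m):
    """Intercept of the regression of s_k on d_k = (k/n)^2, k = 1..m, where
    s_k = sum_i (y[i+k] - y[i])^2 / (2(n - k)); estimates the error variance."""
    n = len(y)
    lags = np.arange(1, m + 1)
    s = np.array([np.sum((y[k:] - y[:-k]) ** 2) / (2 * (n - k)) for k in lags])
    d = (lags / n) ** 2
    slope, intercept = np.polyfit(d, s, 1)
    return intercept

def mean_squared_function(y, m):
    """Estimate mean f(x)^2 via E[y^2] = f(x)^2 + sigma^2, plus the implied SNR."""
    sigma2 = lag_regression_variance(y, m)
    msf = np.mean(y ** 2) - sigma2
    return msf, sigma2, msf / sigma2

rng = np.random.default_rng(3)
n = 2000
x = np.arange(1, n + 1) / n
f = np.sin(2 * np.pi * x)                 # true mean of f^2 is 0.5 on this grid
y = f + 0.5 * rng.normal(size=n)          # true sigma^2 = 0.25, so true SNR = 2
msf, sigma2, snr = mean_squared_function(y, m=10)
print(msf, sigma2, snr)
```

The slope term absorbs the bias that a smooth f adds to the squared differences at larger lags, which is why the intercept, rather than s_1 alone, is used as the variance estimate in this construction.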

Estimation of Water Quality of Fish Farms using Multivariate Statistical Analysis

  • Ceong, Hee-Taek; Kim, Hae-Ran
    • Journal of information and communication convergence engineering / Vol. 9, No. 4 / pp. 475-482 / 2011
  • In this research, we attempted to estimate the water quality of fish farms in terms of parameters such as water temperature, dissolved oxygen, pH, and salinity, using observational data from a coastal ocean observatory of a national institution located close to the fish farm. We requested and received marine data comprising nine factors, including water temperature, from the Korea Hydrographic and Oceanographic Administration. To verify our results, we also established an experimental fish farm, placed a self-cleaning optical-mode YSI-6920V2 sensor module directly inside the fish tanks, and used the data measured and recorded by an environmental monitoring system communicating serially with the sensor module. We investigated differences in water temperature and salinity among three areas: Goheung Balpo, Yeosu Odongdo, and the experimental fish farm at Keumho. Water temperature did not differ significantly, but salinity did (at the 5% significance level). Multiple regression analysis was then performed to estimate the water quality of the Keumho fish farm from the Goheung Balpo data. The water temperature and dissolved-oxygen estimates showed multiple linear regression relationships with coefficients of determination of 98% and 89%, respectively. However, for pH and salinity estimated from the nine oceanic environmental factors, the adjusted coefficient of determination was very low (below 10%), making these values difficult to predict. We plotted the values predicted by the estimated regression equations against the measured values and found that they fit very well, lying close to the regression line.
We have demonstrated that if well-fitting statistical model equations are used, the cost of installing, maintaining, and repairing fish-farm sensors and systems, a major issue for existing environmental monitoring systems in marine farming areas, can be reduced, making it easier for fish farmers to monitor aquaculture and mariculture environments.
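
This abstract reports both ordinary and adjusted coefficients of determination for the multiple regressions. A minimal sketch of how the two are computed is below, assuming NumPy and hypothetical synthetic data with nine predictors standing in for the nine oceanic factors (not the study's measurements); the adjustment uses the standard formula adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of predictors. The function name `fit_with_r2` is hypothetical.

```python
import numpy as np

def fit_with_r2(X, y):
    """OLS with intercept; returns coefficients, R^2 and adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2

rng = np.random.default_rng(4)
n = 60
X = rng.normal(size=(n, 9))                                 # nine reference-site factors (synthetic)
temp = 15.0 + 2.0 * X[:, 0] + 0.2 * rng.normal(size=n)     # target strongly related to the factors
ph = 8.0 + 0.05 * rng.normal(size=n)                        # target unrelated to the factors
for name, target in [("temperature", temp), ("pH", ph)]:
    _, r2, adj = fit_with_r2(X, target)
    print(f"{name}: R2={r2:.2f}, adjusted R2={adj:.2f}")
```

With nine predictors, plain R² rises even for an unrelated target; the adjustment penalizes that, which is why adjusted R² is the more honest figure when many factors are used.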

On Bivariate-t Significance Tests of Linear Regression Coefficients

  • 김강균
    • Journal of the Korean Statistical Society / Vol. 5, No. 1 / pp. 3-18 / 1976
  • To test the simultaneous significance of more than two linear regression coefficients, we can consider, instead of F-tests, multivariate-t tests with critical regions in t-space, where the t-values are the t-statistics of the significance tests of the individual coefficients. In this paper, bivariate-t distributions and bivariate-t tests of two coefficients, such as the maxmod, minmod, one-tailed maxmod, and one-tailed minmod tests, are studied. Calculations of the power of these tests show that in some cases bivariate-t tests are more powerful than F-tests.
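
The maxmod test mentioned in this abstract rejects when max(|t1|, |t2|) exceeds a critical value from the bivariate-t distribution of the two t-statistics, whose null correlation is determined by (X'X)⁻¹ because both statistics share the same residual variance estimate. The sketch below, assuming NumPy, approximates that critical value by Monte Carlo simulation rather than by exact calculation; the helper names, simulation size, and data are hypothetical.

```python
import numpy as np

def coef_t_stats(X, y):
    """t-statistics of OLS coefficients, their null correlation matrix, and error df."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    d = np.sqrt(np.diag(XtX_inv))
    return beta / (np.sqrt(s2) * d), XtX_inv / np.outer(d, d), n - p

def maxmod_critical_value(rho, df, alpha=0.05, n_sim=200_000, seed=0):
    """Monte Carlo (1 - alpha) quantile of max(|T1|, |T2|) for a bivariate t
    with correlation rho and df degrees of freedom (shared denominator)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_sim)
    w = np.sqrt(rng.chisquare(df, size=n_sim) / df)
    t = z / w[:, None]
    return np.quantile(np.abs(t).max(axis=1), 1 - alpha)

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = 1.0 + 0.4 * X[:, 1] + 0.4 * X[:, 2] + rng.normal(size=n)
t, corr, df = coef_t_stats(X, y)
crit = maxmod_critical_value(corr[1, 2], df)
print(max(abs(t[1]), abs(t[2])) > crit)   # joint maxmod test of the two slopes
```

The critical value shrinks toward the univariate t quantile as |rho| grows, since highly correlated statistics behave more like a single test.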

Canonical Correlation: Permutation Tests and Regression

  • Yoo, Jae-Keun; Kim, Hee-Youn; Um, Hye-Yeon
    • Communications for Statistical Applications and Methods / Vol. 19, No. 3 / pp. 471-478 / 2012
  • In this paper, we present a permutation test for selecting the number of pairs of canonical variates in canonical correlation analysis. The existing chi-squared test is known to be valid only under normality. We compare the existing test with the proposed permutation test and study their asymptotic behavior through numerical studies. In addition, we connect canonical correlation analysis to regression and show that certain inferences in regression can be carried out through canonical correlation analysis. A regression analysis of real data via canonical correlation analysis is illustrated.
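
A permutation test for canonical correlations can be sketched as follows, assuming NumPy. Canonical correlations are computed as the singular values of Qx'Qy, where Qx and Qy are orthonormal bases (from QR decompositions) of the centered data, and permuting the rows of Y breaks any association between X and Y. For brevity this sketch tests only whether the first canonical correlation is nonzero (zero pairs versus at least one); the paper's procedure for selecting the number of pairs is not reproduced, and the function names and data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations as singular values of Qx' Qy (centered data)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

def permutation_test_first_cc(X, Y, n_perm=999, seed=0):
    """Permutation p-value for H0: the first canonical correlation is zero."""
    rng = np.random.default_rng(seed)
    observed = canonical_correlations(X, Y)[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Y))
        if canonical_correlations(X, Y[perm])[0] >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 3))
Y = X[:, :2] + 0.1 * rng.normal(size=(80, 2))   # strongly related responses
r1, p_value = permutation_test_first_cc(X, Y, n_perm=199)
print(r1, p_value)
```

Unlike the chi-squared test, the permutation reference distribution is built from the data themselves, so no normality assumption is needed for its validity.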

A Short Note on Empirical Penalty Term Study of BIC in K-means Clustering Inverse Regression

  • Ahn, Ji-Hyun; Yoo, Jae-Keun
    • Communications for Statistical Applications and Methods / Vol. 18, No. 3 / pp. 267-275 / 2011
  • Recent studies have proposed a Bayesian information criterion (BIC) for determining the structural dimension of the central subspace through sliced inverse regression (SIR) with high-dimensional predictors. The BIC may also be useful in K-means clustering inverse regression (KIR) with high-dimensional predictors. However, applying the BIC directly to KIR may be problematic, because the slicing scheme in SIR differs from that of KIR. In this paper, we present empirical studies of the penalty term of the BIC in KIR to identify the most appropriate one. Numerical studies and a real data analysis are presented.
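
The ingredients of the KIR dimension question can be sketched as below, assuming NumPy and a univariate response. Observations are grouped by k-means clustering of the response (instead of slicing it as in SIR), the within-cluster means m_h of the standardized predictors form the kernel matrix M = Σ_h p_h m_h m_h', and the eigenvalues of M feed a BIC-type criterion. The criterion used here, G(d) = n·Σ_{i≤d} λ_i - C_n·d(2p - d + 1)/2, is a simplified illustrative form; the penalty C_n (set to log n below) is exactly the term whose choice the paper studies, so both the form and the value are assumptions rather than the paper's recommendation. All function names are hypothetical.

```python
import numpy as np

def kmeans_1d_labels(y, k, n_iter=100):
    """Lloyd's k-means on a univariate response, initialized at evenly spaced order statistics."""
    centers = np.sort(y)[np.linspace(0, len(y) - 1, k).astype(int)].astype(float)
    for _ in range(n_iter):
        labels = np.abs(y[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([y[labels == h].mean() if np.any(labels == h) else centers[h]
                        for h in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def kir_eigenvalues(X, y, k):
    """Eigenvalues (descending) of M = sum_h p_h m_h m_h' for standardized predictors."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    w, V = np.linalg.eigh(Xc.T @ Xc / n)
    Z = Xc @ V @ np.diag(w ** -0.5) @ V.T   # Z has identity sample covariance
    labels = kmeans_1d_labels(y, k)
    M = np.zeros((p, p))
    for h in np.unique(labels):
        in_h = labels == h
        m_h = Z[in_h].mean(axis=0)
        M += in_h.mean() * np.outer(m_h, m_h)
    return np.linalg.eigvalsh(M)[::-1]

def bic_dimension(lam, n, penalty):
    """argmax over d of n * sum(lam[:d]) - penalty * d(2p - d + 1)/2 (illustrative form)."""
    p = len(lam)
    scores = [n * lam[:d].sum() - penalty * d * (2 * p - d + 1) / 2 for d in range(p + 1)]
    return int(np.argmax(scores))

rng = np.random.default_rng(8)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * rng.normal(size=n)        # single-index model: true dimension 1
lam = kir_eigenvalues(X, y, k=5)
print(lam.round(3), bic_dimension(lam, n, penalty=np.log(n)))
```

Swapping in a different `penalty` (for example a multiple of log n or a power of n) changes how readily small eigenvalues are counted as signal, which is the trade-off the paper examines empirically.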

Prediction of Length of ICU Stay Using Data-mining Techniques: an Example of Old Critically Ill Postoperative Gastric Cancer Patients

  • Zhang, Xiao-Chun; Zhang, Zhi-Dan; Huang, De-Sheng
    • Asian Pacific Journal of Cancer Prevention / Vol. 13, No. 1 / pp. 97-101 / 2012
  • Objective: Against the background of an aging population in China and advances in clinical medicine, the number of operations on older patients has increased correspondingly, posing growing challenges to critical care medicine and geriatrics. This study was designed to describe the length of ICU stay of older, critically ill gastric cancer patients after surgery at a single institution, and to present a framework for incorporating data-mining techniques into its prediction. Methods: A retrospective design was adopted to collect consecutive data on patients aged 60 or over with a gastric cancer diagnosis after surgery in an adult intensive care unit of a medical university hospital in Shenyang, China, from January 2010 to March 2011. Patient characteristics and length of ICU stay were analyzed by univariate and multivariate Cox regression to examine their relationship with potential candidate factors. A regression tree was constructed to predict the length of ICU stay and to explore the important indicators. Results: Multivariate Cox analysis found that shock and the need for nutrition support were statistically significant risk factors for prolonged ICU stay. Altogether, eight variables entered the regression model: age, APACHE II score, SOFA score, shock, respiratory system dysfunction, circulation system dysfunction, diabetes, and the need for nutrition support. The regression tree identified comorbidity of two or more kinds of shock as the most important factor for prolonged ICU stay in the studied sample. Conclusions: Comorbidity of two or more kinds of shock is the most important factor for length of ICU stay in the studied sample. Because ICU patient characteristics differ between wards and hospitals, intensivists should consider data-mining techniques as a tool for predicting length of ICU stay.
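
The regression tree mentioned in the Methods chooses its splits by how much they reduce the squared error of the predicted length of stay. A minimal sketch of a single such split (a depth-one tree) is below, assuming NumPy and a tiny made-up data set with hypothetical variable names `shock_types` and `age`; it is not the study's model or data, and a full tree would apply the same search recursively within each child node.

```python
import numpy as np

def best_split(X, y, names):
    """Exhaustive search for the single split (feature <= threshold) that
    minimizes the total within-node sum of squared errors."""
    best_name, best_t, best_sse = None, None, np.inf
    for j, name in enumerate(names):
        for t in np.unique(X[:, j])[:-1]:        # skip the max so the right node is non-empty
            left = X[:, j] <= t
            sse = (np.sum((y[left] - y[left].mean()) ** 2)
                   + np.sum((y[~left] - y[~left].mean()) ** 2))
            if sse < best_sse:
                best_name, best_t, best_sse = name, t, sse
    return best_name, best_t, best_sse

# Hypothetical records: number of shock types, age, and ICU length of stay (days).
X = np.array([[0, 60], [0, 70], [1, 65], [2, 80], [2, 75], [3, 62]], dtype=float)
los = np.array([3.0, 4.0, 3.0, 10.0, 11.0, 12.0])
print(best_split(X, los, ["shock_types", "age"]))
```

The variable chosen for the first split is the one that best separates short from long stays, which is the sense in which a tree ranks factors such as shock comorbidity as "most important".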