• Title/Abstract/Keywords: Linear and multiple regression

Search results: 1,728 items; processing time: 0.034 seconds

MULTIPLE DELETION MEASURES OF TEST STATISTICS IN MULTIVARIATE REGRESSION

  • Jung, Kang-Mo
    • Journal of applied mathematics & informatics
    • /
    • Vol. 26, No. 3_4
    • /
    • pp.679-688
    • /
    • 2008
  • In multivariate regression analysis there exist many influence measures on the regression estimates. However, there seem to be few influence diagnostics on test statistics in hypothesis testing. The case-deletion approach is fundamental for investigating the influence of observations on estimates or statistics. Tang and Fung (1997) derived single case-deletion measures of the Wilks' ratio, Lawley-Hotelling trace, and Pillai's trace for testing a general linear hypothesis of the regression coefficients in multivariate regression. In this paper we derive a more extended form of those measures to deal with joint influence among observations. A numerical example is given to illustrate the effect of joint influence on the test statistics.
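
For orientation, the three test statistics named in this abstract have standard textbook forms in terms of the hypothesis and error sums-of-squares-and-cross-products matrices. The symbols $H$, $E$, $S$, and the deleted index set $I$ are notation assumed here, not taken from the paper, and the last line shows only the generic shape of a case-deletion measure, not the paper's derived formula.

```latex
\begin{align*}
\Lambda &= \frac{|E|}{|E + H|} && \text{(Wilks' ratio)} \\
T &= \operatorname{tr}\!\left(H E^{-1}\right) && \text{(Lawley-Hotelling trace)} \\
V &= \operatorname{tr}\!\left(H (H + E)^{-1}\right) && \text{(Pillai's trace)} \\
D_I(S) &= S - S_{(I)}, \qquad S \in \{\Lambda, T, V\}
\end{align*}
```

Here $S_{(I)}$ denotes the statistic recomputed after deleting the cases indexed by $I$. A single-case measure corresponds to $I$ containing one observation, and joint influence to $I$ containing several.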

Estimating the Total Precipitation Amount with Simulated Precipitation for Ungauged Stations in Jeju Island

  • 김남원;엄명진;정일문;허준행
    • 한국수자원학회논문집 (Journal of Korea Water Resources Association)
    • /
    • Vol. 45, No. 9
    • /
    • pp.875-885
    • /
    • 2012
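  • In this study, precipitation data for ungauged stations were generated and spatially analyzed to estimate the total precipitation amount of Jeju Island accurately. The ungauged precipitation data were generated with the modified multiple linear regression model proposed in this study, and spatial precipitation was obtained by applying PRISM. The values estimated by the modified multiple linear regression model followed patterns similar to the existing precipitation patterns, indicating that the model is accurate. In the spatial precipitation analysis, the annual mean precipitation of Case 1 (original data) and Case 2 (data supplemented with the generated ungauged precipitation) differed only slightly, by about 1.5%, but the annual mean precipitation by elevation increased by up to 37.4%. The proposed method for generating data at ungauged stations is therefore expected to be useful for estimating total precipitation where station density is low and where precipitation varies strongly at the local scale.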

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 29, No. 6
    • /
    • pp.629-640
    • /
    • 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. The best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose to run multiple GA to solve the best subset selection and then synthesize the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially the best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
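
The idea in this abstract (several independent GA runs over variable-inclusion masks, then a synthesis of their results) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the tournament selection, the uniform crossover, the mutation rate, and the majority-vote synthesis rule are all assumptions, and `fitness` stands in for whatever selection criterion is being maximized.

```python
import random

def run_ga(fitness, n_vars, rng, pop_size=20, n_gen=40, p_mut=0.05):
    """Run one GA over True/False inclusion masks and return the best mask seen."""
    def tournament(pop):
        # Binary tournament: pick two distinct individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(n_gen):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover, then independent bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

def ensemble_ga(fitness, n_vars, n_runs=5, seed=0, threshold=0.5):
    """Run several GAs and keep variables chosen in more than `threshold` of the runs.

    The majority-vote rule is an assumed synthesis step for illustration.
    """
    rng = random.Random(seed)
    runs = [run_ga(fitness, n_vars, rng) for _ in range(n_runs)]
    freq = [sum(mask[j] for mask in runs) / n_runs for j in range(n_vars)]
    selected = [f > threshold for f in freq]
    return selected, freq

# Toy criterion for demonstration only: closeness to a made-up "true" mask.
TRUE_MASK = [True, False, True, False, False, True]

def toy_fitness(mask):
    return -sum(m != t for m, t in zip(mask, TRUE_MASK))

selected, freq = ensemble_ga(toy_fitness, n_vars=len(TRUE_MASK))
```

In practice `fitness` would score a fitted model on the selected predictors, for example the negative of an information criterion; the toy function above only exercises the search loop.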

Correlation Analysis of Water Quality According to Land Use Types of Reservoir Watershed

  • 윤동균;정상옥
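    • 한국농공학회:학술대회논문집 (Korean Society of Agricultural Engineers: Conference Proceedings)
    • /
    • 한국농공학회 2005년도 학술발표논문집 (Proceedings of the Korean Society of Agricultural Engineers 2005 Conference)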
    • /
    • pp.614-619
    • /
    • 2005
  • The objective of this study was to present regression equations for estimating the water quality items BOD, COD, T-N, and T-P simply and quickly. The regression equations were obtained by analyzing the relationships between water quality items and land use types in agricultural reservoir watersheds. Multiple linear regression analysis was used to derive the equations for the studied reservoirs. In this analysis, the independent variables were the land use types and the dependent variables were the BOD, COD, T-N, and T-P values. The results showed that no regression equation had a multiple correlation coefficient (MCC) above 0.90; 6 equations had an MCC from 0.70 to 0.90, 20 had an MCC from 0.40 to 0.70, and 4 had an MCC from 0.20 to 0.40. The results of this study can be used as basic information for evaluating water quality simply and quickly at the proposal and design stages of water quality policy.
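
For a least-squares fit with an intercept, the multiple correlation coefficient equals the Pearson correlation between the observed and fitted values. A minimal sketch of that calculation follows, with a helper that sorts a coefficient into the bands used in the abstract. The function names and the treatment of values falling exactly on a band edge are assumptions made for illustration.

```python
import math

def multiple_correlation(observed, fitted):
    """Pearson correlation between observed values and model-fitted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return cov / math.sqrt(var_o * var_f)

def mcc_band(r):
    """Label a coefficient with the band names from the abstract.

    Edge handling (which band owns 0.90, 0.70, ...) is an assumption.
    """
    if r > 0.90:
        return "above 0.90"
    if r >= 0.70:
        return "0.70 to 0.90"
    if r >= 0.40:
        return "0.40 to 0.70"
    if r >= 0.20:
        return "0.20 to 0.40"
    return "below 0.20"
```

The counts reported in the abstract (0, 6, 20, and 4) add up to 30 regression equations.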

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
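    • 한국통계학회:학술대회논문집 (Korean Statistical Society: Conference Proceedings)
    • /
    • 한국통계학회 2005년도 추계 학술발표회 논문집 (Proceedings of the Korean Statistical Society 2005 Autumn Conference)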
    • /
    • pp.13-17
    • /
    • 2005
  • In recent years, many people have used R as a statistical system. R is frequently updated by many R project teams. We are interested in methods for detecting multiple outliers and note that R does not supply such a method. In this talk, we review procedures for detecting multiple outliers and provide more efficient procedures that combine direct and indirect methods using R.

Interpretation of Relationship Between Sesame Yield and Its Components under Early Sowing Cropping Condition

  • Shim Kang-Bo;Kang Churl-Whan;Seong Jae-Duck;Hwang Chung-Dong;Suh Duck-Yong
    • 한국작물학회지 (Korean Journal of Crop Science)
    • /
    • Vol. 51, No. 4
    • /
    • pp.269-273
    • /
    • 2006
  • Multiple linear regression analysis was conducted to interpret the relationship between sesame grain yield and its components under early sowing cropping conditions. The t test showed that stem length, number of capsules per plant, 1000-seed weight, and seed weight per plant contributed significantly to sesame grain yield; these variables were therefore assumed to be the components that most influence grain yield. In the stepwise regression analysis, the prediction equation for sesame grain yield per square meter (Y) was Y = -7.900 + 0.150X1 + 0.461X5 + 15.553X6 + 8.543X7. Meanwhile, the F value showed that stem length, number of capsules per plant, and seed weight per plant contributed significantly to sesame grain yield, while 1000-seed weight did not. Based on these results, it is reasonable to assume that high yield potential of sesame under early sowing cropping conditions would be obtained by selecting breeding lines with greater stem length, number of capsules per plant, and seed weight per plant. This differs from the result under late sowing cropping conditions, in which days to flowering and maturity were assumed to be the factors that most affect sesame grain yield.
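
The stepwise equation quoted in this abstract can be evaluated directly. The sketch below simply encodes the reported coefficients. The abstract does not say which trait or unit each of X1, X5, X6, and X7 stands for, so they are kept as opaque inputs; the function name and the sample input values are made up for illustration.

```python
def predict_yield(x1, x5, x6, x7):
    """Evaluate Y = -7.900 + 0.150*X1 + 0.461*X5 + 15.553*X6 + 8.543*X7."""
    return -7.900 + 0.150 * x1 + 0.461 * x5 + 15.553 * x6 + 8.543 * x7

# Hypothetical inputs, chosen only to show the call; not data from the paper.
example = predict_yield(x1=100, x5=50, x6=3, x7=10)
```

Because every slope is positive, the predicted yield rises with each of the four inputs.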

Application of Multiple Linear Regression Analysis and Tree-Based Machine Learning Techniques for Cutter Life Index (CLI) Prediction

  • 홍주표;고태영
    • 터널과지하공간 (Tunnel and Underground Space)
    • /
    • Vol. 33, No. 6
    • /
    • pp.594-609
    • /
    • 2023
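  • Because the TBM method secures the stability of the excavation face and minimizes the impact on the surrounding environment, its use is increasing in urban areas and in under-river and subsea tunnels. Among the representative models for predicting disc cutter life, the NTNU model uses the Cutter Life Index (CLI) as a key parameter, but CLI is difficult to measure because the test procedure is complex and the test equipment is scarce. In this study, CLI was predicted from rock properties using multiple linear regression analysis and tree-based machine learning techniques. Through a literature survey, a database was built that includes the uniaxial compressive strength, Brazilian tensile strength, equivalent quartz content, and Cerchar abrasivity index of rocks, and derived variables were calculated and added. For the multiple linear regression analysis, input variables were selected in consideration of statistical significance and multicollinearity; for the machine learning prediction models, input variables were selected based on variable importance. The data were split 8:2 into training and validation sets to compare predictive performance among the models, and XGBoost was selected as the best model. The validity of the multiple linear regression model and the XGBoost model derived in this study was confirmed by comparing their predictive performance with that of previous studies.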
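
An 8:2 training/validation split of the kind this abstract mentions can be sketched as below. The function names, the fixed seed, and the use of RMSE as the comparison metric are assumptions for illustration; the abstract does not state how the split was drawn or which metric was used.

```python
import math
import random

def split_train_validation(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows reproducibly and cut it at train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

train_rows, validation_rows = split_train_validation(range(10))
```

Each candidate model would be fitted on `train_rows` only and scored on `validation_rows`, so that the comparison reflects performance on data the model has not seen.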

Forecasting Energy Consumption of Steel Industry Using Regression Model

  • Sung-Ho KANG;Hyun-Ki KIM
    • Journal of Korea Artificial Intelligence Association
    • /
    • Vol. 1, No. 2
    • /
    • pp.21-25
    • /
    • 2023
  • The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Specific independent variables were selected in consideration of the correlation among various attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address the multicollinearity problem. In data preprocessing, we evaluated linear and nonlinear relationships between the attributes through correlation analysis. In particular, we selected variables with high correlation and included appropriate variables in the final model to prevent multicollinearity problems. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model was able to learn complex patterns effectively while preventing overfitting. Consequently, these predictive models are expected to provide important information for improving energy efficiency and management decision-making in the steel industry. In the future, we plan to improve the performance of the model by collecting more data and extending the variables, and we will also consider applying the model with interactions with external factors taken into account.
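
One common way to use correlation analysis against multicollinearity is to drop a predictor when it is almost perfectly correlated with one already kept. The sketch below shows that generic approach; it is not the authors' preprocessing, and the function names, the 0.9 threshold, and the toy columns are assumptions. It also assumes no column is constant, since a constant column has zero variance.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is <= threshold.

    `columns` maps a predictor name to its list of values; order decides
    which member of a correlated pair survives.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data: "b" is an exact multiple of "a", so it is dropped.
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]}
kept_names = drop_correlated(toy_columns)
```

A pairwise filter like this only catches two-variable redundancy; collinearity that involves three or more predictors needs a diagnostic such as the variance inflation factor.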

The Effects of Value Orientations on Financial Management and Financial Satisfaction of Girl Consumers in Yanbian, China and Those in South Korea

  • 홍은실;양남희;김미라
    • 가정과삶의질연구 (Journal of Families and Better Life)
    • /
    • Vol. 21, No. 3
    • /
    • pp.147-155
    • /
    • 2003
  • The purpose of this study is to investigate the effects of four value orientations on the financial management and financial satisfaction of high school girls in Yanbian, China and those in South Korea. The subjects were 466 high school girls in Yanbian, China and 498 high school girls in South Korea. Cronbach's α, the t-test, and multiple regression were used for statistical analysis. The results are summarized as follows: 1) According to the t-test, there were significant differences between the Yanbian girls and the Korean girls in three value orientations, three financial management behaviors, and financial satisfaction. 2) According to the multiple regression analysis, the financial management behaviors of the school girls had positive linear relationships with variables such as three value orientations and the country variable, and their financial satisfaction had positive linear relationships with variables such as four value orientations and the country variable.

Accident Analysis of 3-legged and 4-legged Roundabouts

  • 박민규;박병호
    • 한국안전학회지 (Journal of the Korean Society of Safety)
    • /
    • Vol. 27, No. 3
    • /
    • pp.161-166
    • /
    • 2012
  • This study deals with roundabout accidents. The objective is to analyze the traffic accidents that occurred at 3-legged and 4-legged roundabouts using the developed models. In developing the multiple linear regression models, this study uses the number of traffic accidents as the dependent variable and variables such as geometric structure, traffic characteristics, and others as the independent variables. The correlation and multicollinearity of the variables were analyzed using SPSS 17.0. The main results are as follows. First, the R-square values of the developed models were 0.851 (3-legged) and 0.689 (4-legged), respectively. Second, the independent variables in the 3-legged roundabout accident model were traffic volume and number of crosswalks, and those in the 4-legged roundabout model were traffic volume and signal. Finally, the paired t-test shows that the predicted values and observed values are not statistically different.
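
The paired t-test used to compare predicted and observed accident counts reduces to a one-sample t statistic on the pairwise differences. A minimal sketch using only the Python standard library follows; the function name and the sample numbers are made up for illustration, since the abstract does not list the underlying data.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(observed, predicted):
    """Return (t, degrees of freedom) for the paired differences observed - predicted."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

# Hypothetical counts, not taken from the paper.
t_value, dof = paired_t_statistic([5, 7, 9], [4, 5, 6])
```

The statistic is then compared with the critical value of the t distribution for `dof` degrees of freedom; a value inside the acceptance region is what supports a statement that predictions and observations do not differ significantly.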