• Title/Summary/Keyword: multicollinearity (다중 공선성)

Prediction of Maximal Oxygen Uptake Ages 18~34 Years (18~34 남성의 최대산소 섭취량 추정)

  • Jeon, Yoo-Joung;Im, Jae-Hyeng;Lee, Byung-Kun;Kim, Chang-Hwan;Kim, Byeong-Wan
    • Korean Journal of Physical Education, Humanities and Social Sciences (한국체육학회지인문사회과학편) / v.51 no.3 / pp.373-382 / 2012
  • The purpose of this study was to predict VO2max from body indices and submaximal metabolic responses. The subjects were 250 men aged 18 to 34, randomly divided into two groups: 179 for model development (the sample) and 71 for cross-validation. All subjects performed a maximal exercise test using the Bruce protocol, and metabolic responses were measured at the end of the first (3 min) and second (6 min) stages. To predict VO2max, stepwise multiple regression was applied to the sample. Model 1 uses weight, 6-minute HR, and 6-minute VO2 (R=0.64, SEE=4.74, CV=11.7%, p<.01): VO2max(ml/kg/min) = 72.256 - 0.340(Weight) - 0.220(6minHR) + 0.013(6minVO2). Model 2 uses weight, 6-minute HR, 6-minute VO2, and 6-minute VCO2 (R=0.67, SEE=4.59, CV=11.3%, p<.01): VO2max(ml/kg/min) = 68.699 - 0.277(Weight) - 0.206(6minHR) + 0.020(6minVO2) - 0.009(6minVCO2). Neither model showed multicollinearity, and Model 2 showed a higher correlation than Model 1. In cross-validation with the remaining 71 men, measured and estimated VO2max were significantly correlated (R=0.53 and 0.56, respectively, p<.01). Both models appear valid and practical given their simplicity and utility, but Model 2 is more accurate.
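
The two published equations can be evaluated directly. The sketch below simply codes them as printed; the input units are an assumption, since the abstract does not state them: weight in kg, HR in beats/min, and 6-minute VO2/VCO2 as absolute values in ml/min (the small coefficients on VO2 and VCO2 suggest absolute rather than relative units). The subject values are hypothetical.

```python
def vo2max_model1(weight_kg: float, hr_6min: float, vo2_6min: float) -> float:
    """Model 1 as printed in the abstract; result in ml/kg/min.
    Assumed units: kg, beats/min, ml/min."""
    return 72.256 - 0.340 * weight_kg - 0.220 * hr_6min + 0.013 * vo2_6min


def vo2max_model2(weight_kg: float, hr_6min: float,
                  vo2_6min: float, vco2_6min: float) -> float:
    """Model 2 as printed in the abstract; result in ml/kg/min.
    Assumed units: kg, beats/min, ml/min, ml/min."""
    return (68.699 - 0.277 * weight_kg - 0.206 * hr_6min
            + 0.020 * vo2_6min - 0.009 * vco2_6min)


# Hypothetical subject: 70 kg, HR 150 at 6 min, VO2 2000 ml/min, VCO2 1800 ml/min.
print(round(vo2max_model1(70, 150, 2000), 3))        # 41.456
print(round(vo2max_model2(70, 150, 2000, 1800), 3))  # 42.209
```

With an SEE of roughly 4.6 to 4.7 ml/kg/min, either estimate is best read as a range of several ml/kg/min around the printed value rather than a precise measurement.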

A Study on Factors of Education's Outcome using Decision Trees (의사결정트리를 이용한 교육성과 요인에 관한 연구)

  • Kim, Wan-Seop
    • Journal of Engineering Education Research / v.13 no.4 / pp.51-59 / 2010
  • To manage university lectures efficiently and improve educational outcomes, a process is needed to diagnose the current educational outcome of each class and to identify the factors behind it. Most studies that look for the factors of effective lectures use statistical methods such as association analysis and regression analysis, and decision tree analysis has recently been used as well. Decision tree analysis has the advantages that its resulting model is easy to understand and easy to apply to decision making, but it is weak against certain characteristics of the input data, such as multicollinearity. This paper identifies these weaknesses of decision tree analysis and proposes an experimental solution that uses a multiple decision tree algorithm to address them. The experimental results show that the proposed method is more effective at finding reliable factors of educational outcome.
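
The abstract does not describe its "multiple decision tree" algorithm, so the following is only an illustration of the weakness it targets, using a technique substituted here: a random-subspace ensemble of one-split trees (stumps). When two predictors carry the same information, a single tree credits whichever it evaluates first and the other never appears; growing many stumps, each restricted to a random subset of predictors, spreads the credit across both. The data, the tie-breaking rule, and the function names are hypothetical.

```python
import random


def best_stump_feature(X, y, features):
    """Return the feature (from `features`) whose best single threshold split
    minimises the sum of squared errors; on ties the earlier feature wins."""
    best_feat, best_sse = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in X})
        for t in values[:-1]:  # every split leaves both sides non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            sse = 0.0
            for side in (left, right):
                mean = sum(side) / len(side)
                sse += sum((v - mean) ** 2 for v in side)
            if sse < best_sse:
                best_feat, best_sse = f, sse
    return best_feat


def root_split_counts(X, y, n_trees=30, n_sub=2, seed=0):
    """Grow many stumps, each restricted to a random subset of n_sub features
    (random-subspace idea), and count how often each feature is chosen."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(n_trees):
        subset = sorted(rng.sample(range(len(X[0])), n_sub))
        counts[best_stump_feature(X, y, subset)] += 1
    return counts


# Hypothetical data: feature 1 is an exact copy of feature 0, feature 2 is noise.
noise = random.Random(1)
x = [i / 19 for i in range(20)]
X = [[v, v, noise.random()] for v in x]
y = [1.0 if v > 0.5 else 0.0 for v in x]

print(best_stump_feature(X, y, [0, 1, 2]))  # 0: the copy (feature 1) never shows up
print(root_split_counts(X, y))  # features 0 and 1 share the splits; feature 2 gets none
```

In this toy setting the single tree hides the duplicate entirely, while the ensemble shows that both collinear predictors explain the outcome equally well; which variant of multi-tree averaging the paper actually uses is not stated.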

Development of Accident Forecasting Models in Freeway Tunnels using Multiple Linear Regression Analysis (다중선형 회귀분석을 이용한 고속도로 터널구간의 교통사고 예측모형 개발)

  • Park, Ju-Hwan;Kim, Sang-Gu
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.6 / pp.145-154 / 2012
  • This paper analyzed the characteristics of traffic accidents in all tunnels on the nationwide freeway network and selected various independent variables related to accident occurrence in tunnels. The study aims to develop reliable accident forecasting models using several dependent variables: the number of accidents (no.), accidents per kilometer (no./km), and accidents per million vehicle-kilometers (no./MVK). Reliable multiple linear regression models were proposed, and their validity was verified using statistics such as $R^2$, F values, multicollinearity diagnostics, and residual analysis. The accident forecasting models were selected with the characteristics of tunnel accidents in mind, and two final models were proposed for two groups of tunnel length. In the selected models, the natural logarithm ln(no./MVK) is the dependent variable, and AADT, vertical slope, and tunnel height are the independent variables. The reliability of the two models was confirmed by comparing field data with estimated data using RMSE and MAE. These models may be useful not only for evaluating tunnel safety during the design and planning phases but also for reducing traffic accidents in tunnels and managing tunnel traffic flow.
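
The dependent variable and the two validation metrics can be sketched as follows. The exposure formula is an assumption, not taken from the paper: MVK is treated as million vehicle-kilometers, computed as AADT × 365 days × tunnel length × number of years. The tunnel figures are hypothetical.

```python
import math


def accidents_per_mvk(n_accidents, aadt, length_km, years=1.0):
    """Accidents per million vehicle-kilometres (MVK), assuming exposure =
    AADT x 365 days x tunnel length x number of years."""
    mvk = aadt * 365 * length_km * years / 1e6
    return n_accidents / mvk


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical tunnel: 10 accidents in one year, AADT 40,000, 2 km long.
rate = accidents_per_mvk(10, 40_000, 2.0)  # 10 / 29.2 MVK
log_rate = math.log(rate)                   # the ln(no./MVK) response
print(round(rate, 4), round(log_rate, 4))
print(rmse([1, 2, 3], [1, 2, 5]), mae([1, 2, 3], [1, 2, 5]))
```

Because RMSE squares each error, it penalizes a few large misses more heavily than MAE does, so reporting both (as the paper does) shows whether the error is spread evenly or concentrated in a few tunnels.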

Factors Affecting the Business Performance of Young Entrepreneurs (청년창업자의 경영성과에 영향을 미치는 요인)

  • Park, Mi-Ryeo;Yang, Yeong-Seok;Kim, Myeong-Suk
    • Korean Society of Business Venturing: Conference Proceedings (한국벤처창업학회:학술대회논문집) / 2017.04a / pp.44-44 / 2017
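  • This study examined the relationship between the competencies of young entrepreneurs and their business performance. The data are the 9th-wave data of the Korea Labor Institute's 'Youth Panel Survey (2015)'. The final sample consisted of 182 young non-wage workers who held at least a junior-college degree and had started their own business, excluding those who had taken over a family business. Frequencies, percentages, means, and standard deviations were computed to describe the respondents' general characteristics, and correlation analysis was performed to examine multicollinearity among the variables. Hierarchical regression analysis was then used to examine the factors affecting young entrepreneurs' business performance. All data were analyzed with IBM SPSS Statistics 22.0. To analyze these factors, the study derived start-up readiness competency, entrepreneurial competency, and management competency as determinants of young entrepreneurs' competence, set up hypotheses linking these factors to business performance, and tested them. The results were as follows: business performance was higher when the job level was lower relative to education level, when the founder's work did not match their major, when job satisfaction was higher, and when total start-up capital was larger.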
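
A correlation-based multicollinearity screen like the one described can be sketched in a few lines. The 0.8 cut-off is a common rule of thumb assumed here, not a value from the study, and the variable names and data are hypothetical. Note that pairwise correlations can miss collinearity involving three or more variables at once.

```python
import itertools
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def flag_collinear_pairs(columns, threshold=0.8):
    """Return (name_i, name_j, r) for every pair of columns with |r| >= threshold."""
    flagged = []
    for (ni, ci), (nj, cj) in itertools.combinations(columns.items(), 2):
        r = pearson(ci, cj)
        if abs(r) >= threshold:
            flagged.append((ni, nj, round(r, 3)))
    return flagged


# Hypothetical predictors for five respondents.
columns = {
    "startup_capital": [10, 20, 30, 40, 50],
    "loan_amount": [21, 39, 62, 79, 101],  # moves almost in lockstep with capital
    "job_satisfaction": [3, 5, 2, 4, 1],
}
print(flag_collinear_pairs(columns))  # only the capital/loan pair is flagged
```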

Regression by Least Absolute Value Method with L1-constraint on Parameters

  • 고영현;전치혁
    • Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.151-157 / 2003
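  • Ordinary least squares (OLS), the conventional estimation method, suffers from multicollinearity and loses interpretability as the number of variables grows. This study proposes a method that addresses these problems of OLS by constraining the sum of the absolute values of the parameters (the L1 norm) and that, by using the Least Absolute Value (LAV) method, which minimizes the sum of absolute errors instead of the sum of squared residuals, gives results that are robust to outliers. Numerical examples also show that, because the proposed method can be formulated as a linear program, it runs faster than a constrained quadratic optimization problem.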
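
The abstract names the two ingredients (an absolute-error loss and an L1 bound on the coefficients) and states that the problem can be solved as a linear program, but does not print the program. One standard way to write it, which may differ in details from the authors' formulation (for example in whether the intercept is also bounded), splits each residual and each slope into nonnegative parts, with $t$ a user-chosen bound:

```latex
\begin{aligned}
\min_{\beta_0,\,\beta^{\pm},\,e^{\pm}} \quad & \sum_{i=1}^{n} \left(e_i^{+} + e_i^{-}\right) \\
\text{subject to} \quad & y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\left(\beta_j^{+} - \beta_j^{-}\right) = e_i^{+} - e_i^{-}, \qquad i = 1, \dots, n, \\
& \sum_{j=1}^{p} \left(\beta_j^{+} + \beta_j^{-}\right) \le t, \\
& e_i^{+},\, e_i^{-},\, \beta_j^{+},\, \beta_j^{-} \ge 0, \qquad \beta_0 \in \mathbb{R}.
\end{aligned}
```

At an optimum $e_i^{+} + e_i^{-}$ equals the absolute residual $\lvert y_i - \hat{y}_i \rvert$, and with $\beta_j = \beta_j^{+} - \beta_j^{-}$ the second constraint implies $\lVert \beta \rVert_1 \le t$. Every term is linear, which is why an LP solver applies here, whereas a squared-error loss with the same constraint requires quadratic programming.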

The Effects of Fundamental Variables on Stock Returns - Evidence from Panel Data (기본적 변수가 주식수익률에 미치는 영향 - 패널자료로부터의 근거)

  • Lee, Hae-Young;Kam, Hyung-Kyu
    • Proceedings of the KAIS Fall Conference / 2011.12a / pp.21-24 / 2011
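  • This study used fundamental variables such as firm size, the book-to-market ratio, the earnings-to-price ratio, the cash-flow-to-price ratio, and leverage to identify the variables that significantly affect stock returns. To this end, the research model was analyzed empirically with panel data analysis methods, using panel data that combine cross-sectional and time-series data. As Hsiao (2003) points out, panel data generally enlarge the sample, which increases the degrees of freedom, and can in theory mitigate multicollinearity among the explanatory variables. The empirical results indicate that firm size (SIZ), the book-to-market ratio (B/M), the earnings-to-price ratio (E/P), and the cash-flow-to-price ratio (C/P) are significant variables that explain cross-sectional differences in stock returns.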

Predicting Export Change Rate using Machine Learning Methods (기계학습을 활용한 수출증감률 예측)

  • Chaerin Ahn;Heonchang Yu
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.536-538 / 2023
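  • Korea's high dependence on exports makes it inevitably sensitive to export conditions shaped by changes in the external environment, such as the COVID-19 pandemic and the war between Ukraine and Russia. Responding quickly requires accurate forecasts of the export change rate, and this study seeks the prediction model that does this best. After selecting the main variables that affect exports, we applied min-max normalization and reduced the set of variables by checking the correlation coefficients between variables and their multicollinearity. We then applied four models commonly used for machine-learning prediction (Linear Regression, Decision Tree, Gradient Boost Regressor, and Random Forest) and compared their accuracy in predicting the export change rate. Linear Regression had the lowest MSE, 0.087, and was therefore judged the best model.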
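
The preprocessing steps can be sketched with the standard library alone. The abstract does not say which multicollinearity diagnostic was used; the variance inflation factor (VIF) is assumed here as a common choice, with values above about 10 often read as a warning sign. The indicator names and data are hypothetical. Min-max scaling is an affine transformation, so it leaves each VIF unchanged.

```python
def min_max(column):
    """Rescale a column to [0, 1]; assumes the column is not constant."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def _solve(a, b):
    """Solve a x = b (square, non-singular) by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept,
    solved through the normal equations (X'X) beta = X'y."""
    X = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing each column on all the others.
    Perfectly collinear columns (R_j^2 = 1) would divide by zero."""
    return {
        name: 1.0 / (1.0 - r_squared(col, [c for n, c in columns.items() if n != name]))
        for name, col in columns.items()
    }


def mse(observed, predicted):
    """Mean squared error, the comparison metric named in the abstract."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical indicators: x3 is roughly x1 + x2.
raw = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [3.1, 2.9, 7.2, 6.8, 11.1, 10.9],
}
scaled = {name: min_max(col) for name, col in raw.items()}
print(vif(scaled))                                    # all three well above 10
print(vif({"x1": scaled["x1"], "x2": scaled["x2"]}))  # about 3.2 each once x3 is dropped
```

Dropping the variable with the largest VIF and recomputing is one simple reduction loop; whether the study reduced variables this way or by correlation thresholds alone is not stated.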

A Study of Physical Environment of Public Golf Course for Golf Popularization (골프 대중화를 위한 대중제 골프장의 물리적 환경에 관한 연구)

  • Kim, Young-Soo;Jang, Won-Yong
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.447-456 / 2019
  • This study examined the physical environment of public golf courses with a view to golf popularization. Specifically, it analyzed the effects of the physical environment on customers' emotional responses, the golf course's image, and intention to recommend public golf courses. The data were analyzed using frequency analysis, exploratory factor analysis, reliability analysis, correlation analysis, and multiple regression analysis. The results were as follows. First, among the physical environment variables of public golf courses, the convenience, cleanliness, and aesthetics of facilities had positive effects on customers' positive emotions. Second, among the same variables, the cleanliness of facilities had an effect on customers' negative emotions. Third, the physical environment of public golf courses had positive effects on the golf course's image. Fourth, the physical environment of public golf courses had positive effects on recommendation intention.
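
The abstract lists reliability analysis without naming the statistic. Cronbach's alpha is the usual choice for multi-item survey scales and is assumed in this sketch: alpha = k/(k-1) × (1 - sum of item variances / variance of the total score). The item scores are hypothetical.

```python
def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns, each holding
    one score per respondent (respondents in the same order)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance with an n - 1 denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))


# Hypothetical 3-item scale answered by 4 respondents on a 5-point scale.
items = [
    [4, 5, 3, 2],
    [4, 4, 3, 2],
    [5, 5, 3, 1],
]
print(round(cronbach_alpha(items), 3))
```

Because the same sample-variance convention appears in the numerator and denominator, using n instead of n - 1 throughout would give the same alpha.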

Spatial Hedonic Modeling using Geographically Weighted LASSO Model (GWL을 적용한 공간 헤도닉 모델링)

  • Jin, Chanwoo;Lee, Gunhak
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.917-934 / 2014
  • The geographically weighted regression (GWR) model has been widely used to estimate spatially heterogeneous real estate prices. However, the GWR model has limitations: it cannot select different sets of price determinants across space, and the number of observations available for each local estimation is restricted. The geographically weighted LASSO (GWL) model has recently been introduced as an alternative and has received growing interest. In this paper, we use the GWL to explore various local price determinants of real estate and assess its applicability to forecasting real estate prices. To do this, we developed three hedonic models (OLS, GWR, and GWL) for the sales prices of apartments in Seoul and compared them in terms of model fit, prediction, and multicollinearity. Overall, the local models performed better than the global OLS model, and the GWL in particular showed greater explanatory and predictive power than the other models. Moreover, the GWL provided spatially varying sets of price determinants among which no multicollinearity exists. Because the GWL helps select significant sets of independent variables from high-dimensional data, it should be a useful technique for large and complex spatial big data.
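
How the two local models differ can be written as one weighted objective. This sketch assumes a Gaussian kernel with bandwidth $b$, where $d_{ij}$ is the distance between locations $i$ and $j$; the kernel, the bandwidth selection, and how $\lambda$ is chosen in the paper are not stated in the abstract.

```latex
w_{ij} = \exp\!\left[-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^{2}\right],
\qquad
\hat{\boldsymbol{\beta}}_i = \arg\min_{\boldsymbol{\beta}}
\sum_{j=1}^{n} w_{ij}\left(y_j - \beta_0 - \sum_{k=1}^{p} x_{jk}\beta_k\right)^{2}
+ \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
```

With $\lambda = 0$ this is the GWR weighted least-squares estimate at location $i$. With $\lambda > 0$ the L1 penalty can shrink some local coefficients exactly to zero, so each location keeps its own subset of price determinants, which is the property the abstract highlights.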

Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM (딥러닝 시계열 알고리즘 적용한 기업부도예측모형 유용성 검증)

  • Cha, Sungjae;Kang, Jungseok
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.1-32 / 2018
  • Corporate defaults affect not only the stakeholders of bankrupt companies, including managers, employees, creditors, and investors, but also ripple through the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a default prediction model rather than developing a variety of corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even afterwards, analyses of past corporate defaults focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on certain main variables such as the 'debt ratio'. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid a sudden total collapse like the 'Lehman Brothers case' of the global financial crisis. The key variables associated with corporate default vary over time: Deakin's (1972) study, building on the analyses of Beaver (1967, 1968) and Altman (1968), shows that the major factors affecting corporate failure have changed, and Grice (2001), using the models of Zmijewski (1984) and Ohlson (1980), also found changes in the importance of predictive variables. However, past studies used static models, and most did not consider changes that occur over time. To build consistent prediction models, it is therefore necessary to correct this time-dependent bias with a time series algorithm that reflects dynamic change. With a focus on the global financial crisis, which had a significant impact on Korea, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively.
To build a bankruptcy model that stays consistent as conditions change over time, we first train a deep learning time series model on data from before the financial crisis (2000~2006). Parameter tuning of the existing models and the deep learning time series algorithm is then carried out on validation data that include the financial crisis period (2007~2008). The resulting model shows a pattern similar to that on the training data and excellent predictive power. Each bankruptcy prediction model is then rebuilt on the combined training and validation data (2000~2008), applying the optimal parameters found during validation. Finally, each corporate default prediction model, trained on these nine years of data, is evaluated and compared on the test data (2009), which demonstrates the usefulness of the model based on the deep learning time series algorithm. In addition, by adding Lasso regression to the existing variable-selection methods (multiple discriminant analysis and the logit model), we show that the deep learning time series model built on the three resulting sets of variables is useful for robust corporate default prediction. The definition of bankruptcy is the same as that of Lee (2015). The independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable groups. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are then compared. Corporate data have the limitations of 'nonlinear variables', 'multicollinearity' among variables, and 'lack of data'.
The logit model handles nonlinearity, the Lasso regression model addresses the multicollinearity problem, and the deep learning time series algorithm, combined with a variable data generation method, compensates for the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis and, ultimately, to intertwined AI applications. Although research on corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis for corporate default prediction modeling and also yields better predictive power. Amid the Fourth Industrial Revolution, the current government and governments abroad are working hard to integrate these systems into the everyday life of their nations and societies, yet deep learning time series research for the financial industry remains scarce. As an initial study applying deep learning time series algorithms to corporate default analysis, this work is intended to serve as comparative reference material for non-specialists starting research that combines financial data with deep learning time series algorithms.
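
The year-based schedule described above (train on 2000~2006, tune on 2007~2008, refit on 2000~2008, test once on 2009) can be sketched as a data-partitioning step. This sketch only shows the split; the record layout (one dict per firm-year with a `year` key) and the firms are hypothetical, and model training is omitted.

```python
def split_by_year(records, train_years, valid_years, test_years):
    """Partition firm-year records (dicts with a 'year' key) by fiscal year."""
    train = [r for r in records if r["year"] in train_years]
    valid = [r for r in records if r["year"] in valid_years]
    test = [r for r in records if r["year"] in test_years]
    return train, valid, test


TRAIN = range(2000, 2007)  # 2000-2006: pre-crisis training data (7 years)
VALID = range(2007, 2009)  # 2007-2008: validation incl. the financial crisis (2 years)
TEST = range(2009, 2010)   # 2009: held-out test year (1 year)

# Hypothetical records: two firms observed every year; features omitted.
records = [{"firm": f, "year": y} for f in ("A", "B") for y in range(2000, 2010)]

train, valid, test = split_by_year(records, TRAIN, VALID, TEST)
# Stage 1: fit on `train`, choose hyperparameters on `valid`.
# Stage 2: refit with those hyperparameters on train + valid (2000-2008),
# then evaluate once on `test` (2009).
refit = train + valid
print(len(train), len(valid), len(test), len(refit))  # 14 4 2 18
```

Splitting by year rather than at random keeps every validation and test observation strictly later than the data the model learned from, which is what lets the crisis years act as an out-of-sample check.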