• Title/Abstract/Keywords: Backward Elimination Method

Geometrical description based on forward selection & backward elimination methods for regression models (다중회귀모형에서 전진선택과 후진제거의 기하학적 표현)

  • 홍종선; 김명진
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 5 / pp. 901-908 / 2010
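  • We propose a graphical method that geometrically represents the forward selection and backward elimination procedures, two of the variable selection methods for multiple regression models. On a semicircle of radius 1, the forward selection process is drawn in the first quadrant and the backward elimination process in the second quadrant. At each step the regression sum of squares is represented as a vector, and the extra sum of squares or the coefficient of partial determination is shown as the angle between vectors. When the endpoints of the vectors are connected, a statistically significant step is drawn as a dotted line, so that the results of the partial hypothesis tests can be read from the plot. With this method, the final models obtained by forward selection and backward elimination can be compared and the overall goodness of fit of the model can be assessed.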
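
The backward elimination step that the abstract above visualizes can be sketched in a few lines. This is only an illustration, not the authors' graphical method: the helper names `fit_rss` and `backward_eliminate`, the F-to-remove threshold of 4.0, and the toy data (a 2^3 factorial layout) are all assumptions made for this example.

```python
from itertools import product

def fit_rss(x_cols, y):
    """Least-squares fit with an intercept; returns the residual sum of squares."""
    n = len(y)
    cols = [[1.0] * n] + x_cols
    p = len(cols)
    # Augmented normal equations (X'X | X'y), solved by Gaussian elimination.
    m = [[sum(ci[k] * cj[k] for k in range(n)) for cj in cols]
         + [sum(ci[k] * y[k] for k in range(n))] for ci in cols]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(i + 1, p):
            f = m[r][i] / m[i][i]
            for k in range(i, p + 1):
                m[r][k] -= f * m[i][k]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (m[i][p] - sum(m[i][k] * b[k] for k in range(i + 1, p))) / m[i][i]
    return sum((y[k] - sum(b[j] * cols[j][k] for j in range(p))) ** 2
               for k in range(n))

def backward_eliminate(x, y, f_to_remove=4.0):
    """Drop the variable with the smallest partial F until all exceed the threshold."""
    kept = list(x)
    while kept:
        rss_full = fit_rss([x[v] for v in kept], y)
        mse = rss_full / (len(y) - len(kept) - 1)
        # Partial F = extra sum of squares lost by dropping v, over the full-model MSE.
        f_stats = {v: (fit_rss([x[u] for u in kept if u != v], y) - rss_full) / mse
                   for v in kept}
        weakest = min(f_stats, key=f_stats.get)
        if f_stats[weakest] >= f_to_remove:
            break
        kept.remove(weakest)
    return kept

# Toy data (assumed): y depends on x1 and x2 only; the noise term is orthogonal to x3.
rows = list(product([-1.0, 1.0], repeat=3))
X = {"x1": [r[0] for r in rows], "x2": [r[1] for r in rows], "x3": [r[2] for r in rows]}
y = [2 * r[0] + 3 * r[1] + 0.5 * r[0] * r[1] * r[2] for r in rows]
selected = backward_eliminate(X, y)
```

On this toy data the extra sum of squares for `x3` is zero, so it is the only variable removed. A real analysis would compare each partial F against the critical value for the current degrees of freedom rather than a fixed threshold.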

Analysis of Stress level of Korean Household Members due to Household Debt (한국국민의 가계 금융부채에 대한 체감도 분석)

  • 오만숙; 현승미
    • 응용통계연구 / Vol. 22, No. 2 / pp. 297-307 / 2009
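  • Using actual survey data collected by Kookmin Bank in 2004, we analyzed how the attributes of household members (type of housing tenure, education level of the household head, age of the household head, monthly income, and region of residence) affect the burden they feel about household debt, a factor in the recent financial crisis; that is, their perceived stress level due to household debt. The stress level was coded as binary data (low or high burden from debt), and logistic regression was performed with the household attributes as explanatory variables. Backward elimination based on the likelihood ratio statistic for goodness of fit was used to select a model that is simple yet fits the data well, and a model with two two-way interactions was selected. The effect of each attribute on the perceived stress level was analyzed through the coefficient estimates of the selected model. The effect of household attributes on the presence or absence of household debt was also analyzed in a similar way using a logistic regression model. The perceived stress level due to debt was found to be lower for homeowners, for higher monthly income, for lower education level of the household head, and for younger household heads.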
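
The likelihood ratio comparison that drives this kind of backward elimination can be sketched as follows. The function name `lr_test` and the log-likelihood values are hypothetical, chosen only for illustration; the abstract does not report them. The sketch covers only 1 or 2 dropped parameters, for which the chi-square upper tail has a closed form.

```python
import math

def lr_test(ll_full, ll_reduced, df=1):
    """Likelihood-ratio statistic and p-value for dropping `df` parameters."""
    # G = 2 * (log-likelihood of full model - log-likelihood of reduced model)
    g = max(2.0 * (ll_full - ll_reduced), 0.0)
    if df == 1:
        p = math.erfc(math.sqrt(g / 2.0))  # chi-square(1) upper tail
    elif df == 2:
        p = math.exp(-g / 2.0)             # chi-square(2) upper tail
    else:
        raise ValueError("this sketch only covers df = 1 or 2")
    return g, p

# Hypothetical log-likelihoods for a full model and two reduced candidates.
g_weak, p_weak = lr_test(ll_full=-120.3, ll_reduced=-121.0)      # small loss of fit
g_strong, p_strong = lr_test(ll_full=-120.3, ll_reduced=-125.3)  # large loss of fit
```

At a 5% level, the first term would be a candidate for removal (its p-value exceeds 0.05), while the second would be retained.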

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test (의료진단 및 중요 검사 항목 결정 지원 시스템을 위한 랜덤 포레스트 알고리즘 적용)

  • 윤태균; 이관수
    • 전기학회논문지 / Vol. 57, No. 6 / pp. 1058-1062 / 2008
  • In a clinical decision support system (CDSS), unlike rule-based expert methods, an appropriate data-driven machine learning method can readily provide information about the contribution of each individual feature (clinical test) to disease classification. However, currently developed methods focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel sets of clinical tests that strongly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is used both as the classifier and as the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using the breast cancer, diabetes, and heart disease data sets in the UCI Machine Learning Repository. Tests with the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.
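
The idea of watching classification error while eliminating features backward can be sketched as below. This is not the authors' system: a nearest-centroid classifier with resubstitution error stands in for the random forest, no importance ranking is used, and the names `centroid_error`, `backward_select`, and the toy data are assumptions made for this example.

```python
def centroid_error(rows, labels, cols):
    """Training-set error of a nearest-centroid classifier restricted to `cols`."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, l in zip(rows, labels) if l == c]
        centroids[c] = [sum(m[j] for m in members) / len(members) for j in cols]
    wrong = 0
    for r, l in zip(rows, labels):
        dist = {c: sum((r[j] - mu) ** 2 for j, mu in zip(cols, centroids[c]))
                for c in classes}
        wrong += min(classes, key=dist.get) != l
    return wrong / len(rows)

def backward_select(features, error_fn):
    """Remove one feature at a time, keeping the smallest subset with the lowest error."""
    current = list(features)
    best_subset, best_err = list(current), error_fn(current)
    while len(current) > 1:
        # Try each single removal and apply the one that leaves the lowest error.
        err, drop = min((error_fn([f for f in current if f != g]), g) for g in current)
        current.remove(drop)
        if err <= best_err:  # ties favor the smaller subset
            best_subset, best_err = list(current), err
    return best_subset, best_err

# Toy data (assumed): feature "a" separates the classes; "b" and "c" carry no signal.
rows = [dict(a=0, b=5, c=-4), dict(a=1, b=-5, c=4), dict(a=0, b=-4, c=-5), dict(a=1, b=4, c=5),
        dict(a=3, b=5, c=4), dict(a=4, b=-5, c=-4), dict(a=3, b=-4, c=5), dict(a=4, b=4, c=-5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
subset, err = backward_select(["a", "b", "c"], lambda cols: centroid_error(rows, labels, cols))
```

In practice the error would be estimated on held-out data or by cross-validation, since the training-set error used here is optimistic.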

Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination (수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법)

  • 홍종선; 함주형; 김호일
    • 응용통계연구 / Vol. 18, No. 2 / pp. 435-443 / 2005
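  • In logistic regression models, the coefficient of determination is defined in more varied ways than in linear regression models, and its values are very small, so it cannot be regarded as a statistic suitable for evaluating logistic regression models. Liao and McGee (2003) proposed two types of adjusted coefficients of determination that are insensitive to the addition of inappropriate explanatory variables and to changes in sample size. In this study, four types of coefficients of determination, including the adjusted ones, are used as variable selection criteria for logistic regression models fitted to real data. They are compared with existing variable selection methods, namely forward selection, backward elimination, stepwise selection, and the AIC statistic, and their appropriateness and efficiency are discussed.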

Analysis of Correlation between Respiratory Characteristics and Physical Factors in Healthy Elementary School Childhood (학령기 정상 아동의 호흡 특성과 신체 조건에 관한 상관분석)

  • 이혜영; 강동연; 김경
    • The Journal of Korean Physical Therapy / Vol. 25, No. 5 / pp. 330-336 / 2013
  • Purpose: Respiration is a vital function essential for sustaining human life and is controlled by the respiratory muscles and the related neuromuscular regulation. The purpose of this study is to assess lung capacity and respiratory pressure in healthy children, and to investigate the relationship and predictability between respiratory pressure and other related respiratory functions. Methods: A total of 31 healthy children were recruited for this study. Demographic information and respiratory-related factors were assessed in terms of body surface area (BSA), chest mobility, lung capacity, and respiratory pressure. Correlation between respiratory pressure and the remaining variables was analyzed, and multiple regression using the stepwise method was performed to predict respiratory muscle strength, with respiratory pressure as the dependent variable and demographic and other respiratory variables as the independent variables. Results: In the correlation analysis, respiratory pressure showed significant correlation with age (r=0.62, p<0.01), BSA (r=0.80, p<0.01), FVC (r=0.80, p<0.01), and FEV1 (r=0.70, p<0.01). In the multiple regression analysis using the backward elimination method, BSA and FVC were retained as significant factors in the predictive model. The model showed significant explanatory power of 71.8%. Conclusion: These findings suggest that respiratory pressure could be a valuable measurement for evaluating respiratory function because of its significant relationship with physical characteristics and lung capacity, and that BSA and FVC could be predictive factors explaining the degree of respiratory pressure. These findings will provide useful information for clinical assessment and treatment of healthy children as well as those with pulmonary disease.

A Study on the Change of Beef Consumption and Recognition of Aged Meat (소고기 소비성향 변화와 숙성육 인식에 관한 연구)

  • 신정섭
    • 한국산학기술학회논문지 / Vol. 21, No. 9 / pp. 373-379 / 2020
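  • This study compared the results of surveys of beef consumers conducted in 2012 and 2019 to examine changes in consumption tendencies and the factors that affect the perception of aged meat. Regression analysis with backward elimination was used to analyze the importance of the quality criteria (meat color and fat color, freshness, grade labeling, and brand), the taste determinants (flavor, juiciness, tenderness, aging period, and marbling), and the perception that intramuscular fat is harmful to health. The results showed increased importance for freshness among the quality criteria, for juiciness, tenderness, and aging period among the taste determinants, and for the perception that intramuscular fat is harmful to health. The analysis of purchase intention for aged meat showed that it was affected by awareness of aged meat, favorability toward aged meat, freshness, tenderness, and aging period. To respond to these changes in consumer preference, this study analyzed how consumption tendencies have changed, collecting basic data for research on beef consumption tendencies and purchase intention, and analyzed which factors affect the perception of aged meat. The findings suggest a need to raise awareness of aged meat in order to respond to the future diversification of consumer preferences and to reflect it in rational production and consumption activities.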

A Study on the Distress Prediction in the Fishery Industry (수산기업의 부실화 요인 및 예측에 관한 연구)

  • 이윤원; 장창익; 홍재범
    • 한국수산경영학회:학술대회논문집 / 2007 Autumn Conference and Symposium of 한국수산경영학회 / pp. 167-184 / 2007
  • The objectives of this paper are to identify the causes of corporate distress and to develop a distress prediction model using financial information in the fishery industry. In this study, corporate distress is defined as economic failure and technical insolvency. Economic failure occurs through reduction, shutdown, or change of the business, and technical insolvency results from a company's failure to pay its financial debt. The 33 distressed firms from 1991 to 2003 comprised 14 economic-failure companies, 15 technical-insolvency companies, and 4 companies to which both cases applied. The distress prediction analysis of fishery companies was carried out according to this definition of distress, in two steps. The first step was a univariate analysis, used to check the predictive power of each individual financial variable. The t-test was used to identify differences in financial variables between the distressed group and the non-distressed group. The second step was to develop a distress prediction model with logistic regression. The variables that showed a significant difference in the univariate analysis were selected as candidate predictors. The financial ratios used in the logistic regression model were selected by the backward elimination method. To test the stability of the distress prediction model, the whole sample was divided into three sub-samples: period 1 (1990-1993), period 2 (1994-1997), and period 3 (1998-2002). The final model built from the whole sample was applied to each of the three sub-samples. The results of the logistic analysis were as follows. The growth, profitability, and stability ratios showed a significant effect on distress. Somewhat different results were found in the sub-samples (economic failure and technical insolvency). Growth and profitability were important for predicting economic failure. Profitability and activity were important for predicting technical insolvency. This means that profitability is the truly important factor for fishery companies.

A Study on the Distress Prediction in the Fishery Industry (수산기업의 부실화 요인과 그 예측에 관한 연구)

  • 장창익; 이윤원; 홍재범
    • 수산경영론집 / Vol. 39, No. 2 / pp. 61-79 / 2008
  • The objectives of this paper are to identify the causes of corporate distress and to develop a distress prediction model using financial information in the fishery industry. In this study, corporate distress is defined as economic failure and technical insolvency. Economic failure occurs through reduction, shutdown, or change of the business, and technical insolvency results from a company's failure to pay its financial debt. The 33 distressed firms from 1991 to 2003 comprised 14 economic-failure companies, 15 technical-insolvency companies, and 4 companies to which both cases applied. The distress prediction analysis of fishery companies was carried out according to this definition of distress, in two steps. The first step was a univariate analysis, used to check the predictive power of each individual financial variable. The t-test was used to identify differences in financial variables between the distressed group and the non-distressed group. The second step was to develop a distress prediction model with logistic regression. The variables that showed a significant difference in the univariate analysis were selected as candidate predictors. The financial ratios used in the logistic regression model were selected by the backward elimination method. To test the stability of the distress prediction model, the whole sample was divided into three sub-samples: period 1 (1990-1993), period 2 (1994-1997), and period 3 (1998-2002). The final model built from the whole sample was applied to each of the three sub-samples. The results of the logistic analysis were as follows. The growth, profitability, and stability ratios showed a significant effect on distress. Somewhat different results were found in the sub-samples (economic failure and technical insolvency). Growth and profitability were important for predicting economic failure. Profitability and activity were important for predicting technical insolvency. This means that profitability is the truly important factor for fishery companies.

Factors Related to Hypertension Patients' Quality of Life: The 7th Korean National Health and Nutrition Examination (1st Year, 2016) (고혈압 환자의 삶의 질 관련 요인: 제 7기 1차년도(2016년) 국민건강영양조사)

  • 김수이; 우상준; 정영해
    • 한국학교ㆍ지역보건교육학회지 / Vol. 21, No. 1 / pp. 61-74 / 2020
  • Objectives: This study aims to examine hypertension patients' quality of life using data from the 7th Korea National Health and Nutrition Examination Survey (1st year, 2016), to identify the related factors, and to provide the results as basic data for interventions that can improve hypertension patients' quality of life. Methods: From the total sample of 8,150 participants in the 7th Korea National Health and Nutrition Examination Survey, this study extracted 1,531 patients who had been diagnosed with hypertension by a doctor, and selected as the final research subjects the 1,072 patients with no missing values in the variables to be analyzed. SPSS (version 25.0) was used to analyze the collected data. Multiple regression analysis with backward elimination, applied to the complex sample design, was used to examine the factors related to quality of life. Results: Hypertension patients' quality of life was related to age, occupation, spouse, household income, weight gain, restriction of activity, subjective health status, perceived stress, and presence of comorbidity. The final model explained 37.0% of the variance (Wald F=30.012, p<.001). Conclusions: When an intervention program is implemented in the future to improve hypertension patients' quality of life, it will be effective to design the program according to age group, employment, marital status, and household income. In operating the program, patients should be helped to control their weight, stay active, and relieve stress, and they should also be motivated to feel healthy. Furthermore, education should be offered so that they manage their underlying disease appropriately at an early stage.

A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach (의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발)

  • 김덕현; 유동희; 정대율
    • 한국정보시스템학회지:정보시스템연구 / Vol. 28, No. 3 / pp. 249-276 / 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly based on the Korean Welfare Panel survey data. Using these data, we obtained many decision rules for predicting the elderly's suicidal ideation. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive predictive decision rules. Weka 3.8 was used as the data mining tool. The decision tree algorithm was J48, also known as C4.5. A 66.6% split was used to divide the total data into learning data and verification data. We considered all possible variables for predicting suicidal ideation of the elderly based on previous studies. In the end, 99 variables, including the target variable, were used. Classification analysis was performed with backward elimination and with a sampling technique for data balancing. Findings: There were significant differences between the data sets. The selected data sets yielded different decision trees and several rules. Based on the decision tree method, we derived rules for suicide prevention. The decision tree derives rules for suicidal ideation not only in the depressed group but also in the non-depressed group. In addition, in developing the predictive model, the over-fitting problem caused by data imbalance was directly identified through the application of data balancing. We conclude that the data must be balanced on the target variable in order to perform a correct classification analysis without over-fitting. Moreover, even with data balancing applied, the prediction rate was not inferior to that of a biased prediction model.