• Title/Summary/Keyword: Backward Elimination Method


Geometrical description based on forward selection & backward elimination methods for regression models (다중회귀모형에서 전진선택과 후진제거의 기하학적 표현)

  • Hong, Chong-Sun; Kim, Moung-Jin
    • Journal of the Korean Data and Information Science Society, v.21 no.5, pp.901-908, 2010
  • A geometrical description method is proposed to represent the process of the forward selection and backward elimination methods, two of the many variable selection methods for multiple regression models. This graphical method shows the forward selection and backward elimination processes in the first and second quadrants, respectively, of a semicircle with unit radius. At each step, the SSR is represented by the norm of a vector, and the extra SSR or partial coefficient of determination is represented by the angle between two vectors. Lines are drawn dotted when the corresponding partial F test is statistically significant, so that the statistical analysis can be explored visually. The final regression models from forward selection and backward elimination can be obtained from this geometrical description, and the goodness-of-fit of each model can also be explored.
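The selection logic underlying this geometric picture can be sketched in Python. This is a minimal sketch, not the authors' method: it covers only the backward elimination half, assumes an ordinary least-squares model with an intercept, and removes the predictor with the smallest partial F statistic (its extra SSR divided by the full model's mean squared error) while that statistic stays below a fixed "F-to-remove" cutoff. The cutoff `f_to_remove=4.0` and the names `sse` and `backward_eliminate` are illustrative assumptions, and the vector-and-angle display is not reproduced.

```python
import numpy as np


def sse(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def backward_eliminate(X, y, names, f_to_remove=4.0):
    """Backward elimination for linear regression using partial F tests.

    At each step the predictor with the smallest partial F statistic
    (extra SSR / MSE of the current full model) is removed, until every
    remaining predictor has partial F >= f_to_remove (an assumed cutoff).
    """
    n = len(y)
    ones = np.ones((n, 1))  # intercept column, never removed
    keep = list(range(X.shape[1]))
    while keep:
        full = np.hstack([ones, X[:, keep]])
        sse_full = sse(full, y)
        mse_full = sse_full / (n - full.shape[1])
        partial_f = {}
        for j in keep:
            reduced = np.hstack([ones, X[:, [k for k in keep if k != j]]])
            # Extra SSR: the increase in SSE when predictor j alone is dropped.
            extra_ssr = sse(reduced, y) - sse_full
            partial_f[j] = extra_ssr / mse_full
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_remove:
            break  # every remaining predictor passes the cutoff
        keep.remove(weakest)
    return [names[j] for j in keep]


# Usage with hypothetical simulated data: x3 has no effect on y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=50)
print(backward_eliminate(X, y, ["x1", "x2", "x3"]))
```

A fixed F cutoff is used instead of p-values so the sketch needs only NumPy; with SciPy available, the cutoff could instead be a quantile of the F distribution with 1 and n - p - 1 degrees of freedom.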

Analysis of Stress level of Korean Household Members due to Household Debt (한국국민의 가계 금융부채에 대한 체감도 분석)

  • Oh, Man-Suk; Hyun, Seung-Me
    • The Korean Journal of Applied Statistics, v.22 no.2, pp.297-307, 2009
  • Korean household debt is one of the main sources of the current financial crisis. This paper studies the impact of household members' attributes, such as type of housing (owned or rented), education, age, average monthly income of the head of household, and area of residence, on the stress level of household members due to household debt. We analyze a real data set collected by KB Kookmin Bank in 2004. We treat stress level (low or high) as a binary response variable and use a logistic regression model with the attributes of household members as explanatory variables. A simple but well-fitting model is selected by the backward elimination method based on the likelihood statistic for the goodness-of-fit test, and the impact of the attributes on the stress level is studied from the parameter estimates of the selected model. We also perform a similar analysis on a binary response variable that distinguishes households with no debt from the rest. The analysis shows that the stress level tends to be low for households that own their homes and for those with high average monthly income, a low education level, and young members.
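The backward elimination step for a logistic model can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the "likelihood statistic" is taken to be the likelihood-ratio statistic G = 2(log L_full - log L_reduced) for dropping one predictor, compared with the 5% chi-square critical value with 1 degree of freedom (3.841). The Newton-Raphson fitter and the names `fit_logistic` and `backward_eliminate_logit` are hypothetical, not the authors' code. It also assumes numeric predictors and no complete separation; categorical attributes such as housing type would be dummy-coded, and dropping a multi-level factor would need a test with more degrees of freedom.

```python
import numpy as np

CHI2_1DF_5PCT = 3.841  # upper 5% critical value of chi-square with 1 df


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])  # information matrix X^T W X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def backward_eliminate_logit(X, y, names, crit=CHI2_1DF_5PCT):
    """Drop predictors one at a time while the smallest likelihood-ratio statistic is below crit."""
    ones = np.ones((len(y), 1))  # intercept, never removed
    keep = list(range(X.shape[1]))
    while keep:
        ll_full = fit_logistic(np.hstack([ones, X[:, keep]]), y)
        lr_stat = {}
        for j in keep:
            cols = [k for k in keep if k != j]
            ll_reduced = fit_logistic(np.hstack([ones, X[:, cols]]), y)
            # G is approximately chi-square(1) if predictor j has no effect.
            lr_stat[j] = 2.0 * (ll_full - ll_reduced)
        weakest = min(lr_stat, key=lr_stat.get)
        if lr_stat[weakest] >= crit:
            break  # every remaining predictor is significant at the 5% level
        keep.remove(weakest)
    return [names[j] for j in keep]
```

The same loop applies to any nested-model comparison; only the fitting function and the critical value would change.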

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test (의료진단 및 중요 검사 항목 결정 지원 시스템을 위한 랜덤 포레스트 알고리즘 적용)

  • Yun, Tae-Gyun; Yi, Gwan-Su
    • The Transactions of The Korean Institute of Electrical Engineers, v.57 no.6, pp.1058-1062, 2008
  • In a clinical decision support system (CDSS), unlike a rule-based expert method, an appropriate data-driven machine learning method can readily provide information about each individual feature (clinical test) for disease classification. However, the methods developed so far focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel sets of clinical tests that clearly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is used both as the classifier and for the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using the breast cancer, diabetes, and heart disease data sets in the UCI Machine Learning Repository. Tests on the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.
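The error-monitoring elimination loop can be sketched in Python. Training a random forest requires a machine learning library, so this self-contained sketch swaps in a much simpler nearest-centroid classifier and a single train/test split; both are assumptions, not the authors' setup. It also uses a greedy wrapper variant (try dropping each remaining feature and keep the drop that gives the lowest held-out error) rather than ranking features by random forest importance. The names `centroid_error`, `backward_feature_path`, and `select_features` are hypothetical.

```python
import numpy as np


def centroid_error(X_tr, y_tr, X_te, y_te, cols):
    """Held-out error of a nearest-centroid classifier (stand-in for a random forest)
    that uses only the feature columns listed in `cols`."""
    classes = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == c][:, cols].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test row to every class centroid.
    dist = ((X_te[:, cols][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dist, axis=1)]
    return float(np.mean(pred != y_te))


def backward_feature_path(X_tr, y_tr, X_te, y_te):
    """Greedy backward elimination: at each step drop the feature whose removal gives the
    lowest held-out error, recording (kept features, error) for every subset size."""
    keep = list(range(X_tr.shape[1]))
    path = [(list(keep), centroid_error(X_tr, y_tr, X_te, y_te, keep))]
    while len(keep) > 1:
        trial_error = {
            j: centroid_error(X_tr, y_tr, X_te, y_te, [k for k in keep if k != j])
            for j in keep
        }
        drop = min(trial_error, key=trial_error.get)
        keep.remove(drop)
        path.append((list(keep), trial_error[drop]))
    return path


def select_features(path):
    """Choose the subset with the lowest held-out error, preferring fewer features on ties."""
    return min(path, key=lambda step: (step[1], len(step[0])))[0]
```

Because `backward_feature_path` records the error at every subset size, the final choice is isolated in `select_features`; a rule that accepts a slightly higher error in exchange for fewer clinical tests would only change that function.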

Variable Selection for Logistic Regression Model Using Adjusted Coefficients of Determination (수정 결정계수를 사용한 로지스틱 회귀모형에서의 변수선택법)

  • Hong, C. S.; Ham, J. H.; Kim, H. I.
    • The Korean Journal of Applied Statistics, v.18 no.2, pp.435-443, 2005
  • In logistic regression analysis, the coefficient of determination can be defined by various statistics, and their values are relatively small compared with those for linear regression models. These coefficients of determination are not generally used to evaluate and diagnose logistic regression models. Liao and McGee (2003) proposed two adjusted coefficients of determination that are robust to the addition of inappropriate predictors and to variation in sample size. In this work, these adjusted coefficients of determination are applied to variable selection for logistic regression models, and the results are compared with those of other methods such as forward selection, backward elimination, stepwise selection, and the AIC statistic.

Analysis of Correlation between Respiratory Characteristics and Physical Factors in Healthy Elementary School Childhood (학령기 정상 아동의 호흡 특성과 신체 조건에 관한 상관분석)

  • Lee, Hye Young; Kang, Dong Yeon; Kim, Kyoung
    • The Journal of Korean Physical Therapy, v.25 no.5, pp.330-336, 2013
  • Purpose: Respiration is an essential vital function for sustaining human life, and it is controlled by the respiratory muscles and their related neuromuscular regulation. The purpose of this study is to assess lung capacity and respiratory pressure in healthy children, and to investigate the relationship and predictability between respiratory pressure and other related respiratory functions. Methods: A total of 31 healthy children were recruited for this study. Demographic information and respiratory factors were assessed in terms of body surface area (BSA), chest mobility, lung capacity, and respiratory pressure. The correlation between respiratory pressure and the remaining variables was analyzed, and multiple regression using the stepwise method was performed to predict respiratory muscle strength, with respiratory pressure as the dependent variable and the demographic and other respiratory variables as the independent variables. Results: In the correlation analysis, respiratory pressure showed significant correlations with age (r = 0.62, p < 0.01), BSA (r = 0.80, p < 0.01), forced vital capacity (FVC; r = 0.80, p < 0.01), and forced expiratory volume in 1 second (FEV1; r = 0.70, p < 0.01). In the multiple regression analysis using the backward elimination method, BSA and FVC were included as significant factors in the prediction model. The model showed a significant explanatory power of 71.8%. Conclusion: These findings suggest that respiratory pressure could be a valuable measure for evaluating respiratory function, given its significant relationship with physical characteristics and lung capacity, and that BSA and FVC could be useful predictors of respiratory pressure. These findings will provide useful information for clinical assessment and treatment in healthy children as well as in those with pulmonary disease.

A Study on the Change of Beef Consumption and Recognition of Aged Meat (소고기 소비성향 변화와 숙성육 인식에 관한 연구)

  • Shin, Jeong-Seop
    • Journal of the Korea Academia-Industrial cooperation Society, v.21 no.9, pp.373-379, 2020
  • The purpose of this study is to investigate the factors affecting changes in consumers' consumption tendencies and their perception of aged meat. This study compared the results of beef consumer surveys conducted in 2012 and 2019. The importance of quality judgment criteria, the determinants of taste, and the perception that marbling is harmful to health were analyzed using regression analysis with the backward elimination method. The analysis showed that the importance of freshness, juiciness, tenderness, and ripening period had increased, as had the awareness that marbling is harmful to health. It also showed that the intention to purchase aged meat influenced whether consumers perceived freshness, tenderness, and ripening period favorably. This study analyzed how consumers' consumption tendencies changed in response to these shifts in consumer preferences, covering consumption propensity, the intention to consume beef, and the factors that influence the perception of aged meat. The results suggest a need to raise awareness of aged meat to accommodate diversifying consumer preferences and to support rational production and consumption in the future.

A Study on the Distress Prediction in the Fishery Industry (수산기업의 부실화 요인 및 예측에 관한 연구)

  • Lee, Yun-Won; Jang, Chang-Ik; Hong, Jae-Beom
    • Proceedings of the Fisheries Business Administration Society of Korea Conference, 2007.12a, pp.167-184, 2007
  • The objectives of this paper are to identify the causes of corporate distress and to develop a distress prediction model using financial information in the fishery industry. In this study, corporate distress is defined as economic failure or technical insolvency. Economic failure occurs through reduction, shutdown, or change of the business, and technical insolvency results from failure to pay the company's financial debt. The 33 distressed firms from 1991 to 2003 comprised 14 economic failure companies, 15 technical insolvency companies, and 4 companies that fell into both categories. The distress prediction analysis of fishery companies was carried out according to this definition of distress, in two steps. The first step was a univariate analysis, used to check the predictive power of each individual financial variable; t-tests were used to identify differences in financial variables between the distressed and non-distressed groups. The second step was to develop a distress prediction model with logistic regression. The variables that showed significant differences in the univariate analysis were selected as candidate predictors, and the financial ratios used in the logistic regression model were selected by the backward elimination method. To test the stability of the distress prediction model, the whole sample was divided into three sub-samples: period 1 (1990-1993), period 2 (1994-1997), and period 3 (1998-2002). The final model built from the whole sample was applied to each of the three sub-samples. The results of the logistic analysis were as follows. The growth, profitability, and stability ratios had a significant effect on distress. Somewhat different results were found for the sub-samples by distress type (economic failure and technical insolvency): growth and profitability were important for predicting economic failure, whereas profitability and activity were important for predicting technical insolvency. This suggests that profitability is the most important factor for fishery companies.

A Study on the Distress Prediction in the Fishery Industry (수산기업의 부실화 요인과 그 예측에 관한 연구)

  • Jang, Chang-Ick; Lee, Yun-Weon; Hong, Jae-Bum
    • The Journal of Fisheries Business Administration, v.39 no.2, pp.61-79, 2008
  • The objectives of this paper are to identify the causes of corporate distress and to develop a distress prediction model using financial information in the fishery industry. In this study, corporate distress is defined as economic failure or technical insolvency. Economic failure occurs through reduction, shutdown, or change of the business, and technical insolvency results from failure to pay the company's financial debt. The 33 distressed firms from 1991 to 2003 comprised 14 economic failure companies, 15 technical insolvency companies, and 4 companies that fell into both categories. The distress prediction analysis of fishery companies was carried out according to this definition of distress, in two steps. The first step was a univariate analysis, used to check the predictive power of each individual financial variable; t-tests were used to identify differences in financial variables between the distressed and non-distressed groups. The second step was to develop a distress prediction model with logistic regression. The variables that showed significant differences in the univariate analysis were selected as candidate predictors, and the financial ratios used in the logistic regression model were selected by the backward elimination method. To test the stability of the distress prediction model, the whole sample was divided into three sub-samples: period 1 (1990-1993), period 2 (1994-1997), and period 3 (1998-2002). The final model built from the whole sample was applied to each of the three sub-samples. The results of the logistic analysis were as follows. The growth, profitability, and stability ratios had a significant effect on distress. Somewhat different results were found for the sub-samples by distress type (economic failure and technical insolvency): growth and profitability were important for predicting economic failure, whereas profitability and activity were important for predicting technical insolvency. This suggests that profitability is the most important factor for fishery companies.

Factors Related to Hypertension Patients' Quality of Life: The 7th Korean National Health and Nutrition Examination(1st Year, 2016) (고혈압 환자의 삶의 질 관련 요인: 제 7기 1차년도(2016년) 국민건강영양조사)

  • Kim, Su I; Woo, Sang Jun; Jung, Young Hae
    • The Journal of Korean Society for School & Community Health Education, v.21 no.1, pp.61-74, 2020
  • Objectives: This study aims to examine the quality of life of hypertension patients using data from the 7th Korea National Health and Nutrition Examination Survey (1st year, 2016), to identify the factors related to it, and to provide baseline data for interventions that can improve these patients' quality of life. Methods: From the 8,150 participants in the 7th Korea National Health and Nutrition Examination Survey, this study extracted 1,531 patients who had been diagnosed with hypertension by a doctor, and selected the 1,072 patients with no missing values in the analysis variables as the final research subjects. The collected data were analyzed with SPSS (version 25.0). Backward elimination multiple regression analysis for a complex sample design was used to examine the factors related to quality of life. Results: The quality of life of hypertension patients was related to age, occupation, having a spouse, household income, weight gain, activity restriction, subjective health status, perceived stress, and the presence of comorbidity. The final model explained 37.0% of the variance (Wald F = 30.012, p < .001). Conclusions: Future intervention programs to improve the quality of life of hypertension patients will be more effective if they are designed according to age group, employment, marital status, and household income. In operating such programs, patients should be helped to control their weight, stay active, and relieve stress, and they should also be motivated to feel healthy. Furthermore, education should be provided so that patients manage their underlying disease appropriately at an early stage.

A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach (의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발)

  • Kim, Deok Hyun; Yoo, Dong Hee; Jeong, Dae Yul
    • The Journal of Information Systems, v.28 no.3, pp.249-276, 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly based on the Korean Welfare Panel survey data. Using these data, we obtained many decision rules for predicting suicidal ideation in the elderly. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive predictive decision rules. Weka 3.8 was used as the data mining tool, and the decision tree algorithm was J48, Weka's implementation of C4.5. In addition, 66.6% of the data was used as training data and the remainder as validation data. We considered all potentially relevant variables identified in previous studies of suicidal ideation in the elderly; in total, 99 variables including the target variable were used. Classification analysis was performed after applying backward elimination and a sampling technique for data balancing. Findings: There were significant differences between the data sets. Each selected data set yielded a different decision tree and several rules. Based on the decision tree method, we derived rules for suicide prevention. The decision tree yields rules for suicidal ideation not only in the depressed group but also in the non-depressed group. In addition, in developing the predictive model, the problem of over-fitting due to data imbalance was directly identified by applying data balancing. We conclude that the data must be balanced with respect to the target variable in order to perform a correct classification analysis without over-fitting. Moreover, even with data balancing applied, prediction performance was not inferior to that of the biased prediction model.
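The data-balancing step can be illustrated with a short Python sketch. The study used Weka 3.8; this is not Weka's implementation. It assumes that 66.6% of the records form the training set and that balancing is done by random oversampling of the minority class, which is one common choice; the abstract does not say which sampling technique was used. The names `split_rows` and `oversample` and the example records are hypothetical. Only the training portion is balanced, so duplicated records cannot leak into the evaluation data.

```python
import random


def split_rows(rows, train_frac=0.666, seed=0):
    """Shuffle rows and split them into a training part and a held-out part."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_frac))
    return shuffled[:cut], shuffled[cut:]


def oversample(rows, label_of, seed=0):
    """Random oversampling: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Usage: balance only the training part so duplicated rows never reach the held-out data.
records = [(i, 1 if i < 10 else 0) for i in range(100)]  # hypothetical: 10% positive labels
train, held_out = split_rows(records)
train_balanced = oversample(train, label_of=lambda r: r[1])
```

Oversampling duplicates existing minority records rather than discarding majority records, so no training information is lost; undersampling the majority class would be an equally simple alternative.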