• Title/Summary/Keyword: Stepwise logistic regression


Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment is a complicated process that takes into account various factors describing a company. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate the phenomenon of biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. 
In particular, the mutation operator helps keep GA from falling into local optima, so a globally optimal or near-optimal solution can be found with it. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e., credit rating, was labeled as four classes: 1 (A1), 2 (A2), 3 (A3), and 4 (B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross-validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparative models including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that enables GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models. 
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, namely X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
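
The McNemar comparison mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the exact (binomial) form of the test; the abstract does not say which variant the authors used. The function name `mcnemar_exact` and the discordant counts are hypothetical and are not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for two classifiers on the same cases.

    b: cases model A classified correctly and model B incorrectly
    c: cases model A classified incorrectly and model B correctly
    Cases where both models agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis the discordant cases split 50/50,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (assumption, not from the paper).
p_value = mcnemar_exact(b=30, c=12)
print(f"exact McNemar p-value: {p_value:.4f}")
```

A p-value below 0.01 or 0.05 would correspond to the 1% and 5% significance levels quoted in the abstract.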

A Study on the Factors Related to the Cognitive Function and Depression Among the Elderly (일부지역 노인들의 인지기능과 우울에 관련된 요인에 관한 연구)

  • Shin, Cheol-Ho;Kim, Soo-Young;Lee, Young-Soo;Cho, Young-Chae;Lee, Tae-Yong;Lee, Dong-Bae
    • Journal of Preventive Medicine and Public Health / v.29 no.2 s.53 / pp.199-214 / 1996
  • To investigate the factors affecting cognitive function and depression in people aged 65 or older, the authors surveyed subjects in Taejon and the surrounding area. A total of 729 subjects were tested for cognitive function with the MMSE and for depression with the GDS. The main results were as follows. Among the subjects, 56.8% had normal cognitive function, 24.1% were mildly impaired, and 19.1% were severely impaired. The level of cognitive function was closely related to the depression score. Cognitive function was more impaired with increasing age. Differences between the sexes were also found in the level of cognitive function and the depression score. After adjusting for the effect of age, variables such as sex, marital status, education level, past job, instrumental activities of daily living (ADL), regular physical exercise, frequency of going out of the house, chest discomfort, visual and auditory disturbance, and dizziness were significantly related to cognitive function impairment. Among these variables, instrumental ADL, age, visual disturbance, and sex were statistically significant in the logistic regression model. In the multiple stepwise regression, the variables significantly related to the depression score were education level, frequency of going out of the house, current job and housework activity, regular physical exercise, instrumental ADL, self-rated health and nutritional status, dimness, visual disturbance, and chest pain. In conclusion, the main characteristics closely related to cognitive function and depressive symptoms in the subjects were physical function and self-rated health status.


Factors Associated with Care Burden among Family Caregivers of Terminally Ill Cancer Patients (말기암환자 가족 간병인의 간병 부담과 관련된 요인)

  • Lee, Jee Hye;Park, Hyun Kyung;Hwang, In Cheol;Kim, Hyo Min;Koh, Su-Jin;Kim, Young Sung;Lee, Yong Joo;Choi, Youn Seon;Hwang, Sun Wook;Ahn, Hong Yup
    • Journal of Hospice and Palliative Care / v.19 no.1 / pp.61-69 / 2016
  • Purpose: It is important to alleviate care burden for terminal cancer patients and their families. This study investigated the factors associated with care burden among family caregivers (FCs) of terminally ill cancer patients. Methods: We analyzed data from 289 FCs of terminal cancer patients who were admitted to palliative care units of seven medical centers in Korea. Care burden was assessed using the Korean version of the Caregiver Reaction Assessment (CRA) scale, which comprises five domains. A multivariate logistic regression model with stepwise variable selection was used to identify factors associated with care burden. Results: Diverse associated factors were identified in each CRA domain. Emotional factors had a broad influence on care burden. FCs with emotional distress were more likely to experience changes to their daily routine (adjusted odds ratio (aOR), 2.54; 95% confidence interval (CI), 1.29 to 5.02), lack of family support (aOR, 2.27; 95% CI, 1.04 to 4.97), and health issues (aOR, 5.44; 95% CI, 2.50 to 11.88). Family functionality clearly reflected a lack of support, and severe family dysfunction was linked to financial issues as well. FCs without religion or comorbid conditions felt more burdened. The caregiving duration and daily caregiving hours significantly predicted FCs' lifestyle changes and physical burden. FCs who were employed, had weak social support, or could not visit frequently had low self-esteem. Conclusion: This study indicates that it is helpful to understand FCs' emotional status and family functions to assess their care burden. Thus, efforts are needed to lessen their financial burden through social support systems.

Prediction of Improvement of Myocardial Wall Motion after Coronary Artery Bypass Surgery Using Rest Tl-201/Dipyridamole Stress Gated Tc-99m-MIBI/24 Hour Delay Tl-201 SPECT (휴식기 Tl-201/디피리다몰 부하 게이트 Tc-99m-MIBI/24시간 지연 Tl-201 SPECT를 이용한 관상동맥 우회로 수술 후 심근벽 운동 호전의 예측)

  • Lee, Dong-Soo;Lee, Won-Woo;Yeo, Jeong-Seok;Kim, Seok-Ki;Kim, Ki-Bong;Chung, June-Key;Lee, Myung-Chul
    • The Korean Journal of Nuclear Medicine / v.32 no.6 / pp.497-508 / 1998
  • Purpose: Using rest Tl-201/dipyridamole stress gated Tc-99m-MIBI/24 hour delay Tl-201 SPECT, we investigated the predictive values of the markers of stress-rest reversibility (Rev), Tl-201 rest perfusion (Rest), Tl-201 24 hour redistribution (Del), and Tc-99m-MIBI gated systolic thickening (Thk) for wall motion improvement after coronary artery bypass surgery. Materials and Methods: In 39 patients (M:F = 34:5, age $58{\pm}8$), preoperative and postoperative (3 months) SPECT were compared. 24 hour delayed SPECT was done in 16 patients having perfusion defects at rest. Perfusion or wall motion was scored from 0 to 3 (0: normal to 3: defect or dyskinesia). Wall motion was abnormal in 142 segments among 585 segments of 99 artery territories which were surgically revascularized. Results: After bypass surgery, ejection fraction increased from $37.8{\pm}9.0$% to $45.5{\pm}12.3$% in 22 patients who had decreased ejection fraction preoperatively. Wall motion improved in 103 (72.5%) segments among 142 dysfunctional segments. Positive predictive values (PPV) of Rev, Rest, Del, and Thk were 83%, 76%, 43%, and 69%, respectively. Negative predictive values (NPV) of Rev, Rest, Del, and Thk were 48%, 44%, 58%, and 21%, respectively. Rest/gated stress/delay SPECT had a PPV of 74% and an NPV of 46%. Though univariate logistic regression analysis revealed Rev (p=0.0008) and Rest (p=0.024) as significant predictors, the stepwise multivariate test found Rev to be the only good predictor (p=0.0008). Conclusion: Among independent predictors obtained by rest Tl-201/stress gated Tc-99m-MIBI/delayed Tl-201 myocardial SPECT for wall motion improvement after bypass surgery, stress-rest reversibility was the single most useful predictor.
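
The predictive values quoted in the abstract above follow the usual 2x2-table definitions, which can be sketched in Python as below. The helper name `predictive_values` and the four cell counts are hypothetical; the abstract reports only the resulting percentages, so the counts are illustrative and not taken from the paper. The one figure that can be checked directly is the improved fraction, 103 of 142 segments.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from the four cells of a 2x2 table.

    tp: marker positive, wall motion improved
    fp: marker positive, wall motion did not improve
    tn: marker negative, wall motion did not improve
    fn: marker negative, wall motion improved
    """
    ppv = tp / (tp + fp)  # share of positive predictions that were correct
    npv = tn / (tn + fn)  # share of negative predictions that were correct
    return ppv, npv


# Reported in the abstract: 103 of 142 dysfunctional segments improved.
improved_fraction = 103 / 142
print(f"improved segments: {improved_fraction:.1%}")

# Hypothetical cell counts for a single marker (assumption, not from the paper).
ppv, npv = predictive_values(tp=40, fp=10, tn=30, fn=20)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Because most segments improved, a marker can reach a high PPV while its NPV stays low, which matches the pattern of the reported values.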


Effects of β3-Adrenergic Receptor Polymorphism on the Hyperglycemia in Korean Subjects (베타 3-아드레날린 수용체의 유전자형이 고혈당증 유발에 미치는 영향)

  • 오현희;최선미;양현성;김길수;윤유식
    • Journal of the Korean Society of Food Science and Nutrition / v.33 no.1 / pp.83-90 / 2004
  • This study was conducted to examine the effects of $\beta$3-adrenergic receptor polymorphism on blood glucose level and obesity in 530 volunteers who attended a weight loss program in a local obesity clinic. The age of the subjects was 26.55$\pm$0.31 yr, and males and females accounted for 9.1% (n=48) and 90.9% (n=492), respectively. The genotype distribution of the $\beta$3-AR gene polymorphism was WW type 75%, WR type 22%, and RR type 3%. Among many parameters, fasting blood glucose was significantly higher in the WR+RR type (p=0.001) compared with the WW type. When the subjects were divided into two groups at a fasting blood glucose level of 6.105 mmol/L, the frequency of hyperglycemia was 23.3% in WW type subjects, while it rose to 35.6% in WR+RR type subjects (p=0.011, $\chi^2$ analysis). When the hyperglycemia group was compared with the normoglycemia group, obesity index (p=0.044), %body fat (p=0.046), and TG (p=0.000) were significantly higher, and HDL (p=0.006) was significantly lower in the hyperglycemia group. When all of the above factors were included in a stepwise logistic regression analysis to find risk factors for hyperglycemia, the odds ratios for hyperglycemia were 2.015 (p=0.011) for the WR+RR type of the $\beta$3-AR gene, 2.165 (p=0.000) for TG, and 0.419 (p=0.059) for HDL cholesterol. In the WW type, blood glucose showed a significant positive correlation with BMI, WHR, and body fat (r=0.099, 0.119, 0.082). However, in the WR and RR types there was no significant correlation between blood glucose and BMI, WHR, or body fat. These data suggest that the WR+RR genotype of $\beta$3-AR has a very strong association with increased blood glucose level and might be a significant risk factor for hyperglycemia among Korean subjects.
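
As a rough cross-check of the figures in the abstract above, the unadjusted (crude) odds ratio can be computed from the two reported hyperglycemia frequencies, 23.3% and 35.6%. This is a sketch only: the helper names are hypothetical, and the crude value need not match the 2.015 reported from the stepwise model, which also includes TG and HDL.

```python
def odds(p: float) -> float:
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Crude odds ratio comparing two groups by their event frequencies."""
    return odds(p_exposed) / odds(p_reference)


# Frequencies of hyperglycemia reported in the abstract.
p_wr_rr = 0.356  # WR+RR type
p_ww = 0.233     # WW type

crude_or = odds_ratio(p_wr_rr, p_ww)
print(f"crude odds ratio (WR+RR vs WW): {crude_or:.2f}")
```

The crude ratio comes out at roughly 1.8, the same direction and similar size as the adjusted estimate.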

Early stress hyperglycemia as independent predictor of increased mortality in preterm infants (미숙아에서 초기 스트레스성 고혈당과 예후 사이의 연관성)

  • Wee, Young Sun;Ahn, Gae Hyun;Yoo, Eun Gyong;Lim, In Sook;Lee, Kyu Hyung
    • Clinical and Experimental Pediatrics / v.51 no.5 / pp.474-480 / 2008
  • Purpose: Stress hyperglycemia is common in critically ill adult patients. It is known as a predictor of increased mortality, and intensive insulin therapy has been shown to improve the prognosis in such patients. We investigated the relationship between early stress hyperglycemia and clinical outcomes in preterm infants. Methods: In this study, 141 preterm infants with a gestational age of less than 30 weeks were enrolled. The hyperglycemic group was defined as infants with a maximum glucose of more than 150 mg/dL (n=61) during the first 48 h of life, and the non-hyperglycemic group as infants with a maximum glucose of less than 150 mg/dL (n=80). Perinatal history, severity of illness using the Clinical Risk Index for Babies (CRIB) score, clinical outcomes, and mortality of the two groups were compared. Results: There was no significant difference in gestational age between the two groups, but the birth weight (P<0.001) was significantly lower, and the CRIB score (P<0.001) was significantly higher in the hyperglycemic group. Disseminated intravascular coagulation (P<0.001) and clinically suspected sepsis (P=0.046) were more common in the hyperglycemic group. Mortality was markedly higher in the hyperglycemic group than in the non-hyperglycemic group (41.0% vs. 11.3%, P<0.001). In a stepwise multiple logistic regression analysis, hyperglycemia (OR 3.787; 95% CI 1.324 to 10.829), the CRIB score (OR 1.252; 95% CI 1.047 to 1.496), and birth weight (OR 0.997; 95% CI 0.994 to 1.000) were independently associated with higher mortality. Conclusion: Stress hyperglycemia within the first 48 h of life is independently related to increased morbidity and mortality in preterm infants.
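
The odds ratios and confidence intervals in the abstract above can be mapped back to logistic regression coefficients. The sketch below assumes Wald-type 95% intervals that are symmetric on the log-odds scale, which the abstract does not state, and the function name `logit_summary` is hypothetical.

```python
import math


def logit_summary(odds_ratio: float, ci_lower: float, ci_upper: float,
                  z_crit: float = 1.96) -> tuple[float, float, float, float]:
    """Recover (coefficient, standard error, z, two-sided p) from an OR and its CI.

    Assumes the interval was built as exp(beta +/- z_crit * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Values reported in the abstract for early hyperglycemia.
beta, se, z, p = logit_summary(3.787, 1.324, 10.829)
print(f"beta = {beta:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.3f}")

# Values reported in the abstract for the CRIB score (per one-point increase).
crib_beta, crib_se, crib_z, crib_p = logit_summary(1.252, 1.047, 1.496)
print(f"beta = {crib_beta:.3f}, se = {crib_se:.3f}, z = {crib_z:.2f}, p = {crib_p:.3f}")
```

For hyperglycemia, the log of the odds ratio sits almost exactly midway between the logs of the two interval bounds, so the symmetry assumption looks reasonable for that estimate.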

Factors Related to Serum Vitamin C Level in Terminally Ill Cancer Patients (말기암환자에서 혈청 비타민 C 농도와 연관된 인자들)

  • Kim, Hyung Jun;Hwang, In Cheol;Yeom, Chang Hwan;Ahn, Hong Yup;Choi, Youn Seon;Lee, Jae Jun;Lim, Su Hyuk
    • Journal of Hospice and Palliative Care / v.17 no.4 / pp.241-247 / 2014
  • Purpose: Serum vitamin C is one of the indicators of antioxidant levels in the body, and it is lower in cancer patients than in the healthy population. However, there have been few studies on serum vitamin C levels in terminally ill cancer patients and related factors. Methods: We followed 65 terminal cancer patients who were hospitalized in two palliative care units. We collected data on age, sex, cancer type, functional status, clinical symptoms, history of cancer therapy, and various laboratory findings including serum vitamin C level. Patients were categorized into two groups according to the quartile of serum vitamin C level (Q1-3 vs. Q4), and the two groups were compared with each other. Stepwise multiple logistic regression analysis was used to identify factors related to serum vitamin C levels. Results: The mean serum vitamin C level was 0.44 ${\mu}$g/mL, and all patients fell into the category of vitamin C deficiency. Univariate analysis showed that the serum vitamin C level was lower in non-lung cancer patients (P=0.041) and febrile patients (P=0.034). Multivariate analysis adjusted for potential confounders such as lung cancer, fever, dysphagia, dyspnea, C-reactive protein, and history of chemotherapy demonstrated that the odds ratio for a low serum vitamin C level was 3.7 for patients receiving chemotherapy (P=0.046) and 7.22 for febrile patients (P=0.02). Conclusion: Vitamin C deficiency was very severe in terminally ill cancer patients, and it was associated with history of chemotherapy and fever.