• Title/Summary/Keyword: Subset selection

Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs and can significantly affect a country's economy, so bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later studies began using artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are used to enhance the accuracy of bankruptcy prediction. Ensemble classification combines multiple classifiers to obtain more accurate predictions than individual models achieve. Ensemble learning techniques are known to be very useful for improving the generalization ability of a classifier. To enhance the generalization ability of an ensemble model, its base classifiers must be as accurate and diverse as possible. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method diversifies the base classifiers by selecting a random feature subset from the original feature space for each classifier. Each ensemble member is trained on its randomly chosen feature subspace, and the members' predictions are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust to variations in the dataset but very sensitive to changes in the feature space. For this reason, KNN is a good base classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective at improving on an individual KNN model.
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves on the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms, of which 900 filed for bankruptcy and 900 did not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-samples t-test, with each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for guarding against overfitting. Prediction accuracy on the latter portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. Ten-fold cross-validation was implemented to compare the performance of the proposed model with that of other models in terms of classification accuracy. The Q-statistic values and average classification accuracies of the base classifiers were also investigated.
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
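
The random subspace KNN ensemble described in this abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the paper's implementation: the function names, the toy data, the fixed k shared by all members, and the use of majority voting as the aggregation method are all assumptions made here. The paper tunes each member's k and feature subset with a genetic algorithm, which this sketch omits. The `q_statistic` helper uses the standard pairwise Yule's Q diversity measure, assumed to be the Q-statistic the abstract refers to.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote among its k nearest training points,
    using squared Euclidean distance restricted to the given feature indices."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def build_ensemble(n_features, n_members, subset_size, k, seed=0):
    """Draw one random feature subset per member (hypothetical helper).
    Every member shares the same k here; the paper optimizes both instead."""
    rng = random.Random(seed)
    return [(k, rng.sample(range(n_features), subset_size))
            for _ in range(n_members)]


def ensemble_predict(train_X, train_y, x, members):
    """Combine member predictions by simple majority vote (an assumption)."""
    votes = Counter(knn_predict(train_X, train_y, x, k, feats)
                    for k, feats in members)
    return votes.most_common(1)[0][0]


def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers, from per-sample correctness flags.
    Values near 1 mean the classifiers err on the same samples (low diversity)."""
    n11 = sum(a and b for a, b in zip(correct_a, correct_b))
    n00 = sum(not a and not b for a, b in zip(correct_a, correct_b))
    n10 = sum(a and not b for a, b in zip(correct_a, correct_b))
    n01 = sum(not a and b for a, b in zip(correct_a, correct_b))
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0


# Toy data, invented for illustration: two well-separated classes, 4 features.
train_X = [[0.0, 0.1, 0.2, 0.0], [0.1, 0.0, 0.1, 0.2], [0.2, 0.1, 0.0, 0.1],
           [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.1], [1.1, 1.0, 0.9, 0.9]]
train_y = [0, 0, 0, 1, 1, 1]

members = build_ensemble(n_features=4, n_members=5, subset_size=2, k=3)
pred_low = ensemble_predict(train_X, train_y, [0.1, 0.1, 0.1, 0.1], members)
pred_high = ensemble_predict(train_X, train_y, [1.0, 1.0, 1.0, 1.0], members)
print(pred_low, pred_high)
```

Because each member sees only its own feature subset, members can disagree on borderline cases, which is the diversity the random subspace method relies on. A search procedure such as the paper's genetic algorithm would replace `build_ensemble` by evolving the `(k, features)` pairs against held-out accuracy.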

A Study on Clinical Variables Contributing to Differentiation of Delirium and Non-Delirium Patients in the ICU (중환자실 섬망 환자와 비섬망 환자 구분에 기여하는 임상 지표에 관한 연구)

  • Ko, Chanyoung; Kim, Jae-Jin; Cho, Dongrae; Oh, Jooyoung; Park, Jin Young
    • Korean Journal of Psychosomatic Medicine / v.27 no.2 / pp.101-110 / 2019
  • Objectives: It is not clear which clinical variables are most closely associated with delirium in the Intensive Care Unit (ICU). By comparing clinical data of ICU delirium and non-delirium patients, we sought to identify the variables that most effectively differentiate delirium from non-delirium. Methods: Medical records of 6,386 ICU patients were reviewed. Random Subset Feature Selection and Principal Component Analysis were used to select a set of clinical variables with the highest discriminatory capacity. Statistical analyses were employed to determine the separation capacity of two models: one using only the few selected clinical variables and the other using all clinical variables associated with delirium. Results: There was a significant difference between delirium and non-delirium individuals across 32 clinical variables. Richmond Agitation Sedation Scale (RASS), urinary catheterization, vascular catheterization, Hamilton Anxiety Rating Scale (HAM-A), blood urea nitrogen, and Acute Physiology and Chronic Health Evaluation II (APACHE II) most effectively differentiated delirium from non-delirium. Multivariable logistic regression analysis showed that, with the exception of vascular catheterization, these clinical variables were independent risk factors associated with delirium. The separation capacity of the logistic regression model using only these 6 clinical variables was measured with a receiver operating characteristic (ROC) curve, yielding an area under the curve (AUC) of 0.818. The same analysis performed using all 32 clinical variables gave an AUC of 0.881, denoting a very high separation capacity. Conclusions: The six aforementioned variables most effectively separate delirium from non-delirium. This highlights the importance of close monitoring of patients who received invasive medical procedures and were rated with very low RASS and HAM-A scores.
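
To make the AUC figures in this abstract concrete, the sketch below computes the area under the ROC curve directly from its probabilistic definition: the chance that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The function name, scores, and labels are invented for illustration and have no connection to the study's data or software.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of delirium (1 = delirium, 0 = not).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

In this toy case 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. Read the same way, an AUC of 0.818 means the six-variable model ranks a delirium patient above a non-delirium patient in roughly 82% of such pairs.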

Single Nucleotide Polymorphisms (SNPs) Discovery in GHSR Gene and Their Association Analysis with Economic Traits in Korean Native Chickens (GHSR 유전자 내 유전변이의 탐색과 한국재래계의 성장 및 산란 특성에 미치는 연관성 분석)

  • Choi, So-Young; Hong, Min-Wook; Yang, Song-Yi; Kim, Chong-Dae; Jeong, Dong Kee; Hong, Yeong Ho; Lee, Sung-Jin
    • Korean Journal of Poultry Science / v.43 no.4 / pp.273-279 / 2016
  • Recently, it was reported that certain polymorphisms in the growth hormone secretagogue receptor gene (GHSR) are associated with the growth of chickens. However, the correlation between GHSR polymorphisms and economic traits has not been investigated in Korean native chickens (KNCs). Therefore, the objective of this study was to confirm the suitability of the GHSR gene as a candidate for genomic selection and to identify a genetic marker for KNCs. A total of 220 KNCs from six breeds raised at the National Institute of Animal Science were genotyped for the c.739+726T>C SNP in the GHSR gene using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), and a subset of 30 birds was analyzed using direct sequencing. The association between the SNP genotypes and the economic traits of the KNCs was analyzed using the Statistical Package for the Social Sciences (SPSS) software program. The association analysis revealed that the c.739+726T>C SNP was significantly associated with body weight at 150 and 270 days (BW150 and BW270, respectively) in all KNCs (p<0.01), with BW150 in KNC (Gray) (p<0.05), and with egg production number in KNC (White) (p<0.05). In addition, the SNPs discovered using direct sequencing (513A>G, 517A>T) had a significant effect on the body weight and egg production traits (p<0.05). In conclusion, these results might be useful as a basis for studies on the improvement of KNC breeds. Furthermore, they suggest that the SNPs (c.739+726T>C, 513A>G, and 517A>T) located in the GHSR gene could be useful molecular genetic markers for KNCs.