• Title/Abstract/Keywords: variables selection

Search Results: 1,200, Processing Time: 0.03 seconds

The Credit Information Feature Selection Method in Default Rate Prediction Model for Individual Businesses (개인사업자 부도율 예측 모델에서 신용정보 특성 선택 방법)

  • Hong, Dongsuk;Baek, Hanjong;Shin, Hyunjoon
    • Journal of the Korea Society for Simulation / v.30 no.1 / pp.75-85 / 2021
  • In this paper, we present a deep neural network-based prediction model that processes and analyzes the corporate credit and personal credit information of individual business owners, as a new method for predicting the default rate of individual businesses more accurately. In modeling research across various fields, feature selection techniques have been actively studied as a way to improve performance, especially in predictive models that include many features. In this paper, the input variables of the default rate prediction model, namely macroeconomic indicators (macro variables) and credit information (micro variables), were first verified statistically; the credit information feature selection method was then applied to identify the final feature set that improves prediction performance. The proposed credit information feature selection method is an iterative, hybrid method that combines filter-based and wrapper-based methods: it builds submodels, constructs subsets by extracting the important variables of the best-performing submodels, and determines the final feature set by analyzing the prediction performance of each subset and of the combined subsets.

An Integrated Neural Network Model for Bankruptcy Prediction Using Link Weight Analysis (연결강도분석을 이용한 통합된 부도예측용 신경망모형)

  • Lee Woongkyu;Lim Young Ha
    • Proceedings of the Korea Association of Information Systems Conference / 2002.11a / pp.289-312 / 2002
  • This study proposes a link weight analysis approach for choosing input variables, together with an integrated model, to build a more accurate bankruptcy prediction model. The link weight analysis approach chooses input variables by analyzing each input node's link weight, defined as the absolute value of the link weights between that input node and the hidden layer. The approach comprises two methods: the weak-linked neuron elimination method and the strong-linked neuron selection method. The integrated model is a combined model, adapting the bagging method, that averages the outputs of four models: the optimal weak-linked neuron elimination method, the optimal strong-linked neuron selection method, a decision tree model, and MDA. The methods proposed in this study (the optimal strong-linked neuron selection method, the optimal weak-linked neuron elimination method, and the integrated model) show much higher accuracy than MDA and the decision tree model. In particular, the integrated model shows much higher accuracy than MDA and the decision tree model, and slightly higher accuracy than the optimal weak-linked neuron elimination and optimal strong-linked neuron selection methods.
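The link weight idea can be sketched in a few lines, assuming a weight matrix laid out as `weights[input][hidden]` and assuming that a node's link strength is the sum of the absolute weights from that input to all hidden nodes. The function names, the one-shot ranking, and the toy weights are hypothetical; the abstract's "optimal" variants presumably search over the cut point (and may retrain between steps), which is not shown here.

```python
def link_strength(weights):
    """Per-input link strength: sum of |w| over that input's links to the hidden layer.

    weights[i][h] is the weight from input node i to hidden node h (assumed layout).
    """
    return [sum(abs(w) for w in row) for row in weights]


def strong_linked_selection(weights, n_keep):
    """Keep the n_keep inputs with the strongest links (returned in index order)."""
    s = link_strength(weights)
    ranked = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
    return sorted(ranked[:n_keep])


def weak_linked_elimination(weights, n_drop):
    """Drop the n_drop inputs with the weakest links; keep the rest in index order."""
    s = link_strength(weights)
    ranked = sorted(range(len(s)), key=lambda i: s[i])
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(s)) if i not in dropped]


def integrated_score(model_scores):
    """Integrated model: simple average of the component models' predicted scores."""
    return sum(model_scores) / len(model_scores)


W = [[0.5, -0.5], [0.1, 0.0], [-2.0, 1.0]]  # 3 inputs x 2 hidden nodes (toy values)
print(link_strength(W))                          # [1.0, 0.1, 3.0]
print(strong_linked_selection(W, 2))             # [0, 2]
print(weak_linked_elimination(W, 1))             # [0, 2]
print(integrated_score([0.25, 0.5, 0.75, 1.0]))  # 0.625
```

In this one-shot version the two rules coincide whenever `n_keep + n_drop` equals the number of inputs; they differ only in how the cut is specified.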

Prediction of Auditor Selection Using a Combination of PSO Algorithm and CART in Iran

  • Salehi, Mahdi;Kamalahmadi, Sharifeh;Bahrami, Mostafa
    • Journal of Distribution Science / v.12 no.3 / pp.33-41 / 2014
  • Purpose - The purpose of this study was to predict the selection of independent auditors by companies listed on the Tehran Stock Exchange (TSE) using a combination of the particle swarm optimization (PSO) algorithm and classification and regression trees (CART). This is an applied study. Design, approach and methodology - The population consisted of all companies listed on the TSE during 2005-2010, and the sample included 576 observations from 95 companies over six consecutive years. The independent variables were the financial ratios of the sample companies, which were analyzed using two data mining techniques, namely the PSO algorithm and CART. Results - Among the analyzed variables, total assets, current assets, audit fee, working capital, current ratio, debt ratio, solvency ratio, turnover, and capital were found to be predictors of independent auditor selection. Conclusion - This study is effectively the first to focus on this topic in the specific context of Iran, and it may therefore be valuable for application in developing countries.

Variable Selection in Normal Mixture Model Based Clustering under Heteroscedasticity (이분산 상황 하에서 정규혼합모형 기반 군집분석의 변수선택)

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1213-1224 / 2011
  • In high-dimensional settings, where the number of variables greatly exceeds the number of observations, noninformative variables must be removed before the observations can be clustered. Most model-based approaches to variable selection assume homoscedasticity, and their models are mainly estimated by penalized likelihood methods. This paper proposes a different approach that effectively removes noninformative variables and, simultaneously, clusters the observations based on a modified normal mixture model. The validity of the model is established, and an EM algorithm is derived to estimate its parameters. Simulation studies and an experiment using a real microarray dataset demonstrate the effectiveness of the proposed method.
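As background for the EM step, here is a minimal one-dimensional EM for a normal mixture in which each component has its own variance (the heteroscedastic case). This is the standard textbook algorithm, not the paper's modified model: it performs no variable selection, and the function names, data, and hand-picked starting values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_normal_mixture_1d(xs, means, variances, weights, n_iter=50, min_var=1e-6):
    """EM for a 1-D normal mixture with a separate variance per component."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each point.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of mixing weights, means and per-component variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # guard against a collapsing component
    return means, variances, weights


xs = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 11.0, 9.0, 12.0, 8.0]
means, variances, weights = em_normal_mixture_1d(xs, [1.0, 9.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)  # roughly [0, 10], [0.02, 2.0], [0.5, 0.5]
```

Allowing a separate variance per component is what lets the fit recover one tight cluster and one wide cluster here; a homoscedastic model would force a single shared variance.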

A Hierarchical Expert System for Process Planning and Material Selection (공정계획과 재료선정의 동시적 해결을 위한 계층구조 전문가시스템)

  • 권순범;이영봉;이재규
    • Journal of Intelligence and Information Systems / v.6 no.2 / pp.29-40 / 2000
  • Process planning (the selection and ordering of processes) and material selection are two key decisions that must be made before full-scale manufacturing begins. Knowledge of product design, material characteristics, processes, time, and cost is interrelated and should be considered concurrently. Because of the problem's complexity, human experts have typically obtained only one of the feasible solutions, relying on their field knowledge and experience. To overcome this complexity, we propose a hierarchical expert system framework for knowledge representation and reasoning. Manufacturing processes have inherently hierarchical relationships, from top-level processes down to bottom-level operation processes. The process plan at each level is posted on a process blackboard and used for lower-level process planning. Process information on the blackboard is also used to adjust the process plan in order to resolve dead ends or inconsistencies that arise during reasoning. Decision variables for process, material, tool, time, and cost are represented as object frames, and their relationships are represented as constraints and rules. Constraints express relationships among variables such as compatibility and numerical inequality. Rules express causal relationships among variables, reflecting human experts' knowledge such as process precedence. A CRSP (Constraint and Rule Satisfaction Problem) approach is adopted to obtain solutions that satisfy both the constraints and the rules. A trade-off procedure lets the user see the impact of changing important variables such as material, cost, and time, and helps determine the preferred solution. We developed the prototype system on a PC using Visual C++ MFC, UNIK, and UNIK-CRSP.
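A toy illustration of what "satisfying both constraints and rules" means, written as a brute-force enumeration with a simple cost-based trade-off. Everything here is hypothetical (variable names, domains, costs, the single rule), and exhaustive enumeration only works for tiny problems; it is not UNIK or UNIK-CRSP, nor the paper's hierarchical blackboard reasoning.

```python
from itertools import product


def solve_crsp(domains, constraints, rules):
    """Enumerate assignments that satisfy every constraint and every rule.

    constraints: predicates on a full assignment (e.g. compatibility, inequalities).
    rules: (condition, conclusion) pairs read as "if condition then conclusion".
    """
    names = list(domains)
    solutions = []
    for values in product(*(domains[name] for name in names)):
        a = dict(zip(names, values))
        if all(c(a) for c in constraints) and all(
            conclusion(a) for condition, conclusion in rules if condition(a)
        ):
            solutions.append(a)
    return solutions


# Hypothetical decision variables, costs and knowledge (not taken from the paper).
DOMAINS = {"material": ["M1", "M2"], "process": ["P1", "P2"], "heat_treat": [False, True]}
MATERIAL_COST = {"M1": 10, "M2": 6}
PROCESS_COST = {"P1": 5, "P2": 3}


def cost(a):
    return MATERIAL_COST[a["material"]] + PROCESS_COST[a["process"]] + (4 if a["heat_treat"] else 0)


CONSTRAINTS = [
    lambda a: not (a["material"] == "M2" and a["process"] == "P1"),  # compatibility
    lambda a: cost(a) <= 15,                                         # numerical inequality
]
RULES = [
    (lambda a: a["process"] == "P2", lambda a: a["heat_treat"]),  # P2 requires heat treatment
]

feasible = solve_crsp(DOMAINS, CONSTRAINTS, RULES)
cheapest = min(feasible, key=cost)  # simple trade-off: prefer the lowest cost
print(feasible)
print(cheapest, cost(cheapest))
```

Constraints are checked unconditionally, while a rule only bites when its condition holds; that asymmetry is the distinction the abstract draws between constraints and rules.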

Drought forecasting over South Korea based on the teleconnected global climate variables

  • Taesam Lee;Yejin Kong;Sejeong Lee;Taegyun Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.47-47 / 2023
  • Drought results from a lack of water resources over an extended period, and its intensity has been magnified globally by climate change. In recent years, drought over South Korea has also intensified, making its prediction essential for water resource management and the water industry. This study therefore forecast drought over South Korea using the following procedure. First, accumulated spring precipitation (ASP) was derived from 93 weather stations in South Korea by taking the median across stations. Next, a correlation analysis was performed between ASP and Df4m, the differences between pairs of global winter mean sea level pressure (MSLP) values. The 37 Df4m variables with correlations above 0.55 were chosen and grouped into three regions. The selected Df4m variables within the same region were highly similar, leading to a multicollinearity problem. To avoid this problem, the least absolute shrinkage and selection operator (LASSO), which performs variable selection and model fitting simultaneously, was applied. The LASSO model selected five variables, and its predictions agreed well with the observed values (R² = 0.72). Other models, such as multiple linear regression and ElasticNet, were also tested but did not perform as well as LASSO. LASSO can therefore be an appropriate model for forecasting spring drought over South Korea and for managing water resources efficiently.
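How LASSO selects variables while fitting can be seen in a minimal pure-Python sketch using cyclic coordinate descent with soft-thresholding. It minimizes (1/2n)·‖y − Xβ‖² + λ‖β‖₁ on centered data, so the intercept is recovered afterwards. The function names, the toy data, and λ = 0.5 are hypothetical; the study's Df4m predictors and tuning are not reproduced.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent; returns (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered predictors
    resid = [v - y_mean for v in y]  # residual y - X beta, starting from beta = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual (j's own effect added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


# Toy data: y = 2 * x0 exactly; x1 is correlated with x0 but irrelevant.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2, 4, 6, 8, 10]
intercept, beta = lasso_cd(X, y, lam=0.5)
print(intercept, beta)  # 0.75 [1.75, 0.0]
```

The irrelevant but correlated predictor is set exactly to zero, while the relevant coefficient is shrunk from 2 to 1.75; this combination of selection and shrinkage is why LASSO is suited to the multicollinear Df4m setting.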

Variable Selection Based on Mutual Information

  • Huh, Moon-Y.;Choi, Byong-Su
    • Communications for Statistical Applications and Methods / v.16 no.1 / pp.143-155 / 2009
  • A best subset selection procedure based on the mutual information (MI) between a set of explanatory variables and a dependent class variable is proposed. The multivariate MI is derived using normal mixtures, and several types of normal mixtures are proposed. A best subset selection algorithm is also proposed. Four real datasets are used to demonstrate the efficiency of the proposed methods.
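The selection criterion itself can be sketched with a simpler estimator. This version swaps the paper's normal-mixture MI for a plug-in MI on discrete features, then searches all subsets of a fixed size. Names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def best_subset(X, y, size):
    """Exhaustively search feature subsets of a fixed size for the largest MI with y."""
    best = None
    for subset in combinations(range(len(X[0])), size):
        joint_x = [tuple(row[j] for j in subset) for row in X]  # treat the subset as one variable
        mi = mutual_information(joint_x, y)
        if best is None or mi > best[1]:
            best = (subset, mi)
    return best


X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]]  # toy discrete features
y = [0, 0, 1, 1]
print(best_subset(X, y, 1))  # ((0,), 1.0)
```

Plug-in MI never decreases when a feature is added, so the sketch fixes the subset size. In this toy data every pair of features already reaches the maximum of 1 bit. A practical procedure therefore needs a better estimator, such as the paper's mixture-based one, or some penalty on subset size.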

A Study on Vegetarian Market Segmentation by Vegetarian Selection Attributes (채식 선택 속성에 따른 채식 시장세분화 연구)

  • Do-Hyun Jeon;Myoung-Dae Jo;Seon-Hee Kim
    • Journal of the Korean Society of Food Culture / v.39 no.1 / pp.30-37 / 2024
  • Consumer market research was conducted on the growing vegetarian population using various selection attributes. Factors were extracted to identify vegetarian selection attributes, and, to divide the study cohort into groups, continuous variables (health, animal welfare, eco-friendliness, religion, familiarity, convenience, stability, and cost) and categorical variables (age, marital status, vegetarian duration, and vegetarian frequency) were simultaneously subjected to two-step cluster analysis. Cluster 1 contained high proportions of people aged 20-29 and 30-39, the MZ-generation age groups. A high proportion of this cluster had a vegetarian duration of 1-3 years, and the most common reasons for choosing a vegetarian diet were animal welfare and eco-friendliness. Cluster 2 contained high proportions of people aged 50-59 and 40-49; many in this cluster were married, and the mean vegetarian duration was ≥15 years. In addition, significant differences were observed between Clusters 1 and 2 in terms of religion, health, familiarity, cost, stability, and convenience. This study should contribute significantly to predicting vegetarian consumers' selection decisions and consumption behaviors, and should provide reliable marketing data for foodservice companies that develop vegetarian foods.

Ordinal Variable Selection in Decision Trees (의사결정나무에서 순서형 분리변수 선택에 관한 연구)

  • Kim Hyun-Joong
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.149-161 / 2006
  • The most important component of a decision tree algorithm is the rule for selecting the split variable. Many earlier algorithms, such as CART and C4.5, use a greedy search for variable selection. More recently, many methods have been developed to address the weaknesses of greedy search. Most algorithms apply different selection criteria depending on whether a variable is continuous or nominal, and ordinal variables are usually treated as continuous. This causes no trouble for methods that use greedy search, but it may cause problems for the newer algorithms, because they rely on statistical methods that are valid only for continuous or nominal variables. In this paper, we propose an ordinal variable selection method that uses the Cramer-von Mises testing procedure. The new method was compared with CART, C4.5, QUEST, and CRUISE, and it was shown to have good variable selection power for ordinal variables.
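The two-sample Cramer-von Mises statistic, T = nm/(n+m)² · Σ over the pooled sample of (F_n(z) − G_m(z))², can be computed directly and used to score candidate split variables. This is a minimal sketch, assuming a two-class problem and choosing the variable with the largest T. Ranking by raw T is an illustrative simplification, since the paper's actual testing procedure (e.g. how p-values enter the selection) is not reproduced. Names and data are hypothetical.

```python
def ecdf(sample, z):
    """Empirical CDF of `sample` evaluated at z."""
    return sum(1 for v in sample if v <= z) / len(sample)


def cvm_two_sample(a, b):
    """Two-sample Cramer-von Mises statistic T, summing over the pooled sample (ties allowed)."""
    n, m = len(a), len(b)
    pooled = list(a) + list(b)
    total = sum((ecdf(a, z) - ecdf(b, z)) ** 2 for z in pooled)
    return n * m / (n + m) ** 2 * total


def select_split_variable(columns, labels):
    """Pick the ordinal column whose class-0 and class-1 distributions differ most (largest T)."""
    def statistic(values):
        a = [v for v, c in zip(values, labels) if c == 0]
        b = [v for v, c in zip(values, labels) if c == 1]
        return cvm_two_sample(a, b)
    return max(columns, key=lambda name: statistic(columns[name]))


columns = {"x1": [1, 1, 3, 3], "x2": [1, 3, 1, 3]}  # hypothetical ordinal codes
labels = [0, 0, 1, 1]
print(cvm_two_sample([1, 1], [3, 3]))          # 0.5
print(select_split_variable(columns, labels))  # x1
```

Because T depends only on the ordering of values, any monotone recoding of an ordinal variable gives the same statistic, which is what makes a distribution-based test a natural fit for ordinal split variables.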

A Note on Model Selection in Mixture Experiments with Process Variables (공정변수를 갖는 혼합물 실험에서 모형선택의 한 방법)

  • Kim, Jung Il
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.201-208 / 2013
  • In this paper, we consider the combined mixture-component and process-variable model and propose a model selection strategy using MTS. The strategy is illustrated with an example involving three mixture components and two process variables in a bread-making experiment that has been studied in several earlier works.