• Title/Summary/Keyword: Input Variables

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki; Shin, Taeksoo
    • Journal of Intelligence and Information Systems, v.24 no.2, pp.111-124, 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies applying data mining techniques to disease prediction can be classified into model-design studies for predicting cardiovascular disease and studies comparing the results of disease prediction research. In the foreign literature, studies predicting cardiovascular disease predominated among those using data mining techniques for disease prediction. Domestic studies were not much different, although they focused mainly on hypertension and diabetes. Because hyperlipidemia, like hypertension and diabetes, is a chronic disease of high importance, this study selected it as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are already known to have excellent predictive power. To this end, we used the 2012 data set of the Korea Health Panel, which produces basic data on health expenditure, health level, and health behavior and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the hospitalization, outpatient, emergency, and chronic disease data of the 2012 Korea Health Panel, and 1,088 non-patients were also randomly selected, for a total of 2,176 participants. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, each treated as a separate variable relative to a reference group, and these variables were analyzed.
Six variables (age, BMI, education level, marital status, smoking status, and gender), excluding income level and smoking period, were selected at a significance level of 0.1. Second, the decision tree algorithm C4.5 was used; the significant input variables were age, smoking status, and education level. Finally, a genetic algorithm was used. For SVM, the genetic algorithm selected six input variables (age, marital status, education level, economic activity, smoking period, and physical activity status); for the artificial neural network, it selected three (age, marital status, and education level). Based on the selected input variables, we compared SVM, the meta-learning algorithm, and other prediction models for hyperlipidemia, using TP rate and precision to compare classification performance. The main results of the analysis are as follows. First, the accuracy of SVM was 88.4% and that of the artificial neural network was 86.7%. Second, classification models using the input variables selected by the stepwise method were slightly more accurate than those using all variables. Third, when only the three input variables selected by the decision tree were used, the precision of the artificial neural network was higher than that of SVM. With the input variables selected by the genetic algorithm, the classification accuracy of SVM was 88.5% and that of the artificial neural network was 87.9%. Finally, stacking, the meta-learning algorithm proposed in this study, performed best when the predicted outputs of SVM and MLP were used as the input variables of an SVM meta-classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases.
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, stacking as a meta-learner classified hyperlipidemia more accurately than the other meta-learning algorithms. However, the predictive performance of the proposed meta-learning algorithm was the same as that of SVM, the best-performing single model (88.6%). The limitations of this study are as follows. First, various variable selection methods were tried, but most of the variables used were categorical dummy variables. Because models such as decision trees may be better suited to categorical variables than general models such as neural networks, the results may differ when continuous variables are used. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models, such as meta-learning algorithms, that had not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, the proposed model is expected to be effective for the prevention and management of hyperlipidemia.
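
The abstract compares classifiers by accuracy, TP rate, and precision. As a minimal sketch of how those three measures follow from binary predictions (assuming hyperlipidemia is coded 1 and non-patients 0; the function names and toy labels are hypothetical, not from the paper):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, TP rate (recall/sensitivity) and precision."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,     # share of patients detected
        "precision": tp / (tp + fp) if tp + fp else 0.0,   # share of positive calls that are patients
    }


# Toy labels (made up): 1 = hyperlipidemia, 0 = non-patient
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```

With a balanced sample like the study's (1,088 patients and 1,088 non-patients), accuracy is not dominated by one class, and TP rate and precision add the view from the patient class.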

A Study on the Home Management Behavior in Employed Wives' Families Based on a System's Approach (체계론적 관점에서 본 취업주부가정의 가정관리행동 연구 -갈등 관리 행동을 중심으로-)

  • Choi, Ho-Sook; Moon, Sook-Jae
    • Journal of Families and Better Life, v.10 no.1 s.19, pp.75-94, 1992
  • The purpose of this study was to suggest appropriate conflict management strategies for employed wives by investigating the causal relations among conflict, resources, home management behavior, and managerial satisfaction from a systems approach. The data were collected through a questionnaire answered by 388 employed wives and were analyzed with frequency, percentage, ANOVA, F-test, t-test, Pearson's correlation analysis, multiple regression analysis, and path analysis. The results of this study are as follows: 1) Input variables, throughput variables, and output variables differed significantly according to the family life cycle (FLC). Employed wives' families in the earlier stages of the FLC used more appropriate conflict management strategies than those in the later stages; that is, they had more abundant resources, such as family cohesion, interaction with relatives, and social support, had higher planning scores, and used structural management strategies more frequently. However, managerial satisfaction did not differ. 2) Regarding input and throughput variables, more resources and lower conflict were associated with higher planning, implementing, and structural management scores. Regarding throughput and output variables, higher planning, implementing, and structural management scores were associated with higher managerial satisfaction. Regarding input and output variables, more resources and lower conflict were associated with higher managerial satisfaction; besides objective and material resources, subjective and psychological resources also had an influence. 4) Among all variables affecting managerial satisfaction, the commission of housework, family cohesion, and the wives' occupational level had indirect effects on managerial satisfaction through structural management.
Only income had a direct effect on managerial satisfaction.
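
Result 4) separates direct from indirect effects. In a standard linear path model (a generic sketch; the symbols are illustrative and not the paper's), with a predictor $X$ such as family cohesion, the mediator $M$ (structural management), and the outcome $Y$ (managerial satisfaction), the effects decompose as:

```latex
\begin{aligned}
M &= a\,X + e_M \\
Y &= c'\,X + b\,M + e_Y \\
\text{indirect effect of } X \text{ on } Y &= a\,b \\
\text{total effect of } X \text{ on } Y &= c' + a\,b
\end{aligned}
```

In these terms, the variables reported to act through structural management have a nonzero $a\,b$, while income is the only variable reported with a nonzero direct path $c'$.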

A study on decision tree creation using marginally conditional variables (주변조건부 변수를 이용한 의사결정나무모형 생성에 관한 연구)

  • Cho, Kwang-Hyun; Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society, v.23 no.2, pp.299-307, 2012
  • Data mining is a method of searching for interesting relationships among items in a given database, and the decision tree is one of its typical algorithms. A decision tree classifies or predicts by splitting a group into subgroups. In general, the decision tree model that researchers create can become complicated depending on the model-building criteria and the number of input variables; in particular, a tree built from many input variables can be complex and difficult to interpret. When the input variables include marginally conditional variables (intervening variables or external variables), these variables are not directly relevant to building the tree. In this study, we suggest a method of creating a decision tree using marginally conditional variables and apply it to real data to examine its efficiency.
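
The abstract does not state which splitting criterion the authors' trees use. As a generic sketch, the following Python uses information gain (the entropy-based criterion of ID3-style trees; C4.5 refines it into the gain ratio) to pick the input variable whose split best separates the target. Variable names and data are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, variable):
    """Entropy reduction obtained by splitting on one categorical input variable."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[variable], []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, variables):
    """Choose the input variable with the largest information gain."""
    return max(variables, key=lambda v: information_gain(rows, labels, v))


# Toy data (made up): "age" separates the classes, "region" does not.
rows = [{"age": "old", "region": "A"}, {"age": "old", "region": "B"},
        {"age": "young", "region": "A"}, {"age": "young", "region": "B"}]
labels = [1, 1, 0, 0]
print(best_split(rows, labels, ["age", "region"]))  # age
```

A variable with no relationship to the target, like `region` here, contributes zero gain. How marginally conditional variables should be handled is the paper's own contribution and is not attempted in this sketch.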

Genetically Optimized Fuzzy Polynomial Neural Networks Model and Its Application to Software Process (진화론적 최적 퍼지다항식 신경회로망 모델 및 소프트웨어 공정으로의 응용)

  • Lee, In-Tae; Park, Ho-Sung; Oh, Sung-Kwun; Ahn, Tae-Chon
    • Proceedings of the KIEE Conference, 2004.11c, pp.337-339, 2004
  • In this paper, we discuss the optimal design of Fuzzy Polynomial Neural Networks (FPNNs) by means of Genetic Algorithms (GAs). As layers are added, the model creates an optimal network architecture by selecting and eliminating nodes by itself, which gives it flexibility. We use triangular and Gaussian-like membership functions in the premise part of the rules, and design the consequent structure with constant and regression polynomial (linear, quadratic, and modified quadratic) functions between the input and output variables. GAs are applied to improve performance by optimizing the choice of input variables, the number of input variables, and the polynomial order. To evaluate the performance of the GA-based FPNNs, the models are tested on Medical Imaging System (MIS) data.
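
To make the premise and consequent parts concrete, here is a minimal Python sketch of a single two-input fuzzy polynomial node. It assumes two complementary triangular fuzzy sets per input, product firing strengths, and weighted-average defuzzification; the "modified quadratic" form is taken as c0 + c1·x1 + c2·x2 + c3·x1·x2, which is an assumption. The GA-driven selection of inputs, polynomial order, and nodes is not shown.

```python
import itertools
import math


def triangular(x, a, b, c):
    """Triangular membership: rises a->b, peaks at b, falls b->c."""
    if x == b:
        return 1.0
    if x < b:
        return 0.0 if x <= a else (x - a) / (b - a)
    return 0.0 if x >= c else (c - x) / (c - b)


def gaussian_like(x, center, width):
    """Gaussian-like membership exp(-((x - center) / width)^2)."""
    return math.exp(-((x - center) / width) ** 2)


def consequent(kind, c, x1, x2):
    """Regression-polynomial consequent of one rule."""
    if kind == "constant":
        return c[0]
    if kind == "linear":
        return c[0] + c[1] * x1 + c[2] * x2
    if kind == "quadratic":
        return (c[0] + c[1] * x1 + c[2] * x2
                + c[3] * x1 ** 2 + c[4] * x2 ** 2 + c[5] * x1 * x2)
    if kind == "modified_quadratic":  # assumed form: cross term, no squares
        return c[0] + c[1] * x1 + c[2] * x2 + c[3] * x1 * x2
    raise ValueError(f"unknown consequent type: {kind}")


def two_sets(lo, hi):
    """Two complementary triangular sets ("low", "high") covering [lo, hi]."""
    span = hi - lo
    return [lambda x: triangular(x, lo - span, lo, hi),
            lambda x: triangular(x, lo, hi, hi + span)]


def fuzzy_node(x1, x2, rule_coeffs, kind="linear", lo=0.0, hi=1.0):
    """One two-input node: 2 x 2 = 4 rules, product firing strength,
    weighted average of the rule consequents."""
    sets = two_sets(lo, hi)
    num = den = 0.0
    for (i, j), c in zip(itertools.product(range(2), repeat=2), rule_coeffs):
        w = sets[i](x1) * sets[j](x2)
        num += w * consequent(kind, c, x1, x2)
        den += w
    return num / den


# Usage with made-up linear consequents, one coefficient list per rule.
print(fuzzy_node(0.3, 0.8, [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0],
                            [0.5, 0.5, 0.5], [2.0, -1.0, 0.0]]))
```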

A practical application of cluster analysis using SPSS

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society, v.20 no.6, pp.1207-1212, 2009
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, clustering is conducted on a similarity (or dissimilarity) matrix or on the original input data, and various measures of similarity (or dissimilarity) between objects (or variables) have been developed. We introduce a real application of the clustering procedure in SPSS when only the distance matrix of the objects (or variables) is given as input data. This is particularly helpful for cluster analysis of huge data sets whose proximity matrix is larger than 1000. SPSS syntax commands for clustering with matrix input data are given with numerical examples.
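
The key point is that hierarchical clustering can start from a distance matrix alone, without the raw observations. As an illustration (a generic single-linkage procedure written in Python, not SPSS's implementation or syntax; the matrix is made up), agglomerative clustering needs nothing but that matrix:

```python
def single_linkage(dist, names):
    """Agglomerative single-linkage clustering from a precomputed symmetric
    distance matrix. Returns the merge history as (members_a, members_b, distance)."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # single linkage: distance between the closest pair of members
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([names[i] for i in clusters[a]],
                       [names[i] for i in clusters[b]], d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Made-up distance matrix for four objects; no raw data is needed.
names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 5, 9],
        [6, 5, 0, 4],
        [10, 9, 4, 0]]
for step in single_linkage(dist, names):
    print(step)
```

This naive search recomputes every member-to-member distance in each round, so it is only meant for small matrices, not the 1000+ object matrices the paper targets.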

Fuzzy modeling using transformed input space partitioning

  • You, Je-Young; Lee, Sang-Chul; Won, Sang-Chul
    • Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings, 1996.10b, pp.494-498, 1996
  • Three fuzzy input space partitioning methods, the grid, tree, and scatter methods, have mainly been used until now. These partitioning methods perform well in modeling linear systems and nonlinear systems with independent modeling variables. For nonlinear systems with coupled modeling variables, however, many fuzzy rules are needed to obtain an accurate fuzzy model. This paper shows that a fuzzy model can be obtained using a modeling vector transformed by a linear transformation of the original modeling vector.
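
A quick count shows why coupling matters for grid partitioning: with m fuzzy sets per input, a grid over n inputs needs m**n rules. The sketch below assumes, purely for illustration, a plant that depends on x1 and x2 only through x1 + x2 and a hand-chosen transformation matrix `A`; the transformation used in the paper is not specified in the abstract.

```python
import itertools
import math


def grid_rules(n_inputs, sets_per_input):
    """Grid partition: one rule per combination of fuzzy sets (m**n rules)."""
    return list(itertools.product(range(sets_per_input), repeat=n_inputs))


def transform(x, A):
    """Linear transformation z = A x of the modeling vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def plant(x1, x2):
    """Hypothetical coupled nonlinear system: y depends only on x1 + x2."""
    return math.sin(x1 + x2)


A = [[1.0, 1.0],    # z1 = x1 + x2 carries all variation of y
     [1.0, -1.0]]   # z2 = x1 - x2 has no effect on y, so it need not be partitioned

# Two inputs with the same z1 but different z2 give the same output.
print(transform([0.1, 0.6], A), transform([0.4, 0.3], A))
print(plant(0.1, 0.6), plant(0.4, 0.3))

# Five fuzzy sets per partitioned variable:
print(len(grid_rules(2, 5)), "rules on (x1, x2) vs", len(grid_rules(1, 5)), "rules on z1")
```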

A V-Groove $CO_2$ Gas Metal Arc Welding Process with Root Face Height Using Genetic Algorithm

  • Ahn, S.; Rhee, S.
    • International Journal of Korean Welding Society, v.3 no.2, pp.15-23, 2003
  • A genetic algorithm was applied to an arc welding process to determine near-optimal settings of the welding process parameters that produce good weld quality. This method searches for optimal settings of the welding parameters through systematic experiments, without a model between the input and output variables. Its advantage is that it can find optimal conditions with fewer experiments than a conventional full factorial design. The genetic algorithm was applied to the optimization of weld bead geometry. In the optimization problem, the input variables were wire feed rate, welding voltage, welding speed, and root opening, and the output variables were bead height, bead width, penetration, and back bead width. The numbers of levels for the input variables are 8, 16, 8, and 3, respectively, so a conventional full factorial design would require 8 × 16 × 8 × 3 = 3,072 experiments to find the optimal welding conditions. The genetic algorithm, however, found near-optimal welding conditions in fewer than 48 experiments.
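
The search loop can be sketched as a minimal genetic algorithm over discrete process levels. This Python sketch assumes truncation selection, one-point crossover, and per-gene mutation (the abstract does not state the operators); `hypothetical_quality` is a made-up stand-in for running and measuring a weld, and its optimum is arbitrary.

```python
import math
import random

# Levels per input from the abstract: wire feed rate, welding voltage,
# welding speed, root opening.
LEVELS = (8, 16, 8, 3)


def hypothetical_quality(setting):
    """Made-up stand-in for one welding experiment (higher is better).
    In the paper, each evaluation is a real weld measured for bead geometry."""
    target = (3, 10, 5, 1)  # arbitrary illustrative optimum
    return -sum((g - t) ** 2 for g, t in zip(setting, target))


def genetic_search(fitness, levels, pop_size=8, generations=5, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    results = {}  # memo: each distinct setting counts as one experiment

    def evaluate(setting):
        if setting not in results:
            results[setting] = fitness(setting)
        return results[setting]

    population = [tuple(rng.randrange(n) for n in levels) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]              # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(levels))          # one-point crossover
            child = list(a[:cut] + b[cut:])
            for k, n in enumerate(levels):               # mutate a gene to a random level
                if rng.random() < p_mut:
                    child[k] = rng.randrange(n)
            children.append(tuple(child))
        population = parents + children
    best = max(population, key=evaluate)
    return best, len(results)


best, experiments = genetic_search(hypothetical_quality, LEVELS)
print("full factorial:", math.prod(LEVELS), "| GA experiments:", experiments, "| best:", best)
```

Memoizing evaluations mirrors the experimental setting: a setting that has already been welded is not run again, so the returned count reflects distinct experiments.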

Multimachine Stabilizer using Sliding Mode Observer-Model Following including CLF for Measurable State Variables

  • Lee, Sang-Seung; Park, Jong-Keun
    • Journal of Electrical Engineering and information Science, v.2 no.4, pp.53-58, 1997
  • In this paper, the power system stabilizer (PSS) using the sliding mode observer-model following (SMO-MF) with closed-loop feedback (CLF) for a single-machine system is extended to a multimachine system. The multimachine SMO-MF PSS for unmeasurable plant state variables is obtained by combining the sliding mode-model following (SM-MF) including CLF with a full-order observer (FOO). The estimated control input for the unmeasurable plant state variables is derived by Lyapunov's second method so that the control input keeps the system stable. Time-domain simulation results for the torque angle and the angular velocity show that the proposed multimachine SMO-MF PSS including CLF for unmeasurable plant state variables damps out low-frequency oscillation and achieves asymptotic tracking of the reference model state at different initial conditions and under a step input.
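
The observer half of the design can be illustrated with a full-order (Luenberger) observer on a made-up two-state linear plant. This Python sketch assumes the matrices `A`, `B`, `C` and the gain `L` shown (none come from the paper), uses forward-Euler integration, and omits the sliding-mode, model-following, and Lyapunov parts; it only shows the estimate converging to the unmeasured state under a step input.

```python
def simulate_observer(t_end=5.0, dt=1e-3):
    """Simulate plant x' = A x + B u and full-order observer
    xh' = A xh + B u + L (y - C xh) with forward Euler; return final x and xh."""
    A = [[0.0, 1.0], [-2.0, -3.0]]  # hypothetical stable plant
    B = [0.0, 1.0]
    C = [1.0, 0.0]                  # only the first state is measured
    L = [5.0, 4.0]                  # A - L C has eigenvalues -4 +/- j*sqrt(5)
    u = 1.0                         # step input
    x = [1.0, 0.0]                  # true initial state
    xh = [0.0, 0.0]                 # observer starts with a wrong estimate
    for _ in range(int(round(t_end / dt))):
        y = C[0] * x[0] + C[1] * x[1]
        yh = C[0] * xh[0] + C[1] * xh[1]
        dx = [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u for i in range(2)]
        dxh = [A[i][0] * xh[0] + A[i][1] * xh[1] + B[i] * u + L[i] * (y - yh)
               for i in range(2)]
        x = [x[i] + dt * dx[i] for i in range(2)]
        xh = [xh[i] + dt * dxh[i] for i in range(2)]
    return x, xh


x, xh = simulate_observer()
print("estimation error:", [abs(a - b) for a, b in zip(x, xh)])
```

The estimation error obeys e' = (A - L C) e regardless of the input, so placing the eigenvalues of A - L C well inside the left half-plane makes the estimate converge quickly.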

Determination on Optima Condition for a Gas Metal Arc Welding Process Using Genetic Algorithm (유전 알고리즘을 이용한 가스 메탈 아크 용접 공정의 최적 조건 설정에 관한 연구)

  • 김동철; 이세헌
    • Journal of Welding and Joining, v.18 no.5, pp.63-69, 2000
  • A genetic algorithm was applied to an arc welding process to determine near-optimal settings of the welding process parameters that produce good weld quality. This method searches for optimal settings of the welding parameters through systematic experiments, without a model between the input and output variables. Its advantage is that it can find optimal conditions with fewer experiments than a conventional full factorial design. The genetic algorithm was applied to the optimization of weld bead geometry. In the optimization problem, the input variables were wire feed rate, welding voltage, and welding speed, and the output variables were bead height, bead width, and penetration. The numbers of levels for the input variables are 16, 16, and 8, respectively, so a conventional full factorial design would require 16 × 16 × 8 = 2,048 experiments to find the optimal welding conditions. The genetic algorithm, however, found near-optimal welding conditions in fewer than 40 experiments.
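
The experiment count for the full factorial design is the product of the level counts. A quick Python check of that arithmetic, and of what fraction of the grid the reported 40-experiment bound represents (the dictionary keys are illustrative names):

```python
import itertools
import math

# Levels per input variable, from the abstract.
levels = {"wire_feed_rate": 16, "welding_voltage": 16, "welding_speed": 8}

full_factorial = math.prod(levels.values())              # 16 * 16 * 8
grid = list(itertools.product(*(range(n) for n in levels.values())))
assert len(grid) == full_factorial                       # one run per level combination

ga_budget = 40  # upper bound on GA experiments reported in the abstract
print(full_factorial, f"{ga_budget / full_factorial:.1%}")  # 2048 2.0%
```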

A Study on Optimal Synthesis of Multiple-Valued Logic Circuits using Universal Logic Modules U$_{f}$ based on Reed-Muller Expansions (Reed-Muller 전개식에 의한 범용 논리 모듈 U$_{f}$ 의 다치 논리 회로의 최적 합성에 관한 연구)

  • 최재석; 한영환; 성현경
    • Journal of the Korean Institute of Telematics and Electronics C, v.34C no.12, pp.43-53, 1997
  • In this paper, an optimal synthesis algorithm for multiple-valued logic circuits using universal logic modules (ULM) U$_{f}$ based on 3-variable ternary Reed-Muller expansions is presented. We check the degree of each variable in the coefficients of the Reed-Muller expansions and determine the order of the optimal control input variables that minimizes the number of ULM U$_{f}$ modules. This order is then used, together with a circuit cost matrix, to realize multiple-valued logic circuits built from ULM U$_{f}$ modules based on Reed-Muller expansions. The algorithm requires only unit time to search for the optimal control input variables. It can also be implemented as a computer program, with a run time of O($p^{n}$).
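
The first step, computing the Reed-Muller coefficients and checking each variable's degree, can be sketched in Python. This assumes the zero-polarity ternary Reed-Muller form over GF(3), f = Σ c·x1^e1·x2^e2·x3^e3 with exponents 0 to 2 (other polarities exist), and does not reproduce the paper's cost-matrix step or its control-variable ordering. The example function is made up.

```python
import itertools

# One-variable ternary Reed-Muller transform over GF(3):
# f(x) = c0 + c1*x + c2*x^2 (mod 3), with [c0, c1, c2] = T . [f(0), f(1), f(2)] (mod 3).
T = [[1, 0, 0],
     [0, 2, 1],
     [2, 2, 2]]


def rm_coefficients(truth, n):
    """Map an n-variable ternary truth table {input tuple: value} to
    {exponent tuple: coefficient}, transforming one variable (axis) at a time."""
    coeffs = dict(truth)
    for axis in range(n):
        updated = {}
        for key in itertools.product(range(3), repeat=n):
            along = [coeffs[key[:axis] + (v,) + key[axis + 1:]] for v in range(3)]
            updated[key] = sum(t * a for t, a in zip(T[key[axis]], along)) % 3
        coeffs = updated
    return coeffs


def rm_evaluate(coeffs, x):
    """Evaluate the expansion at input tuple x (0 ** 0 == 1 in Python)."""
    total = 0
    for exps, c in coeffs.items():
        term = c
        for xi, e in zip(x, exps):
            term *= xi ** e
        total += term
    return total % 3


def variable_degrees(coeffs, n):
    """Highest exponent of each variable among the nonzero coefficients."""
    return [max((e[i] for e, c in coeffs.items() if c), default=0) for i in range(n)]


# Example 3-variable ternary function (made up): f = x1*x2 + x3^2 (mod 3)
f = {x: (x[0] * x[1] + x[2] ** 2) % 3 for x in itertools.product(range(3), repeat=3)}
c = rm_coefficients(f, 3)
print({e: v for e, v in c.items() if v})  # {(0, 0, 2): 1, (1, 1, 0): 1}
print(variable_degrees(c, 3))             # [1, 1, 2]
```

Per-variable degrees like these are the information the abstract says is checked before ordering the control input variables; the ordering itself is the paper's contribution.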
