• Title/Summary/Keyword: Stepwise selection

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm (SVM과 meta-learning algorithm을 이용한 고지혈증 유병 예측모형 개발과 활용)

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.111-124 / 2018
  • This study develops a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies that apply data mining techniques to disease prediction fall into two groups: studies that design models for predicting cardiovascular disease, and studies that compare disease prediction results. In the international literature, data mining studies of disease prediction mostly addressed cardiovascular disease. Domestic (Korean) studies were broadly similar but focused mainly on hypertension and diabetes. Because hyperlipidemia is a chronic disease of comparable importance to hypertension and diabetes, this study selected it as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are known to have excellent predictive power. We used data from the Korea Health Panel 2012. The Korea Health Panel produces basic data on health expenditure, health status, and health behavior, and has conducted an annual survey since 2008. From the 2012 inpatient, outpatient, emergency, and chronic disease data of the panel, 1,088 patients with hyperlipidemia were randomly selected, and 1,088 non-patients were also randomly sampled, for a total of 2,176 subjects. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was applied using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, each treated as a separate variable relative to its reference group, and these variables were analyzed. 
Six variables (age, BMI, education level, marital status, smoking status, gender) were selected at a significance level of 0.1; income level and smoking period were excluded. Second, C4.5, a decision tree algorithm, was used; the significant input variables were age, smoking status, and education level. Finally, genetic algorithms were used. For SVM, the genetic algorithm selected six input variables (age, marital status, education level, economic activity, smoking period, and physical activity status); for the artificial neural network, it selected three (age, marital status, and education level). Based on the selected variables, we compared SVM, meta-learning algorithms, and other prediction models for hyperlipidemia, and compared classification performance using TP rate and precision. The main results are as follows. First, the accuracy of the SVM was 88.4% and that of the artificial neural network was 86.7%. Second, classification models using the input variables selected by the stepwise method were slightly more accurate than models using all variables. Third, the precision of the artificial neural network was higher than that of SVM when only the three variables selected by the decision tree were used as inputs. With the input variables selected by the genetic algorithm, the classification accuracy of SVM was 88.5% and that of the artificial neural network was 87.9%. Finally, stacking, the meta-learning algorithm proposed in this study, performed best when the predicted outputs of SVM and MLP were used as inputs to an SVM meta-classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases. 
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, the classification accuracy for hyperlipidemia of stacking as a meta-learner was higher than that of the other meta-learning algorithms. However, the predictive performance of the proposed meta-learning algorithm is the same as that of SVM, the best-performing single model (88.6%). The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With a large number of categorical variables, models suited to categorical inputs, such as decision trees, may fit better than general models such as neural networks, so the results might differ if continuous variables were used. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, we expect the proposed model to be effective for the prevention and management of hyperlipidemia.
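
The TP rate, precision, and accuracy used in the abstract above to compare classifiers can all be derived from confusion-matrix counts. The following is a minimal sketch of those definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return TP rate (recall), precision, and accuracy from confusion counts."""
    tp_rate = tp / (tp + fn)            # share of actual patients that were detected
    precision = tp / (tp + fp)          # share of predicted patients that are patients
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"tp_rate": tp_rate, "precision": precision, "accuracy": accuracy}


# Hypothetical counts for a balanced test set (illustration only).
metrics = classification_metrics(tp=90, fp=15, fn=10, tn=85)
```

Reporting TP rate and precision alongside accuracy matters most when the classes are imbalanced; in the balanced 1,088 versus 1,088 design described above, the three measures tend to move together.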

Analysis of dentoalveolar compensation and discrimination of skeletal types (골격형에 따른 치아치조성 보상기전의 분석 및 골격형 판별)

  • Kim, Ji-Young;Kim, Tae-Woo;Nahm, Dong-Seok;Chang, Young-Il
    • The Korean Journal of Orthodontics / v.33 no.6 s.101 / pp.407-418 / 2003
  • The purpose of this study is to analyze dentoalveolar compensation in normal occlusion samples previously classified into 9 skeletal types, and to provide clinically applicable diagnostic criteria for individual malocclusion patients. Cephalometric measurements of 294 normal occlusion samples, previously divided into the 9 types, were analyzed. The descriptive features of the dentoalveolar variables were compared across the 9 types using analysis of variance followed by post hoc multiple comparisons. In addition, the correlations between skeletal and dentoalveolar variables were analyzed. Discriminant analysis with stepwise entry of variables was used to identify candidate variables for skeletal typing. The dentoalveolar compensation pattern varied among the skeletal types, especially in the variables indicating the inclination of the incisors and the occlusal plane. Stepwise variable selection identified four variables: AB-MP, SN-AB, PMA, and ANB. The resulting discriminant model achieved a classification accuracy of $87.8\%$. On the basis of these results, this study provides preliminary information for developing diagnostic criteria and treatment guidelines for individual skeletal types.

A Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Carcass Traits in Hanwoo Populations

  • Lee, Y.-M.;Han, C.-M.;Li, Yi;Lee, J.-J.;Kim, L.H.;Kim, J.-H.;Kim, D.-I.;Lee, S.-S.;Park, B.-L.;Shin, H.-D.;Kim, K.-S.;Kim, N.-S.;Kim, Jong-Joo
    • Asian-Australasian Journal of Animal Sciences / v.23 no.4 / pp.417-424 / 2010
  • The purpose of this study was to detect significant SNPs for carcass quality traits using high-density SNP chips in Hanwoo populations. Carcass data from 289 steers sired by 30 Korean proven sires were collected from two regions: the Hanwoo Improvement Center of the National Agricultural Cooperative Federation in Seosan, Chungnam province, and commercial farms in Gyeongbuk province. The steers in Seosan were born between spring and fall of 2006, and those in Gyeongbuk between the falls of 2004 and 2005. The former were slaughtered at approximately 24 months, while the latter were fed six months longer before slaughter. Of the 55,074 SNPs on the Illumina bovine 50K chip, 32,756 available SNPs were selected for the whole genome association study. After adjusting for the effects of sire, region, and slaughter age, phenotypes were regressed on each SNP using a simple linear regression model. A point-wise p value of 0.1% from the F distribution was used as the significance threshold for each SNP test. Among the significant SNPs for a trait, the best set of SNP markers was selected using a stepwise regression procedure, in which inclusion and exclusion of each SNP were determined at the p<0.001 level. A total of 118 SNPs were detected: 15, 20, 22, 28, 20, and 13 SNPs for final weight before slaughter, carcass weight, backfat thickness, weight index, longissimus dorsi muscle area, and marbling score, respectively. Among these, the best set of 44 SNPs was determined by the stepwise regression procedure, with 7, 9, 6, 9, 7, and 6 SNPs for the respective traits. Each set of SNPs explained 20-40% of the phenotypic variance of its trait. The number of SNPs detected per trait was small in the whole genome association tests, suggesting that additional phenotype and genotype data are required to gain power to detect trait-related SNPs and to estimate SNP effects accurately. 
These SNP markers could be applied to commercial Hanwoo populations via marker-assisted selection to verify the SNP effects and to improve genetic potentials in successive generations of the Hanwoo populations.
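
The stepwise regression idea in the abstract above can be illustrated with a forward-only sketch. This is a simplification under stated assumptions, not the authors' procedure: it adds one predictor at a time by the largest reduction in residual sum of squares and stops on a hypothetical `min_gain` threshold (share of total variance explained) instead of the p<0.001 F-test, and it never removes a predictor once entered. The column names and data are invented for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def residualize(v, q):
    """Remove from v its least-squares projection onto q."""
    coef = dot(v, q) / dot(q, q)
    return [x - coef * y for x, y in zip(v, q)]


def forward_select(columns, y, min_gain=0.05):
    """Forward selection by RSS reduction (assumed stopping rule: min_gain)."""
    resid = center(y)
    total_ss = dot(resid, resid)
    candidates = {name: center(col) for name, col in columns.items()}
    selected = []
    while candidates:
        # RSS reduction obtained by adding each remaining candidate.
        gains = {}
        for name, col in candidates.items():
            ss = dot(col, col)
            gains[name] = dot(col, resid) ** 2 / ss if ss > 1e-12 else 0.0
        best = max(gains, key=gains.get)
        if gains[best] / total_ss < min_gain:
            break
        q = candidates.pop(best)
        selected.append(best)
        # Orthogonalize the response and remaining candidates against the pick,
        # which makes each step equivalent to refitting ordinary least squares.
        resid = residualize(resid, q)
        candidates = {n: residualize(c, q) for n, c in candidates.items()}
    return selected


# Hypothetical genotype-like predictors and a phenotype built from two of them.
columns = {
    "snp_a": [1, 2, 3, 4, 5],
    "snp_b": [1, -1, 1, -1, 1],
    "snp_c": [1, 0, 0, 0, 1],
}
phenotype = [3, 3, 7, 7, 11]  # equals 2 * snp_a + snp_b
selected = forward_select(columns, phenotype)
```

A full stepwise procedure would also re-test already selected predictors after each entry and drop those that no longer meet the exclusion criterion.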

Development of Recombinant Chinese Hamster Ovary Cell Lines Producing Human Thrombopoietin or Its Analog

  • Chung, Joo-Young;Ahn, Hae-Kyung;Lim, Seung-Wook;Sung, Yun-Hee;Koh, Yeo-Wook;Park, Seung-Kook;Lee, Gyun-Min
    • Journal of Microbiology and Biotechnology / v.13 no.5 / pp.759-766 / 2003
  • Recombinant Chinese hamster ovary (rCHO) cell lines expressing a high level of human thrombopoietin (hTPO) or its analog, TPO33r, were obtained by transfecting expression vectors into dihydrofolate reductase-deficient (dhfr) CHO cells, followed by gene amplification in media containing stepwise increments of methotrexate (MTX): 20, 80, and 320 nM. The parental clones with an hTPO expression level $>0.40\;{\mu}g/ml$ (27 out of 1,200 clones) and the parental clones with a TPO33r expression level $>0.20\;{\mu}g/ml$ (36 out of 400 clones) were subjected to 20 nM MTX. The clones that displayed an increased expression level at 20 nM MTX were then subjected to stepwise increases in MTX to 80 and 320 nM. At 320 nM MTX, most clones did not display an increased expression level, because the growth reduction caused by gene amplification outweighed the benefit of enhanced specific TPO productivity ($q_{TPO}$). Accordingly, the highest-producing subclones ($1-434-80^{*}$ for hTPO and $2-3-80^{*}$ for TPO33r), whose $q_{TPO}$ was 2- to 3-fold higher than that of their parental clones selected at 80 nM MTX, were isolated by the limiting dilution method and established as rCHO cell lines. The $q_{TPO}$ of $1-434-80^{*}$ and $2-3-80^{*}$ was $5.89{\pm}0.74$ and $1.02{\pm}0.23\;{\mu}g/10^6$ cells/day, respectively. Southern and Northern blot analyses showed that the enhanced $q_{TPO}$ of the established rCHO cell lines resulted mainly from the increased TPO gene copy number and the consequent increase in TPO mRNA level. The hTPO and TPO33r produced by the established rCHO cell lines were biologically active in vivo, as demonstrated by their ability to elevate platelet counts in treated mice.

A Study on the Optimal Discriminant Model Predicting the likelihood of Insolvency for Technology Financing (기술금융을 위한 부실 가능성 예측 최적 판별모형에 대한 연구)

  • Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society / v.10 no.2 / pp.183-205 / 2007
  • This study investigates the optimal discriminant model for predicting in advance the likelihood of insolvency of medium-sized firms, based on technology evaluation. The explanatory variables for the discriminant model were selected by both factor analysis and discriminant analysis with a stepwise selection method. Five explanatory variables were selected by factor analysis on the basis of explanatory ratio and communality, and six were selected by stepwise discriminant analysis. The effectiveness of the linear discriminant model and the logistic discriminant model was assessed by critical probability and correct classification rate. The results showed that both models had similar correct classification rates and that the linear discriminant model was preferable to the logistic discriminant model in terms of critical probability. For the linear discriminant model with a critical probability of 0.5, the total-group correct classification rate was 70.4%, and the correct classification rates of the insolvent and solvent groups were 73.4% and 69.5%, respectively. The correct classification rate is an estimate of the probability that the estimated discriminant function will correctly classify the present sample, whereas the actual correct classification rate is an estimate of the probability that it will correctly classify a future observation. Unfortunately, the correct classification rate overestimates the actual correct classification rate because the data set used to estimate the discriminant function is also used to evaluate it. Cross-validation was used to estimate the bias of the correct classification rate. The estimated bias was 2.9%, and the predicted actual correct classification rate was 67.5%. A threshold value was also set to establish an in-doubt category. 
The results of the linear discriminant model can be used by technology financing banks to evaluate the possibility of insolvency and to rank applicant firms.
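
The bias correction reported in the abstract above is simple arithmetic: 70.4% (rate on the estimation sample) minus 2.9% (estimated bias) gives 67.5%. The sketch below shows, under loud assumptions, how such a bias can be estimated by leave-one-out cross-validation. A one-dimensional nearest-centroid rule stands in for the linear discriminant function, and the scores and labels are invented; none of this is the author's data or code.

```python
def nearest_centroid_predict(train, x):
    """Assign x to the class whose training mean is closest (stand-in for LDA)."""
    centroids = {}
    for label in sorted({lbl for _, lbl in train}):
        values = [v for v, lbl in train if lbl == label]
        centroids[label] = sum(values) / len(values)
    return min(centroids, key=lambda lbl: abs(x - centroids[lbl]))


def apparent_rate(data):
    """Correct classification rate on the same data used to fit the rule."""
    hits = sum(nearest_centroid_predict(data, x) == y for x, y in data)
    return hits / len(data)


def leave_one_out_rate(data):
    """Correct classification rate when each case is held out in turn."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nearest_centroid_predict(train, x) == y
    return hits / len(data)


# Hypothetical (score, group) pairs: 0 = solvent, 1 = insolvent.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.9, 0),
        (5.1, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
apparent = apparent_rate(data)
cross_validated = leave_one_out_rate(data)
estimated_bias = apparent - cross_validated
```

In this toy data the two borderline cases are classified correctly only when they help fit the rule, so the apparent rate is higher than the cross-validated one, which is the optimism the abstract describes.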

Transportation Card Based Optimal M-Similar Paths Searching for Estimating Passengers' Route Choice in Seoul Metropolitan Railway Network (수도권 도시철도망 승객이동경로추정을 위한 교통카드기반 최적 M-유사경로 구축방안)

  • Lee, Mee young
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.2 / pp.1-12 / 2017
  • The high value of the Seoul metropolitan transportation card lies in its recording of the total movements of public transit passengers. For bus trips, the recorded route information for each passenger is accurate, but the lack of transfer information within the urban railway makes it difficult to estimate the routes passengers actually take. It is therefore essential to identify passengers' path selection patterns in the metropolitan railway network and to use them in a path estimation model. This research characterizes passenger routes in the urban railway system as M-similar routes in which each additional transfer is reflected as an added cost. To construct the path-finding conditions, an M-similar route search method is proposed that incorporates a non-additive path cost arising from the stepwise transportation parameter. In addition, the sensitivity of the M-similar route method is evaluated using transportation card records, and a stochastic trip assignment model using M-similar path finding is constructed. From these, link trips and transfer trips between lines of the Seoul metropolitan railway are presented.

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.147-168 / 2017
  • Accurate stock market forecasting has long been studied in academia, and many forecasting models using a variety of techniques now exist. Recently, many attempts have been made to predict the stock index using machine learning methods, including deep learning. Although both fundamental analysis and technical analysis are used in traditional stock investment, technical analysis is more useful for short-term trading prediction and for applying statistical and mathematical techniques. Most studies using technical indicators have modeled stock price prediction as a binary classification (rising or falling) of future market movement, usually for the next trading day. However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we predict the stock index by extending the existing binary scheme to a multi-class scheme of stock index trends (upward trend, boxed, downward trend). To solve this multi-class problem, rather than techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), or Artificial Neural Networks (ANN), we use Multi-classification Support Vector Machines (MSVM), which have proved superior in prediction performance, and propose an optimization model that uses a Genetic Algorithm as a wrapper to improve MSVM performance. In particular, the proposed model, named GA-MSVM, is designed to maximize model performance by optimizing not only the kernel function parameters of MSVM but also the selection of input variables (feature selection) and of training instances (instance selection). 
To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method outperforms the conventional multi-class SVM, which has been known to show the best prediction performance so far, as well as existing artificial intelligence / data mining techniques such as MDA, MLOGIT, and CBR. In particular, instance selection was confirmed to play a very important role in predicting the stock index trend, contributing more to model improvement than the other factors. To verify the usefulness of GA-MSVM, we applied it to trend forecasting of Korea's KOSPI200 stock index. Our research primarily aims to predict trend segments in order to capture trading signals or short-term trend transition points. The experimental data set covers 2004 to 2017 and includes technical indicators of the KOSPI200 stock index, such as the price and volatility index, and macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, took three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for validation. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate of the MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
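
The per-class 70/30 split described in the abstract above corresponds to a stratified split. The following is a minimal sketch assuming a fixed random seed and a simple rounding rule for the cut point; the function name, the seed, and the toy labels are hypothetical and are not taken from the study.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices so each class contributes train_fraction of its cases to training."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Toy trend labels: 1 (upward trend), 0 (boxed), -1 (downward trend).
labels = [1] * 10 + [0] * 10 + [-1] * 10
train_idx, test_idx = stratified_split(labels)
```

For time-ordered market data, a random split like this can leak future information into training; whether the study split randomly or chronologically within each class is not stated in the abstract.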

Subjective Well-Being and It's Related Factors in Korean Rural Elderly (농촌지역 노인들의 주관적 행복감과 이에 관련하는 요인)

  • Lee, Sung-Kook;Kai, Ichiro
    • Journal of agricultural medicine and community health / v.20 no.2 / pp.121-131 / 1995
  • This study aims 1) to explicate the multidimensional structure of a widely used measure of subjective well-being, the Philadelphia Geriatric Center (PGC) Morale Scale, which is used to measure health in elderly populations, and 2) to examine the relationship of subjective well-being with socioeconomic status and related variables, health, and physical disability in elderly populations. The subjects were 672 rural-dwelling elderly persons (269 males and 403 females) aged 60 years and over. The respondents were interviewed by 18 trained health workers using the questionnaire from July 4 to July 9, 1994, and were surveyed again from August 1 to August 6, 1994 to confirm the questionnaire's reliability. Subjective well-being was evaluated using the revised PGC Morale Scale (17 items; Lawton, 1975). The results are as follows: 1) The item scores were intercorrelated and subjected to a principal component analysis. A rotated three-factor solution accounted for 40.9% of the total variance. Thus, the PGC Morale Scale comprises three stable factors: Factor 1, "Lonely Dissatisfaction" (7 items); Factor 2, "Agitation" (5 items); and Factor 3, "Attitude Towards Own Aging" (5 items). These factors showed a high degree of internal consistency (Cronbach's alpha: 0.7852). 2) The total mean score on the PGC Morale Scale was 8.73. Mean scores differed significantly by sex, age, education, current disease, family type, economic status, ADL, and IADL. 3) In the stepwise multiple regression analysis of subjective well-being, the most contributing factors were economic status, IADL, current disease, family type, and sex, and the R square was 0.23.
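
The internal consistency figure quoted in the abstract above is Cronbach's alpha, which for k items is k/(k-1) times one minus the ratio of the summed item variances to the variance of the total score. The sketch below applies that standard formula to invented scores for three items and five respondents; the data and the function name are hypothetical and unrelated to the PGC survey.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items, 5 respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 4],
    [1, 3, 3, 5, 5],
]
alpha = cronbach_alpha(items)
```

Alpha approaches 1 when items rise and fall together across respondents, and falls toward 0 when item scores are unrelated.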

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics / v.17 no.4 / pp.47.1-47.12 / 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
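
The comparison in the abstract above rests on the difference in AUC between a model with demographic variables only and a model that also includes SNPs. The sketch below computes AUC as the share of case-control pairs in which the case receives the higher score (ties count as half), then takes the difference. The labels and scores are invented for illustration and do not reproduce the reported 0.107 gain.

```python
def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for two controls (0) and two cases (1).
labels = [0, 0, 1, 1]
demographic_only = [0.1, 0.4, 0.35, 0.8]
with_snps = [0.1, 0.3, 0.35, 0.8]
auc_gain = auc(labels, with_snps) - auc(labels, demographic_only)
```

The pairwise form is quadratic in sample size; for large cohorts a rank-based computation gives the same value more efficiently.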

Evolutionary Concept Analysis of Korean Cancer Survivors' Happiness (한국 암생존자의 행복감에 대한 진화론적 개념분석)

  • Cho, HyeKyung;Song, MiSoon
    • Journal of the Korea Convergence Society / v.9 no.8 / pp.365-377 / 2018
  • The purpose of this study was to analyze the concept of happiness as experienced in the lives of Korean cancer survivors. The concept was analyzed using Rogers' evolutionary concept analysis method on 11 papers that met the selection criteria, drawn from domestic journals published from January 2000 to September 2017. The attributes of happiness were subjective experience, positive mind, meaning of life, and formation of relationships. The antecedents were accepting the risk to life, coping with reality, self-reflection, and environmental support. As consequences, cancer survivors' inner strength increased, they came to feel love and pursue new lives, and they felt happiness through self-realization. This study is valuable in suggesting a basic framework for stepwise assessment that can improve the happiness of Korean cancer survivors, and it suggests that cancer survivors should be managed through multidisciplinary convergence programs.