• Title/Summary/Keyword: Logistic curve

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics
    • /
    • v.17 no.4
    • /
    • pp.47.1-47.12
    • /
    • 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
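
The comparison described above, the AUC of a demographic-only model against the AUC of a model that also includes SNPs, can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the labels and predicted scores are hypothetical toy values, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than with any particular library.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise case/control comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
demo_scores = [0.6, 0.4, 0.7, 0.5, 0.3, 0.65]      # demographic variables only
demo_snp_scores = [0.7, 0.55, 0.8, 0.5, 0.3, 0.6]  # demographic variables + SNPs

auc_demo = auc(labels, demo_scores)
auc_full = auc(labels, demo_snp_scores)
delta_auc = auc_full - auc_demo  # contribution attributed to the SNPs
print(f"AUC demo={auc_demo:.3f}, AUC demo+SNP={auc_full:.3f}, diff={delta_auc:.3f}")
```

In the abstract's setting, the two score vectors would come from models fitted on propensity-score-matched cases and controls; the matching step itself is not shown here.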

The Association between Obesity Indices in Adolescence and Carotid Intima-media Thickness in Young Adults: Kangwha Study (청소년기 비만지표와 초기 성인기 경동맥 내중막 두께와의 관련성: Kangwha Study)

  • Lee, Yoo-Jung;Nam, Chung-Mo;Kim, Hyeon-Chang;Hur, Nam-Wook;Suh, Il
    • Journal of Preventive Medicine and Public Health
    • /
    • v.41 no.2
    • /
    • pp.107-114
    • /
    • 2008
  • Objectives: The aim of this study is to investigate the association between obesity indices (body mass index, weight, waist-hip ratio and waist circumference) in adolescence and the carotid intima-media thickness (C-IMT) in early adulthood. We also wanted to identify the best predictor of C-IMT among these obesity indices. Methods: This study used data from a community-based prospective cohort study known as the Kangwha Study; the subjects were 16 years old in 1996 (defined as "adolescence") and 25 years old in 2005 (defined as "early adulthood"). The 256 subjects (113 men and 143 women) included in the analysis participated in both follow-ups and underwent B-mode ultrasonography of the carotid arteries at the early adulthood follow-up. Obesity indices were defined as the body mass index, weight, waist-hip ratio and waist circumference. The C-IMT was defined as the mean of the maximal IMT of each common carotid artery. The associations between C-IMT and the obesity indices were evaluated via multivariable regression, logistic regression and receiver operating characteristic curve analyses. Results: In men, all the obesity indices in adolescence were shown to have a statistically significant positive association with C-IMT in early adulthood. However, no such relationship was shown in women. On multiple regression and logistic regression analysis, the waist-hip ratio showed the strongest association with the C-IMT among the 4 obesity indices. However, the differences were not statistically significant, and no best predictor was found. For the women, the obesity indices and C-IMT showed no relationship. Conclusions: This study suggested that obesity in adolescence was related to an increased C-IMT in healthy young Korean men.

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching (Doc2Vec 모형에 기반한 자기소개서 분류 모형 구축 및 실험)

  • Kim, Young Soo;Moon, Hyun Sil;Kim, Jae Kyeong
    • Journal of Information Technology Services
    • /
    • v.19 no.1
    • /
    • pp.103-112
    • /
    • 2020
  • Job seekers make various efforts to find a good company, and companies attempt to recruit good people. Job searching through self-introduction essays is now one of the most active processes. Companies spend time and money reviewing the numerous self-introduction essays of job seekers. Job seekers also worry about whether their self-introduction essays will be accepted by companies. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Real-world data were classified using stratified sampling to alleviate the data imbalance between passed and failed self-introduction essays. Documents were embedded using the Doc2Vec method, which was developed from Word2Vec, and were classified using logistic regression analysis. The decision tree model was chosen as a benchmark model, and K-fold cross-validation was conducted for performance evaluation. Across several experiments, the area under the curve (AUC) of PV-DM was higher than that of the other Doc2Vec models, i.e., PV-DBOW and Concatenate. Furthermore, PV-DM classifies passed essays as well as failed essays, while PV-DBOW cannot classify passed essays even though it classifies failed essays well. In addition, the classification performance of the logistic regression model with PV-DM embeddings is better than that of the decision tree-based classification model. The implication of the experimental results is that companies can reduce the cost of recruiting good job seekers. In addition, our suggested model can help job candidates pre-evaluate their self-introduction essays.

Estimation of a Nationwide Statistics of Hernia Operation Applying Data Mining Technique to the National Health Insurance Database (데이터마이닝 기법을 이용한 건강보험공단의 수술 통계량 근사치 추정 -허니아 수술을 중심으로-)

  • Kang, Sung-Hong;Seo, Seok-Kyung;Yang, Yeong-Ja;Lee, Ae-Kyung;Bae, Jong-Myon
    • Journal of Preventive Medicine and Public Health
    • /
    • v.39 no.5
    • /
    • pp.433-437
    • /
    • 2006
  • Objectives: The aim of this study is to develop a methodology for estimating a nationwide statistic for hernia operations using the claim database of the Korea Health Insurance Cooperation (KHIC). Methods: According to the insurance claim procedures, the claim database was divided into the electronic data interchange database (EDI_DB) and the sheet database (Paper_DB). Although the EDI_DB has operation and management codes showing whether an operation took place and of what kind, the Paper_DB does not. Using the hernia-matched management codes in the EDI_DB, the cases of hernia surgery were extracted. To draw the potential cases from the Paper_DB, which does not have the codes, a predictive model was developed using the data mining technique called SEMMA. The claim sheets of the cases whose predicted probability of an operation exceeded the threshold, which was decided from the ROC curve, were reviewed to obtain the positive predictive value as an index of the usefulness of the predictive model. Results: Of the claims in the 2004 databases, 14,386 cases in the EDI system had hernia-related management codes. For fitting the models with the data mining technique, logistic regression was chosen rather than the neural network method or the decision tree method. From the Paper_DB, 1,019 cases were extracted as potential cases. Direct review of the sheets of the extracted cases showed that the positive predictive value was 95.3%. Conclusions: The results suggested that applying the data mining technique to the claim database of the KHIC to estimate nationwide surgical statistics would be useful in terms of execution and cost-effectiveness.
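
The thresholding step described above (pick a probability cutoff from the ROC curve, then check the positive predictive value among flagged cases) can be sketched as follows. The abstract does not say which ROC criterion was used, so Youden's J (sensitivity + specificity - 1) is an assumption here, and the labels and probabilities are hypothetical toy values.

```python
def roc_points(labels, probs):
    """Return (threshold, sensitivity, specificity) for each distinct probability.

    A case is flagged as positive when its probability is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 1)
        tn = sum(1 for y, p in zip(labels, probs) if p < t and y == 0)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_threshold(labels, probs):
    """Threshold maximizing Youden's J (an assumed criterion, not the paper's)."""
    return max(roc_points(labels, probs), key=lambda r: r[1] + r[2] - 1)[0]


def positive_predictive_value(labels, probs, threshold):
    """Share of flagged cases (prob >= threshold) that are true positives."""
    flagged = [y for y, p in zip(labels, probs) if p >= threshold]
    if not flagged:
        raise ValueError("no case reaches the threshold")
    return sum(flagged) / len(flagged)


# Hypothetical toy data: 1 = operation confirmed on review, 0 = no operation.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.35, 0.75, 0.3, 0.2, 0.1]

threshold = youden_threshold(labels, probs)
ppv = positive_predictive_value(labels, probs, threshold)
print(f"threshold={threshold}, PPV={ppv:.3f}")
```

In the study's setting, the true labels for flagged Paper_DB claims came from direct review of the claim sheets, so the PPV could only be computed after that manual step.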

An Analysis of Nursing Needs for Hospitalized Cancer Patients; Using Data Mining Techniques (데이터 마이닝을 이용한 입원 암 환자 간호 중증도 예측모델 구축)

  • Park, Sun-A
    • Asian Oncology Nursing
    • /
    • v.5 no.1
    • /
    • pp.3-10
    • /
    • 2005
  • Background: Nurses now account for one third of all hospital human resources. Therefore, efficient management of nursing manpower is becoming more important. While it is clear that nursing workload requirement analysis and patient severity classification should be done first for the efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purposes: This study aimed to build a prediction model for patient classification according to nursing need. We tried to find an easier and faster method of classifying patients that can help efficient management of nursing manpower. Methods: The patient classification data of the cancer patients hospitalized in one of the biggest cancer centers in Korea from January 1 to December 31, 2003 were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified by three different data mining techniques (logistic regression, decision tree and neural network), and the results were assessed. Results: The data set was created using 165,073 records of 2,228 patients from the patient classification database. The main explanatory variables in the three data mining techniques were as follows. 1) Logistic regression: age, month and section. 2) Decision tree: section, month, age and tumor. 3) Neural network: section, diagnosis, age, sex, metastasis, hospital days and month. Among these three techniques, the neural network showed the best predictive power in ROC curve verification. The prediction accuracy of the patient classification prediction model developed with the neural network based on nursing needs was 84.06%. Conclusion: The patient classification prediction model was developed and tested in this study using real patient data. The result can be employed for more accurate calculation of required nursing staff and effective use of the labor force.

A Survival Prediction Model of Rats in Uncontrolled Acute Hemorrhagic Shock Using the Random Forest Classifier (랜덤 포리스트를 이용한 비제어 급성 출혈성 쇼크의 흰쥐에서의 생존 예측)

  • Choi, J.Y.;Kim, S.K.;Koo, J.M.;Kim, D.W.
    • Journal of Biomedical Engineering Research
    • /
    • v.33 no.3
    • /
    • pp.148-154
    • /
    • 2012
  • Hemorrhagic shock is a leading cause of injury-related death worldwide. Although many studies have tried to diagnose hemorrhagic shock accurately at an early stage, such attempts have not been successful because of human compensatory mechanisms. The objective of this study was to construct a survival prediction model of rats in acute hemorrhagic shock using a random forest (RF) model. Heart rate (HR), mean arterial pressure (MAP), respiration rate (RR), lactate concentration (LC), and peripheral perfusion (PP) measured in rats were used as input variables for the RF model, and its performance was compared with that of a logistic regression (LR) model. Before constructing the models, we performed 5-fold cross validation for RF variable selection, and forward stepwise variable selection for the LR model, to examine which variables were important for the models. For the LR model, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (ROC-AUC) were 0.83, 0.95, 0.88, and 0.96, respectively. For the RF model, sensitivity, specificity, accuracy, and AUC were 0.97, 0.95, 0.96, and 0.99, respectively. In conclusion, the RF model was superior to the LR model for survival prediction in the rat model.
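
The sensitivity, specificity and accuracy figures reported above are all derived from a 2x2 confusion matrix. The sketch below shows the arithmetic on hypothetical toy predictions; it does not reproduce the paper's data or models, and treating 1 as the positive class is an assumption made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical toy outcomes and predictions, not the study's data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```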

Trajectories of Self-rated Health among One-person Households: A Latent Class Growth Analysis (1인가구의 주관적 건강상태 변화: 잠재계층성장모형을 활용하여)

  • Kim, Eunjoo;Kim, Hyang;Yoon, Ju Young
    • Research in Community and Public Health Nursing
    • /
    • v.30 no.4
    • /
    • pp.449-459
    • /
    • 2019
  • Purpose: The aim of this study is to explore different types of self-rated health trajectories among one-person households in Korea. Methods: We used five time-point data derived from the Korea Health Panel (2011-2015). Latent growth curve modeling was used to assess the overall shape of the self-rated health trajectory in one-person households, and latent class growth modeling was used to determine the number and shape of trajectories. We then applied multinomial logistic regression to each class to explore the predictor variables. Results: We found that the overall slope of self-rated health in one-person households is decreasing. In addition, latent class analysis demonstrated three classes: 1) High-Decreasing class (i.e., high intercept, significantly decreasing slope), 2) Moderate-Decreasing class (i.e., average intercept, significantly decreasing slope), and 3) Low-Stable class (i.e., low intercept, flat and nonsignificant slope). The multinomial logistic regression analysis showed that the predictors of each class were different. In particular, one-person households with poor health early on were at greater risk of belonging to the Low-Stable class than to the High-Decreasing class. Conclusion: The findings of this study demonstrate that more attention to one-person households is needed to promote their health status. Policymakers may develop different health and welfare programs depending on the different characteristics of one-person household trajectory groups in Korea.

Comparison of Two Ovarian Malignancy Prediction Models Based on Age Sonographic Findings and Serum Ca125 Measurement

  • Arab, Maliheh;Yaseri, Mehdi;Ashrafganjoi, Tahereh;Maktabi, Maryam;Noghabaee, Giti;Sheibani, Kourosh
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.13 no.8
    • /
    • pp.4199-4202
    • /
    • 2012
  • Objective: The aim of our study is to compare an ovarian malignancy prediction model based on age and four sonographic findings (OMPS1) with a new model called OMPS2, which differs only by adding serum CA125 measurement to OMPS1. Methods: In a cross-sectional comparative study, OMPS1 was validated in 830 operated ovarian masses within a 3-year period (2006-2009). Logistic regression analysis was used to construct OMPS2 based on OMPS1 by adding serum CA125 findings. The area under the curve for the two models was compared in 411 patients. Results: OMPS2 was calculated as follows: OMPS1 + 1.444 (if serum CA125 is 36-200) or 3.842 (if serum CA125 is more than 200). The AUC of OMPS2 increased to 84.3% (95% CI 78.1-89.8) in comparison to OMPS1 with an AUC of 78.1% (95% CI 71.8-84.5). Conclusion: Our second model is more accurate in the prediction of ovarian malignancy than our first model.
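
The OMPS2 scoring rule quoted above is simple enough to write out directly. This is a sketch under stated assumptions: the OMPS1 score is taken as a precomputed input (its formula is not given in the abstract), the 36-200 band is treated as inclusive at both ends, and CA125 values below 36 are assumed to add nothing. The function name `omps2` is illustrative.

```python
def omps2(omps1_score, ca125):
    """OMPS2 = OMPS1 plus a CA125-dependent increment, per the abstract.

    Assumptions (not stated in the abstract): the 36-200 band is inclusive,
    and values below 36 leave the OMPS1 score unchanged.
    """
    if ca125 > 200:
        return omps1_score + 3.842
    if ca125 >= 36:
        return omps1_score + 1.444
    return omps1_score


# Hypothetical OMPS1 score of 2.0 at three CA125 levels.
for ca125 in (10, 100, 250):
    print(ca125, round(omps2(2.0, ca125), 3))
```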

A Study on a Long-term Demand Forecasting and Characterization of Diffusion Process for Medical Equipments based on Diffusion Model (확산 모형에 의한 고가 의료기기의 수요 확산의 특성분석 및 중장기 수요예측에 관한 연구)

  • Hong, Jung-Sik;Kim, Tae-Gu;Lim, Dar-Oh
    • Health Policy and Management
    • /
    • v.18 no.4
    • /
    • pp.85-110
    • /
    • 2008
  • In this study, we explore long-term demand forecasting for high-priced medical equipment based on the logistic and Bass diffusion models. We analyze the specific pattern of each type of equipment's diffusion curve by interpreting the parameter estimates of the Bass diffusion model. Our findings are as follows. First, ultrasonic imaging systems and CT are in the maturity stage, so future demand for them is not expected to be large. Second, the medical image processing unit is between the growth stage and the maturity stage, so its demand is expected to increase considerably for two or three years. Third, MRI is in the take-off stage and the mammography X-ray system is in the maturity stage, but the estimates of the potential number of adopters based on the logistic model differ considerably from those based on the Bass diffusion model. This means that additional data for these two types of equipment should be collected and analyzed to obtain reliable estimates of their demand. Fourth, the medical image processing unit has the largest q value. This means that the word-of-mouth effect is important in the diffusion of this equipment. Fifth, for MRI and ultrasonic systems, the q/p values are relatively large. This means that collective power plays an important role in the adoption of these two types of equipment.
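
For readers unfamiliar with the two curves compared above, the following sketch writes out the standard closed forms: the Bass cumulative adoption curve, with innovation coefficient p, imitation (word-of-mouth) coefficient q and market potential m, and a three-parameter logistic curve. The parameter values are hypothetical and are not the paper's estimates; the logistic parameter names `growth` and `midpoint` are chosen here for illustration.

```python
import math


def bass_cumulative(t, m, p, q):
    """Cumulative adopters at time t under the Bass diffusion model."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)


def logistic_cumulative(t, m, growth, midpoint):
    """Cumulative adopters at time t under a logistic curve with ceiling m."""
    return m / (1.0 + math.exp(-growth * (t - midpoint)))


def bass_peak_time(p, q):
    """Time of peak adoption rate in the Bass model (positive only if q > p)."""
    return math.log(q / p) / (p + q)


# Hypothetical parameters: a large q/p ratio means imitation dominates.
m, p, q = 1000.0, 0.03, 0.38
t_peak = bass_peak_time(p, q)
for year in (0, 5, 10, 20):
    print(year, round(bass_cumulative(year, m, p, q), 1))
print("peak adoption at t =", round(t_peak, 2))
```

Fitting both curves to the same adoption series and comparing the estimated ceilings m is one way to see the kind of disagreement the abstract reports for MRI and mammography systems.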

Artificial Neural Network for Prediction of Distant Metastasis in Colorectal Cancer

  • Biglarian, Akbar;Bakhshi, Enayatollah;Gohari, Mahmood Reza;Khodabakhshi, Reza
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.13 no.3
    • /
    • pp.927-930
    • /
    • 2012
  • Background and Objectives: Artificial neural networks (ANNs) are flexible, nonlinear models which can be used by clinical oncologists in medical research as decision-making tools. This study aimed to predict distant metastasis (DM) in colorectal cancer (CRC) patients using an ANN model. Methods: The data of this study were gathered from 1219 registered CRC patients at the Research Center for Gastroenterology and Liver Disease of Shahid Beheshti University of Medical Sciences, Tehran, Iran (between January 2002 and October 2007). For prediction of DM in CRC patients, neural network (NN) and logistic regression (LR) models were used. Then, the concordance index (C index) and the area under the receiver operating characteristic curve (AUROC) were used to compare the neural network and logistic regression models. Data analysis was performed with R 2.14.1 software. Results: The C indices of the ANN and LR models for colon cancer data were calculated to be 0.812 and 0.779, respectively. Based on the testing dataset, the AUROCs for the ANN and LR models were 0.82 and 0.77, respectively. This means that the accuracy of ANN prediction was better than that of LR prediction. Conclusion: The ANN model is a suitable method for predicting DM and is therefore suggested as a good classifier that is useful for treatment goals.