• Title/Summary/Keyword: 로지스틱모델 (logistic model)

Search Result 239, Processing Time 0.026 seconds

Development of a Type 2 Diabetes Prediction Algorithm Based on Big Data (빅데이터 기반 2형 당뇨 예측 알고리즘 개발)

  • Hyun Sim;HyunWook Kim
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.5 / pp.999-1008 / 2023
  • Early prediction of chronic diseases such as diabetes is an important issue, and improving the accuracy of diabetes prediction is especially important. Various machine learning and deep learning-based methodologies are being introduced for diabetes prediction, but these technologies require large amounts of data to outperform other methodologies, and their learning cost is high because of complex data models. In this study, we aim to verify the claim that a DNN using the Pima dataset and k-fold cross-validation reduces the efficiency of diabetes diagnosis models. Machine learning classification methods such as decision trees, SVM, random forests, logistic regression, KNN, and various ensemble techniques were used to determine which algorithm produces the best prediction results. After training and testing all classification models, the proposed system achieved its best results with the XGBoost classifier combined with the ADASYN method, with an accuracy of 81%, an F1 score of 0.81, and an AUC of 0.84. Additionally, a domain adaptation method was implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to understand how the model predicts the final outcome.

Metabolic Diseases Classification Models according to Food Consumption using Machine Learning (머신러닝을 활용한 식품소비에 따른 대사성 질환 분류 모델)

  • Hong, Jun Ho;Lee, Kyung Hee;Lee, Hye Rim;Cheong, Hwan Suk;Cho, Wan-Sup
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.354-360 / 2022
  • Metabolic disease has a prevalence of 26% among Koreans and is characterized by the simultaneous presence of three of five conditions: abdominal obesity, hypertension, impaired fasting glucose, high triglycerides, and low HDL cholesterol. This paper links the consumer panel data of the Rural Development Agency (RDA) with the medical care data of the National Health Insurance Service (NHIS) to build a classification model that separates a metabolic disease group from a control group by food consumption characteristics, and attempts to compare the differences between the groups. Many existing domestic and foreign studies on metabolic diseases and food consumption characteristics are disease correlation studies of specific food groups or specific ingredients, whereas this paper considers all food groups included in the general diet. We created a classification model using logistic regression, a decision tree-based classification model, and a classification model using XGBoost. Of the three models, the XGBoost classification model was the most accurate, but its accuracy was not high, at less than 0.7. As a future study, it is necessary to extend the observation period for food consumption in the patient group to more than 5 years and to study the metabolic disease classification model after converting the food consumed into nutritional characteristics.

Study of the Factors affecting Unmet Medical Needs in Patients with Cerebrovascular Diseases (뇌혈관질환자의 미 충족 의료에 미치는 영향요인 연구)

  • Lee, Jeong Wook
    • Journal of Digital Convergence / v.16 no.9 / pp.279-291 / 2018
  • This study is designed to identify risk factors for unmet medical needs among people with cerebrovascular disease. To do this, a hierarchical logistic regression analysis was performed with the SPSS/WIN24.0 program using 2014 Korean Medical Panel data. In the final model of the hierarchical logistic regression analysis, which is based on Andersen's model and adjusted for the predisposing and enabling factors, the explanatory variables affecting unmet medical needs are gender, economic activity, income level, the experience of lying in a sickbed, restriction on activity, subjective health condition, and the number of chronic diseases. Based on these results, the study suggests practical and policy implications for the effective management and treatment of cerebrovascular disease: countermeasures for cerebrovascular disease should include a strategy to reduce the incidence of unmet medical needs, comprehensive measures that consider the various dimensions of the influential variables, and a detailed service manual that can improve accessibility to medical services.

A Study for Building Credit Scoring Model using Enterprise Human Resource Factors (기업 인적자원 관련 변수를 이용한 기업 신용점수 모형 구축에 관한 연구)

  • Lee, Yung-Seop;Park, Joo-Wan
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.423-440 / 2007
  • Although various models have been developed for enterprise credit scoring, no model has so far utilized enterprise human resource factors. The purpose of this study was to build an enterprise credit scoring model using enterprise human resource factors. The data combined the first-year research material of HCCP, which was used to investigate enterprise human resources, with the 2004 Credit Rating Score generated from the KIS-Credit Scoring Model. The independent variables were chosen from the HCCP questionnaires based on McLagan's (1989) HR wheel model, and the credit score of Korean Information Service was used as the dependent variable. The statistical method used for data analysis was logistic regression. As a result of constructing the model, 22 variables were selected. By area, 6 variables were chosen in the human resource development (HRD) area, 15 in the human resource management (HRM) area, and 1 in the other area. In 10-fold cross-validation, the misclassification rate and G-mean were 30.81 and 68.27, respectively. The response rate of the highest decile was 6.08 times that of the lowest decile, and the response rate tended to decrease across deciles. Therefore, the results showed that the proposed model was appropriate for measuring enterprise credit scores using enterprise human resource variables.
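
The two evaluation measures reported in this abstract can be sketched as follows. The abstract does not define G-mean, so this assumes the common definition (the geometric mean of sensitivity and specificity). The confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def misclassification_rate(tp, fn, fp, tn):
    """Fraction of all cases that the classifier labels incorrectly."""
    total = tp + fn + fp + tn
    return (fp + fn) / total


def g_mean(tp, fn, fp, tn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, fp, tn = 70, 30, 32, 68
print(f"misclassification rate: {misclassification_rate(tp, fn, fp, tn):.4f}")
print(f"G-mean: {g_mean(tp, fn, fp, tn):.4f}")
```

Unlike plain accuracy, G-mean drops sharply when either class is predicted poorly, which is why it is often reported alongside the misclassification rate for imbalanced outcomes.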

Longitudinal Patterns of Stages of Changes in Smoking Behaviors among Korean Adult Smokers: Applying the Transtheoretical Model of Change (범이론적 모델에 기반을 둔 흡연자의 금연행동 변화단계에 대한 탐색적 연구)

  • Park, Hyunyong;Jun, Jina;Sohn, Sunju
    • Korean Journal of Social Welfare Studies / v.49 no.1 / pp.5-28 / 2018
  • Smoking is an important public health concern because it is a preventable cause of negative health consequences for individuals and of increased social and economic costs. However, few studies have examined longitudinal patterns of stages of change (SOC) in smoking behaviors among the general population. The purpose of the study is to explore the latent patterns of SOC over time among Korean adult smokers using the 2008-2016 Korea Welfare Panel Study. A repeated measure latent class analysis is employed in the present study. The findings of the present study are as follows. First, four latent groups were identified: (1) action/maintenance stage (33.6%), (2) contemplation/preparation to action/maintenance stage (14.8%), (3) continuously contemplation/preparation stage (29.6%), and (4) continuously pre-contemplation stage (22.1%). Second, the results of a multinomial logistic regression showed that socio-demographic and clinical characteristics were associated with the identified longitudinal patterns of smoking behaviors. Compared to the continuously pre-contemplation stage, higher levels of depressive symptoms and drinking behavior were associated with increased odds of being in the action/maintenance stage. The findings of the present study highlight that a tailored intervention is needed for individuals in the continuously pre-contemplation stage and the contemplation stage.

Classification Model of Chronic Gastritis According to The Feature Extraction Method of Radial Artery Pulse Signal (맥파의 특징점 추출 방법에 따른 만성위염 판별 모형)

  • Choi, Sang-Ho;Shin, Ki-Young;Kim, Jeauk;Jin, Seung-Oh;Lee, Tea-Bum
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.1 / pp.185-194 / 2014
  • One in every 10 persons suffers from chronic gastritis in Korea. Endoscopy is most commonly used to diagnose chronic gastritis. Endoscopic diagnosis is precise, but it is accompanied by pain and high cost. According to pulse diagnosis in Traditional East Asian Medicine, health problems in the stomach can be diagnosed with radial pulse signals at the 'Guan' location of the right wrist, which are non-invasive and cost-effective. In this study, we developed a classification model of chronic gastritis using pulse signals at the right 'Guan' location. We used both a linear discrimination method and a logistic regression model with pulse features obtained with a peak-valley detection algorithm and a Gaussian model. As a result, we obtained sensitivities between 77% and 89% and specificities between 72% and 83% depending on the classification models and feature extraction methods, and the average classification rates were approximately 80%, irrespective of the models. Specifically, the Gaussian model showed superior sensitivities (89.1% and 87.5%) while the peak-valley detection method showed superior specificities (82.8% and 81.3%), and the average classification rate (sensitivity + specificity) of the Gaussian model was 80.9%, which was 1.2% ahead of the peak-valley method. In conclusion, we obtained a reliable classification model for chronic gastritis based on the radial pulse feature extraction algorithms, where the Gaussian model was superior in sensitivity and the peak-valley method was superior in specificity.

A Study on Injury Severity Prediction for Car-to-Car Traffic Accidents (차대차 교통사고에 대한 상해 심각도 예측 연구)

  • Ko, Changwan;Kim, Hyeonmin;Jeong, Young-Seon;Kim, Jaehee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.13-29 / 2020
  • Automobiles have long been an essential part of daily life, but the social costs of car traffic accidents exceed 9% of the national budget of Korea. Hence, it is necessary to establish a prevention and response system for car traffic accidents. To present a model that can classify and predict the degree of injury in car traffic accidents, we used the big data analysis techniques of K-nearest neighbor, logistic regression analysis, naive Bayes classifier, decision tree, and ensemble algorithm. The performance of the models was analyzed using data on nationwide traffic accidents over the past three years. In particular, considering the difference in the number of samples among the injury severity levels, we applied down-sampling to the groups with a large number of samples to enhance the classification accuracy of the models, and then verified the statistical significance of the models using ANOVA.
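
The down-sampling step mentioned in this abstract can be illustrated with a minimal sketch. It assumes simple random down-sampling of every class to the size of the smallest class, which may differ from the authors' exact procedure. The function name `downsample` and the severity labels are hypothetical.

```python
import random
from collections import defaultdict


def downsample(records, label_of, seed=0):
    """Randomly reduce every class to the size of the smallest class.

    `records` is any list of samples and `label_of` maps a sample to its
    class label. Both names are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)

    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Sample without replacement so no record is duplicated.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical imbalanced data: 90 minor-injury and 10 severe-injury cases.
data = [("minor", i) for i in range(90)] + [("severe", i) for i in range(10)]
balanced = downsample(data, label_of=lambda record: record[0])
print(len(balanced))
```

Down-sampling discards majority-class records, so it trades some information for balanced classes. It should be applied to the training split only, so that the test data keep the original class distribution.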

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.153-166 / 2021
  • A main goal of pharmacogenomics studies is to predict an individual's drug responsiveness based on high-dimensional genetic variables. Because of the large number of variables, feature selection is required to reduce their number. The selected features are then used to construct a predictive model using machine learning algorithms. In the present study, we applied several hybrid feature selection methods, such as combinations of logistic regression, ReliefF, TuRF, random forest, and LASSO, to a next-generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine, as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than other combinations of approaches. Based on a 5-fold cross-validation partition, the mean test accuracy of the best model was 0.727 and its mean test AUC was 0.761. The stacking models also outperformed single machine learning predictive models when using the same selected features.
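
The 5-fold cross-validation partition behind the reported mean test accuracy and AUC can be sketched in plain Python. This is a generic illustration and not the authors' code. The helper names `k_fold_indices` and `mean_cv_score` are hypothetical, and `score_fold` stands in for whatever model fitting and scoring is done on each fold.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes
    # differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def mean_cv_score(n_samples, score_fold, k=5, seed=0):
    """Average a per-fold test score, such as accuracy or AUC, over k folds."""
    scores = [score_fold(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# 400 samples, matching the cohort size in the abstract: each fold holds
# out 80 samples for testing and trains on the remaining 320.
for train, test in k_fold_indices(400, k=5):
    print(len(train), len(test))
```

When feature selection is part of the pipeline, it is usually run inside each training fold, because selecting features on the full data set first leaks test information into the reported score.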

The Effect of Experienced Consumers' Concerns on Willingness to Purchase Battery Electric Vehicles (순수전기차 경험 고객의 우려 요인에 따른 전기차 구매 의사 영향)

  • Jeong, Jikhan
    • Journal of Digital Convergence / v.19 no.6 / pp.143-162 / 2021
  • Research on consumers' perception of and willingness to purchase Battery Electric Vehicles (BEVs) is necessary to stimulate BEV deployment in South Korea because South Korea's BEV market is still at an early stage. This paper derives a theoretical framework for consumer segmentation based on consumers' willingness to purchase before and after BEV usage experience. In particular, this study empirically evaluates consumers' willingness to purchase and their concerns using survey data from BEV users in either Seoul or the Jeju region. The empirical results from logit models show that experienced consumers' concerns about the heater and air conditioning (HAC) in BEVs decreased their willingness to buy, while greater daily driving distances increased it. In addition, the empirical findings from ordered probit models show that experienced consumers' concerns about the short driving distance, the availability of maintenance service (i.e., A/S service) during unexpected events, and the difficulties of driving BEVs uphill increased the degree of concern about HAC. This paper will provide insights related to consumer segmentation, R&D, marketing strategies, and policy design for policymakers and firms.

Predicting Corporate Bankruptcy using Simulated Annealing-based Random Forest (시뮬레이티드 어니일링 기반의 랜덤 포레스트를 이용한 기업부도예측)

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.155-170 / 2018
  • Predicting a company's financial bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. In previous studies, prediction models have been proposed by applying or combining statistical and machine learning-based techniques. In this paper, we propose a novel intelligent prediction model based on simulated annealing, which is one of the well-known optimization techniques. Simulated annealing is known to have optimization performance comparable to that of genetic algorithms. Nevertheless, since there has been little research on the prediction and classification of business decision-making problems using simulated annealing, it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we use a combined model of simulated annealing and machine learning to select the input features of the bankruptcy prediction model. Typical ways of combining optimization and machine learning techniques are feature selection, feature weighting, and instance selection. This study proposes a combined model for feature selection, which has been studied the most. To confirm the superiority of the proposed model, we apply it to real-world financial data of Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, the performance is significantly improved compared with the traditional decision tree, random forests, artificial neural network, SVM, and logistic regression analysis.
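
A minimal sketch of simulated annealing used as a feature-selection wrapper, under stated assumptions: the abstract gives no algorithmic details, so the single-bit-flip neighborhood, the geometric cooling schedule, and all parameter values here are illustrative choices. `toy_score` is a hypothetical stand-in for the cross-validated accuracy of a classifier (a random forest in the paper's setting) trained on the selected features.

```python
import math
import random


def anneal_feature_subset(n_features, score, n_iter=2000,
                          t_start=1.0, t_end=0.01, seed=0):
    """Search for a feature mask that maximizes `score` by simulated annealing."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score

    for step in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, n_iter - 1))

        # Neighbor: flip the inclusion of one randomly chosen feature.
        candidate = current[:]
        flip = rng.randrange(n_features)
        candidate[flip] = not candidate[flip]
        candidate_score = score(candidate)

        # Always accept improvements. Accept a worse subset with
        # probability exp(delta / temp), which shrinks as temp falls.
        delta = candidate_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current[:], current_score

    return best, best_score


def toy_score(mask):
    """Hypothetical objective: reward features 0, 3, 5 and penalize the rest."""
    useful = {0, 3, 5}
    hits = sum(1 for i, on in enumerate(mask) if on and i in useful)
    noise = sum(1 for i, on in enumerate(mask) if on and i not in useful)
    return hits - 0.5 * noise


best_mask, best_value = anneal_feature_subset(8, toy_score)
print(best_mask, best_value)
```

Accepting some worse subsets at high temperature is what lets the search escape local optima that a greedy forward or backward selection would stop at.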