• Title/Abstract/Keywords: logistic classification

383 search results, processing time 0.026 seconds

A new classification method using penalized partial least squares

  • 김윤대;전치혁;이혜선
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 22, No. 5
    • /
    • pp.931-940
    • /
    • 2011
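  • Classification analysis derives a classification rule from a training sample and then applies it to new samples to assign them to a specific category. Various classification methods have been developed to match the complexity of the data, but accurate classification is difficult when the data are high-dimensional and the variables are highly correlated. This study proposes a classification method that is robust when the data dimension is relatively high and the correlation among variables is high. Partial least squares is a multivariate analysis technique used for continuous data and is known to have high predictive power when the data are high-dimensional and the independent variables are highly correlated. The classification method using penalized partial least squares is applied to real and simulated data to compare its performance.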

Development of a Metabolic Syndrome Classification and Prediction Model for Koreans Using Deep Learning Technology: The Korea National Health and Nutrition Examination Survey (KNHANES) (2013-2018)

  • Hyerim Kim;Ji Hye Heo;Dong Hoon Lim;Yoona Kim
    • Clinical Nutrition Research
    • /
    • Vol. 12, No. 2
    • /
    • pp.138-153
    • /
    • 2023
  • The prevalence of metabolic syndrome (MetS) and its cost are increasing due to lifestyle changes and aging. This study aimed to develop a deep neural network model for prediction and classification of MetS according to nutrient intake and other MetS-related factors. This study included 17,848 individuals aged 40-69 years from the Korea National Health and Nutrition Examination Survey (2013-2018). We set MetS (3-5 risk factors present) as the dependent variable and 52 MetS-related factors and nutrient intake variables as independent variables in a regression analysis. The analysis compared model accuracy, precision, and recall across conventional logistic regression, machine learning-based logistic regression, and deep learning. For the MetS classification and prediction model developed in this study, the accuracy was 81.2089 on the training data and 81.1485 on the test data. These accuracies were higher than those obtained by conventional logistic regression or machine learning-based logistic regression. Precision, recall, and F1-score were also high for the deep learning model. Blood alanine aminotransferase level (β = 12.2035) showed the highest regression coefficient, followed by blood aspartate aminotransferase level (β = 11.771), waist circumference (β = 10.8555), body mass index (β = 10.3842), and blood glycated hemoglobin level (β = 10.1802). Among nutrient intakes, fats (cholesterol [β = -2.0545] and saturated fatty acid [β = -2.0483]) showed high regression coefficients. The deep learning model for classification and prediction of MetS showed higher accuracy than conventional logistic regression or machine learning-based logistic regression.

Sparse Multinomial Kernel Logistic Regression

  • Shim, Joo-Yong;Bae, Jong-Sig;Hwang, Chang-Ha
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 15, No. 1
    • /
    • pp.43-50
    • /
    • 2008
  • Multinomial logistic regression is a well-known multiclass classification method in the field of statistical learning. More recently, sparse multinomial logistic regression models have found application in microarray classification, where explicit identification of the most informative observations is of value. In this paper, we propose a sparse multinomial kernel logistic regression model, in which the sparsity arises from the use of a Laplacian prior, and a fast exact algorithm is derived by employing a bound optimization approach. Experimental results are then presented to indicate the performance of the proposed procedure.

Estimation and Classification of COVID-19 through Climate Change: Focusing on Weather Data since 2018

  • 김윤수;장인홍;송광윤
    • 통합자연과학논문집
    • /
    • Vol. 14, No. 2
    • /
    • pp.41-49
    • /
    • 2021
  • The causes of climate change are both natural and artificial. Natural causes include changes in temperature and sunspot activity caused by changes in solar radiation due to large-scale volcanic activity, while artificial causes include increased greenhouse gas concentrations and land use changes. Studies have shown that excessive carbon use, one of the artificial causes, has accelerated global warming, and climate change is progressing rapidly as a result. Due to climate change, infectious disease viruses are occurring more frequently and in faster cycles than before. Currently, the world is suffering greatly from coronavirus disease 2019 (COVID-19), and Korea is no exception. The first confirmed case occurred on January 20, 2020; the number of infected people has steadily increased through several waves since then, and many confirmed cases are still occurring in 2021. In this study, we use weather data from Korea to examine climate change before and after COVID-19 and to determine, through logistic regression analysis, whether climate change affects infectious disease viruses. On this basis, we classify the periods before and after COVID-19 with a logistic regression model and assess the resulting classification rate. In addition, we compare monthly classification rates to see whether there are seasonal differences in classification.

Development of Galaxy Image Classification Based on Hand-crafted Features and Machine Learning

  • 오윤주;정희철
    • 대한임베디드공학회논문지
    • /
    • Vol. 16, No. 1
    • /
    • pp.17-27
    • /
    • 2021
  • In this paper, we develop a galaxy image classification method based on hand-crafted features and machine learning techniques. Additionally, we provide an empirical analysis to reveal which combination of the techniques is effective for galaxy image classification. To achieve this, we developed a framework consisting of four modules: preprocessing, feature extraction, feature post-processing, and classification. We found that the best-performing combination for galaxy image classification uses a median filter, ORB vector features, and a voting classifier based on RBF SVM, random forest, and logistic regression. The final method is efficient, so we believe it is applicable to embedded environments.

Prediction of Hypertension Complications Risk Using Classification Techniques

  • Lee, Wonji;Lee, Junghye;Lee, Hyeseon;Jun, Chi-Hyuck;Park, Il-Su;Kang, Sung-Hong
    • Industrial Engineering and Management Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.449-453
    • /
    • 2014
  • Chronic diseases, including hypertension and its complications, are major drivers of rising national medical expenditures. We aim to predict the risk of hypertension complications for hypertension patients, using the sample national healthcare database established by the Korean National Health Insurance Corporation. We apply three classification techniques (logistic regression, linear discriminant analysis, and classification and regression tree) to predict the hypertension complication onset event for each patient. The performance of these three methods is compared in terms of accuracy, sensitivity, and specificity. The results show that the methods perform similarly, although logistic regression performs marginally better than the others.
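The three comparison criteria named in this abstract (accuracy, sensitivity, specificity) all follow from the confusion matrix of a binary classifier. The following is a minimal sketch, not the authors' code: the function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Label 1 is treated as the positive class (e.g. complication onset).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Hypothetical toy data: 4 positive and 6 negative patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the positive class is rare, because accuracy alone can look high for a classifier that rarely predicts the positive class.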

Accidents Model of Arterial Link Sections by Logistic Model

  • 박병호;임진강;한수산
    • 한국안전학회지
    • /
    • Vol. 25, No. 4
    • /
    • pp.90-95
    • /
    • 2010
  • This study deals with an accident model for arterial link sections in Cheongju. The objective is to develop the accident model for arterial link sections using logistic regression. To this end, the study uses data on 258 accidents that occurred at 322 arterial link sections. The main results are as follows. First, the Nagelkerke $R^2$ of the developed accident model is 0.309, and the t-values of the variables, which explain goodness of fit, are evaluated to be significant. Second, the variables adopted in the model are AADT and the number of exits and entries. These variables are all statistically significant. Finally, the overall correct classification rate for accidents at the arterial link sections is 72.7%.
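For reference, the Nagelkerke $R^2$ reported in this abstract is usually defined as a rescaled Cox-Snell $R^2$. The symbols below are assumptions introduced here, not taken from the paper: $L_0$ is the likelihood of the intercept-only model, $L_M$ is the likelihood of the fitted logistic model, and $n$ is the number of observations.

$$
R^2_{\mathrm{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - L_0^{2/n}}
$$

The denominator is the maximum value $R^2_{\mathrm{CS}}$ can attain, so $R^2_{\mathrm{N}}$ ranges over $[0, 1]$.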

Making Thoughts Real - a Machine Learning Approach for Brain-Computer Interface Systems

  • Tengis Tserendondog;Uurstaikh Luvsansambuu;Munkhbayar Bat-Erdende;Batmunkh Amar
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 15, No. 2
    • /
    • pp.124-132
    • /
    • 2023
  • In this paper, we present a simple classification model based on statistical features and demonstrate the successful implementation of a brain-computer interface (BCI)-based light on/off control system. This research covers the study and development of a light on/off control system based on BCI technology, which allows users to switch a lamp using electroencephalogram (EEG) signals. The logistic regression algorithm is used to classify the EEG signal and convert it into light-on and light-off control commands. Training data were collected using a 14-channel BCI system, which records the brain signals of participants watching a screen with flickering lights and saves the data to a .csv file for later analysis. After extracting a number of features from the data and performing classification using logistic regression, we created commands to switch on a physical lamp and tested them in a real environment. Logistic regression classified the EEG signals based on the user's mental state with 82.5% accuracy, producing reliable commands for turning the light on and off.

Comparison of Data Mining Classification Algorithms for Categorical Feature Variables

  • 손소영;신형원
    • 산업공학
    • /
    • Vol. 12, No. 4
    • /
    • pp.551-556
    • /
    • 1999
  • In this paper, we compare the performance of three data mining classification algorithms (neural network, decision tree, logistic regression) in consideration of various characteristics of categorical input and output data. A $2^{4-1} \times 3$ fractional factorial design is used to simulate the comparison situation, where the factors are (1) the categorical ratio of input variables, (2) the complexity of the functional relationship between the output and input variables, (3) the size of randomness in the relationship, (4) the categorical ratio of an output variable, and (5) the classification algorithm. The experimental results indicate the following: the decision tree performs better than the others when the relationship between output and input variables is simple, while logistic regression is better in the opposite case; and the neural network appears to be a better choice than the others when the randomness in the relationship is relatively large. We also use a Taguchi design to improve the practicality of our results by treating the relationship between the output and input variables as a noise factor. As a result, the classification accuracy of the neural network and the decision tree turns out to be higher than that of logistic regression when the categorical proportion of the output variable is even.
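The experimental layout described in this abstract, a half fraction of four two-level factors crossed with three algorithms, can be enumerated as in the sketch below. The abstract does not state which generator defines the half fraction; D = A·B·C is an assumption made here, as are the -1/+1 level coding and all variable names.

```python
from itertools import product

# Factor (5): the three classification algorithms named in the abstract.
ALGORITHMS = ["neural network", "decision tree", "logistic regression"]


def half_fraction():
    """Enumerate a 2^(4-1) design for factors A-D coded as -1/+1.

    A, B, and C take all 8 combinations; D is set by the assumed
    generator D = A*B*C, giving 8 of the 16 full-factorial runs.
    """
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


# Cross the 8 fractional runs with the 3 algorithms: 8 * 3 = 24 runs.
design = [run + (algo,) for run in half_fraction() for algo in ALGORITHMS]

for run in design:
    print(run)
```

Under this assumed generator, every run satisfies A·B·C·D = +1, which is what makes the design a half fraction rather than the full 16-run factorial.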


Supervised Learning-Based Collaborative Filtering Using Market Basket Data for the Cold-Start Problem

  • Hwang, Wook-Yeon;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.421-431
    • /
    • 2014
  • The market basket data in the form of a binary user-item matrix or a binary item-user matrix can be modelled as a binary classification problem. The binary logistic regression approach tackles the binary classification problem, where principal components are predictor variables. If users or items are sparse in the training data, the binary classification problem can be considered as a cold-start problem. The binary logistic regression approach may not function appropriately if the principal components are inefficient for the cold-start problem. Assuming that the market basket data can also be considered as a special regression problem whose response is either 0 or 1, we propose three supervised learning approaches: random forest regression, random forest classification, and elastic net to tackle the cold-start problem, comparing the performance in a variety of experimental settings. The experimental results show that the proposed supervised learning approaches outperform the conventional approaches.