• Title/Summary/Keyword: Logistic Regression Model

A Exploratory Study on Multiple Trajectories of Life Satisfaction During Retirement Transition: Applied Latent Class Growth Analysis (은퇴 전후 생활만족도의 다중 변화궤적에 관한 탐색적 연구: 잠재집단성장모형을 중심으로)

  • Kang, Eun-Na
    • Korean Journal of Social Welfare Studies / v.44 no.3 / pp.85-112 / 2013
  • This study aims to understand the developmental trajectories of life satisfaction among retirees and to examine which factors differentiate the trajectory classes. It used three waves of longitudinal data from the Korean Retirement and Income Study, collected every two years (2005, 2007, and 2009). Subjects were respondents aged 50-69 who reported retiring between wave 1 and wave 2, yielding 243 respondents for the final analysis. Life satisfaction was measured with seven items. A latent class growth model and a multiple logistic regression model were used for data analysis. The study identified three distinct trajectory classes: a high stable class (47.7%), a class that was high at the early stage but then decreased (42.8%), and a class that was low at the early stage and then decreased (9.5%). Approximately 50% of the retirees thus experienced a decline in life satisfaction after retirement, and about 10% of the sample formed the most vulnerable group. The study then analyzed which factors distinguish the trajectory groups. Retirees whose health improved were more likely to be in the 'high stable class' than in the 'high at the early stage but decreased class'. In addition, retirees who were less educated, whose health status stayed the same rather than improving, who worked as temporary or day laborers, and who had lower household income were more likely to belong to the 'low at the early stage and then decreased class' than to the 'high stable class'. The findings suggest that there are three distinct trajectories of life satisfaction among retirees and identify factors that differentiate the trajectory groups. Based on these findings, the study discusses implications for social work practice and further research.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and the characteristics of the resulting datasets are equally diverse. To secure their competitiveness, companies need to improve decision-making capacity using classification algorithms. However, most of them do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among these, we included the Herfindahl-Hirschman Index (HHI), originally an index of market concentration, to replace the IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation.
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The selected meta-features are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. The results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables did not have a significant effect for the Naïve Bayes algorithm, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and the Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be utilized in developing a system that recommends classification algorithms according to the characteristics of datasets.
(2) Because the characteristics of data differ, many data scientists search for the optimal algorithm for a given situation by repeatedly testing and adjusting algorithm parameters, a process that wastes hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
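
The abstract does not spell out how HHI is computed for a dataset. As a minimal sketch, assuming the standard definition (the sum of squared shares) is applied to class proportions, the index could be computed as below. The function name `class_hhi` and the toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def class_hhi(labels):
    """Herfindahl-Hirschman Index of a label sequence: sum of squared class shares.

    Assumption: the index is applied to class proportions, as a measure of
    class imbalance. Values near 1/k indicate k balanced classes; values
    near 1 indicate one dominant class.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return sum((n / total) ** 2 for n in counts.values())


# Hypothetical label lists, for illustration only.
balanced = ["a", "b", "c", "d"] * 25     # four equally sized classes
skewed = ["a"] * 97 + ["b", "c", "d"]    # one dominant class

print(class_hhi(balanced))  # 0.25
print(class_hhi(skewed))    # close to 1, reflecting strong imbalance
```

Unlike a ratio of the largest to the smallest class, this quantity uses every class share, which may be one reason to prefer it for multi-class data.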

The Association of Oral Impacts on Daily Performances for Children (C-OIDP), Oral Health Condition and Oral Health-Related Behaviors (어린이 일상생활구강영향지수(C-OIDP)와 구강관리 및 구강건강행태와의 관련성)

  • Jo, Hwa-Young;Jung, Yun-Sook;Park, Dong-Ok;Lee, Young-Eun;Choi, Youn-Hee;Song, Keun-Bae
    • Journal of dental hygiene science / v.16 no.3 / pp.242-248 / 2016
  • The purposes of this study were to investigate the factors affecting the Oral Impacts on Daily Performances for Children (C-OIDP) in elementary and middle school students and to identify the association between oral health-related behaviors, oral health condition, and C-OIDP. A cross-sectional study was conducted in three schools in Incheon, Asan, Korea. A total of 175 selected children were interviewed by a trained examiner using a questionnaire. Oral health-related quality of life was assessed with the Korean version of the C-OIDP. Socio-economic characteristics, oral health-related behaviors, oral health condition, and C-OIDP were assessed using the questionnaire. ANOVA was performed to examine the relationship between oral health and C-OIDP, and multiple regression analysis was performed to determine the factors affecting C-OIDP. The activities most affected were eating (28.0%), cleaning teeth (22.9%), and smiling (18.9%). In the logistic regression model, a high C-OIDP item score was associated with experiencing dental caries and gum pain in the past month. A greater number of C-OIDP items with reported impacts was associated with more filled deciduous tooth surfaces (fs) (p=0.024), caries-experienced deciduous tooth surfaces (dfs) (p=0.049), total carious tooth surfaces (ds+DS) (p=0.021), and total caries-experienced tooth surfaces (dfs+DMFS) (p=0.047). It can be concluded that the factors affecting C-OIDP are fs, dfs, dfs+DMFS, and gingival pain. Based on these results, C-OIDP can be improved through advances in preventive practice.

A Study on the Factors Related to the Cognitive Function and Depression Among the Elderly (일부지역 노인들의 인지기능과 우울에 관련된 요인에 관한 연구)

  • Shin, Cheol-Ho;Kim, Soo-Young;Lee, Young-Soo;Cho, Young-Chae;Lee, Tae-Yong;Lee, Dong-Bae
    • Journal of Preventive Medicine and Public Health / v.29 no.2 s.53 / pp.199-214 / 1996
  • To investigate the factors affecting cognitive function and depression in people aged 65 or older, the authors surveyed subjects in Taejon and the nearby area. A total of 729 subjects were tested for cognitive function with the MMSE and for depression with the GDS. The main results were as follows. Among the subjects, 56.8% had normal cognitive function, 24.1% were mildly impaired, and 19.1% were severely impaired. The cognitive function level was closely related to the depression score. Cognitive function was more impaired with increasing age. Sex differences were also found in the cognitive function level and the depression score. After adjusting for the effect of age, variables such as sex, marital status, education level, past job, instrumental activities of daily living, regular physical exercise, frequency of going out of the house, chest discomfort, visual and auditory disturbance, and dizziness had a significant relationship with cognitive function impairment. Among these variables, instrumental ADL, age, visual disturbance, and sex showed statistical significance in the logistic regression model. In the multiple stepwise regression, the variables significantly related to the depression score were education level, frequency of going out of the house, current job and housework activity, regular physical exercise, instrumental ADL, self-rated health and nutritional status, dimness, visual disturbance, and chest pain. In conclusion, the main characteristics closely related to cognitive function and depressive symptoms in the subjects were physical function and self-rated health status.

The Effects of Wearing Protective Devices among Residents and Volunteers Participating in the Cleanup of the Hebei Spirit Oil Spill (허베이스피릿호 유류유출사고 방제작업 참여자의 보호장비착용 효과)

  • Lee, Seung-Min;Ha, Mi-Na;Kim, Eun-Jung;Jeong, Woo-Chul;Hur, Jong-Il;Park, Seok-Gun;Kwon, Ho-Jang;Hong, Yun-Chul;Ha, Eun-Hee;Lee, Jong-Seung;Chung, Bong-Chul;Lee, Jeong-Ae;Im, Ho-Sub;Choi, Ye-Yong;Cho, Yong-Min;Cheong, Hae-Kwan
    • Journal of Preventive Medicine and Public Health / v.42 no.2 / pp.89-95 / 2009
  • Objectives: To assess the protective effects of wearing protective devices among the residents and volunteers who participated in the cleanup of the Hebei Spirit oil spill. Methods: A total of 288 residents and 724 volunteers were surveyed about symptoms, whether they wore protective devices, and potential confounding variables. The questionnaires were administered from the second to the sixth week following the accident. Spot urine samples were collected and analyzed for metabolites of 4 volatile organic compounds (VOCs), 2 polycyclic aromatic hydrocarbons (PAHs), and 6 heavy metals. The association between the wearing of protective devices and various symptoms was assessed using multiple logistic regression adjusted for confounding variables. A multiple generalized linear regression model adjusted for the covariates was used to test for a difference in least-squares mean concentrations of urinary biomarkers between residents who wore protective devices and those who did not. Results: Between 39% and 98% of the residents and between 62% and 98% of the volunteers wore protective devices. Levels of fatigue and fever were higher among residents not wearing masks than among those who wore masks (odds ratio 4.5; 95% confidence interval 1.23-19.86). Urinary mercury levels were significantly higher among residents not wearing work clothes or boots (p<0.05). Conclusions: Because the survey was not performed during the initial high-exposure period, no significant difference was found in metabolite levels between people who wore protective devices and those who did not, except for mercury, whose biological half-life is more than 6 weeks.
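
The abstract reports an odds ratio with a 95% confidence interval from a multiple logistic regression. As a rough sketch of how such figures are commonly derived from a fitted coefficient, the function below exponentiates the coefficient and a Wald interval around it. The abstract does not say which interval method the authors used, so the Wald form is an assumption, and the coefficient and standard error in the usage lines are hypothetical values rather than numbers from the study.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a logistic coefficient.

    beta: estimated log-odds coefficient for the exposure
    se:   its standard error
    z:    normal quantile (1.96 gives an approximate 95% Wald interval)
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only.
or_, lower, upper = odds_ratio_with_ci(beta=1.0, se=0.5)
print(f"OR={or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from zero at the chosen level.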

Effects of Social Support, Sleep Quality, and Oral Health Impact Profile on Depression among Pregnant Women (일부 임신부의 사회적 지지, 수면의 질 및 구강건강영향지수가 우울수준에 미치는 영향)

  • Han, Se-Young;Han, Yang-Keum
    • Journal of dental hygiene science / v.17 no.2 / pp.134-141 / 2017
  • This study examined 191 pregnant women before delivery at an obstetrics and gynecology clinic in North Gyeongsang Province from May to September 2016, using a questionnaire administered after obtaining informed consent for voluntary participation. The study investigated the association of depression with sociodemographic characteristics, pregnancy-related characteristics, social support, sleep quality, and the Oral Health Impact Profile (OHIP) in pregnant women. Of the pregnant women, 25.1% fell into the healthy group and 74.9% into the depression group. The depression level was significantly higher in women in the depression group who were unsatisfied with their married life, had no occupation, had lower social support, had poor sleep quality, and had higher OHIP scores. The logistic regression analysis indicated that the risk ratio for more severe depression was significantly higher in the group with no experience of miscarriage and induced childbirth than in the group with childbirth experience. Conversely, the risk ratio for more severe depression was significantly lower in the group with high social support than in the group with low social support. Depression in the respondents correlated significantly and positively with sleep quality and OHIP score, and significantly and negatively with social support. The multiple regression analysis revealed that the depression level was significantly higher by 22.3% among pregnant women with lower marital satisfaction, no childbirth experience, lower social support, and higher OHIP scores. In summary, depression in the pregnant women in this study was related to marital satisfaction, childbirth experience, social support, and OHIP score, among other factors.
Therefore, further investigation is warranted to design and verify a three-dimensional study model that takes various variables into consideration, and to construct programs and measures that foster positive thinking and reduce the incidence of depression in pregnant women.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices that express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 over 2,026 days in eight years (from 2009 to 2016). To balance the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators commonly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
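
The first and fourth steps of the procedure described above (cutting the series into 5-day intervals and splitting 80/20) can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract does not say whether the 5-day intervals overlap, so the sliding window with `stride=1` is an assumption, as is the chronological cut in `split_train_validation`. Rendering the graphs as images and training the CNN are omitted, and both function names are hypothetical.

```python
def make_windows(closes, window=5, stride=1):
    """Pair each `window`-day slice with the direction of the following day.

    Label 1 means the next close is higher than the last close in the window;
    label 0 means it is lower or unchanged.
    """
    samples = []
    for start in range(0, len(closes) - window, stride):
        chunk = closes[start:start + window]
        next_close = closes[start + window]
        samples.append((chunk, 1 if next_close > chunk[-1] else 0))
    return samples


def split_train_validation(samples, train_fraction=0.8):
    """Split samples into a leading training part and a trailing validation part."""
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical closing prices, for illustration only.
closes = [1, 2, 3, 4, 5, 6, 5, 7]
for chunk, label in make_windows(closes):
    print(chunk, label)
```

With 1,950 samples and `train_fraction=0.8`, the split yields 1,560 training and 390 validation samples, matching the counts reported in the abstract.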

Assessment of future climate and land use changes impact on hydrologic behavior in Anseong-cheon Gongdo urban-growing watershed (미래 기후변화와 토지이용변화가 안성천 공도 도시성장 유역의 수문에 미치는 영향 평가)

  • Kim, Da Rae;Lee, Yong Gwan;Lee, Ji Wan;Kim, Seong Joon
    • Journal of Korea Water Resources Association / v.51 no.2 / pp.141-150 / 2018
  • The purpose of this study is to evaluate the future hydrologic behavior, as affected by potential climate and land use changes, of the upstream Anseong-cheon watershed ($366.5\ \mathrm{km}^2$) using SWAT. The HadGEM3-RA RCP 4.5 and 8.5 scenarios for the 2030s (2020-2039) and 2050s (2040-2059) were used as the future climate change scenarios. Relative to the baseline period (1976-2005), the maximum changes in precipitation ranged from -5.7% in the 2030s to +18.5% in the 2050s under RCP 4.5, and temperature increased by up to $1.8^{\circ}\mathrm{C}$ in the 2030s under RCP 4.5 and $2.6^{\circ}\mathrm{C}$ in the 2050s under RCP 8.5. Future land uses were predicted using the CLUE-s model with a logistic regression equation. The urban area in 2050 was predicted to increase by 58.6% (from 29.0 to $46.0\ \mathrm{km}^2$). SWAT was calibrated and verified using 14 years (2002-2015) of daily streamflow, with a calibration focus on the 2 drought years (2014-2015), giving Nash-Sutcliffe model efficiencies (NSE) of 0.86 for streamflow (Q) and 0.76 for low flow (1/Q). Under future climate change alone, stream discharge showed a maximum decrease of 24.2% in the 2030s under RCP 4.5 and then a maximum increase of 10.9% in the 2050s under RCP 4.5, compared with the baseline stream discharge of 601.0 mm, owing to precipitation variation and a gradual temperature increase. When both future climate and land use change were considered, stream discharge showed a maximum decrease of 14.9% in the 2030s under RCP 4.5 and a maximum increase of 19.5% in the 2050s under RCP 4.5, owing to urban growth and the related land use changes. The results suggest that future land use should be considered in climate change assessments, especially for watersheds with high potential for urban growth.
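
For readers unfamiliar with the Nash-Sutcliffe efficiency reported above, the sketch below implements its standard definition, one minus the ratio of the squared simulation error to the variance of the observations about their mean. The inverse-flow variant applies the same formula to 1/Q, which is how the abstract's low-flow score is described. The function names and the sample flows are hypothetical, and this is not the authors' calibration code.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2).

    A value of 1 is a perfect match; 0 means the simulation is no better
    than always predicting the observed mean.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - residual / variance


def nash_sutcliffe_inverse(observed, simulated):
    """NSE computed on 1/Q, which weights low flows more heavily; flows must be non-zero."""
    return nash_sutcliffe([1.0 / q for q in observed], [1.0 / q for q in simulated])


# Hypothetical daily flows, for illustration only.
obs = [1.0, 2.0, 3.0]
print(nash_sutcliffe(obs, obs))              # 1.0
print(nash_sutcliffe(obs, [2.0, 2.0, 2.0]))  # 0.0
```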

Treatment Status and Its Related Factors of the Hypertensives Detected Through Community Health Promotion Program (지역사회 보건사업에서 발견된 고혈압환자의 치료실태와 관련요인)

  • Kam, Sin;Kim, In-Ki;Chun, Byung-Yeol;Lee, Sang-Won;Lee, Kyung-Eun;Ahn, Soon-Ki;Jin, Dae-Gu;Lee, Kyeong-Soo
    • Journal of agricultural medicine and community health / v.26 no.2 / pp.133-146 / 2001
  • The purpose of this study was to investigate the treatment status, and the factors related to it, of rural hypertensives newly detected through a community health promotion program. A questionnaire survey and blood pressure measurement were administered to 6,977 residents of a rural area, and the 282 hypertensives detected by blood pressure measurement were selected as the study subjects. The study employed the health belief model as its hypothetical model. The major results were as follows. The proportion of hypertensives who had received treatment was 12.0%. The treatment experience rate was significantly related to age and educational level (p<0.01): it was higher among subjects who were older and had a lower educational level. The major reasons for not receiving treatment were that 'they had no hypertensive symptoms' (45.6%) and that 'their blood pressure was not high enough to require treatment' (43.2%). The chief facilities for treatment were public health institutions (57.9%), such as health centers and health subcenters, and hospitals/clinics (29.8%). The treatment experience rate was higher among subjects with higher perceived severity of hypertension and lower perceived barriers to treatment, although not significantly so. The treatment experience rate was significantly related to cues to action and health education experience (p<0.05): it was higher among subjects who had previously had hypertension-related symptoms such as headache, who knew patients suffering from complications of hypertension, and who had received health education on hypertension. In the multiple logistic regression analysis of treatment experience, having an acquaintance who was a cerebrovascular patient and having received health education on hypertension were significant variables. In light of these findings, it is essential to provide knowledge about hypertension and its treatment, and about the severity of hypertension complications, through health education.

Corporate Bond Rating Using Various Multiclass Support Vector Machines (다양한 다분류 SVM을 적용한 기업채권평가)

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae
    • Asia pacific journal of information systems / v.19 no.2 / pp.157-178 / 2009
  • Corporate credit rating is a very important factor in the market for corporate debt. Information concerning corporate operations is often disseminated to market participants through the changes in credit ratings that are published by professional rating agencies, such as Standard and Poor's (S&P) and Moody's Investor Service. Since these agencies generally require a large fee for the service, and the periodically provided ratings sometimes do not reflect the default risk of the company at the time, it may be advantageous for bond-market participants to be able to classify credit ratings before the agencies actually publish them. As a result, it is very important for companies (especially, financial companies) to develop a proper model of credit rating. From a technical perspective, the credit rating constitutes a typical, multiclass, classification problem because rating agencies generally have ten or more categories of ratings. For example, S&P's ratings range from AAA for the highest-quality bonds to D for the lowest-quality bonds. The professional rating agencies emphasize the importance of analysts' subjective judgments in the determination of credit ratings. However, in practice, a mathematical model that uses the financial variables of companies plays an important role in determining credit ratings, since it is convenient to apply and cost efficient. These financial variables include the ratios that represent a company's leverage status, liquidity status, and profitability status. Several statistical and artificial intelligence (AI) techniques have been applied as tools for predicting credit ratings. Among them, artificial neural networks are most prevalent in the area of finance because of their broad applicability to many business problems and their preeminent ability to adapt. 
However, artificial neural networks also have many defects, including the difficulty of determining the values of the control parameters and the number of processing elements in the layer, as well as the risk of over-fitting. Of late, because of their robustness and high accuracy, support vector machines (SVMs) have become popular as a solution for problems that require accurate prediction. An SVM's solution may be globally optimal because SVMs seek to minimize structural risk. On the other hand, artificial neural network models may tend to find locally optimal solutions because they seek to minimize empirical risk. In addition, no parameters need to be tuned in SVMs, barring the upper bound for non-separable cases in linear SVMs. However, since SVMs were originally devised for binary classification, they are not intrinsically geared for multiclass classification as in credit ratings. Thus, researchers have tried to extend the original SVM to multiclass classification. Hitherto, a variety of techniques to extend standard SVMs to multiclass SVMs (MSVMs) have been proposed in the literature. Only a few types of MSVM, however, have been tested in prior studies that apply MSVMs to credit ratings. In this study, we examined six different MSVM techniques: (1) One-Against-One, (2) One-Against-All, (3) DAGSVM, (4) ECOC, (5) Method of Weston and Watkins, and (6) Method of Crammer and Singer. In addition, we examined the prediction accuracy of some modified versions of conventional MSVM techniques. To find the most appropriate MSVM technique for corporate bond rating, we applied all of the MSVM techniques to a real-world case of credit rating in Korea. The application domain is corporate bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. For our study, the research data were collected from National Information and Credit Evaluation, Inc., a major bond-rating company in Korea.
The data set is comprised of the bond-ratings for the year 2002 and various financial variables for 1,295 companies from the manufacturing industry in Korea. We compared the results of these techniques with one another, and with those of traditional methods for credit ratings, such as multiple discriminant analysis (MDA), multinomial logistic regression (MLOGIT), and artificial neural networks (ANNs). As a result, we found that DAGSVM with an ordered list was the best approach for the prediction of bond rating. In addition, we found that the modified version of ECOC approach can yield higher prediction accuracy for the cases showing clear patterns.
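
To illustrate the first of the six techniques listed above, the sketch below shows the voting logic of One-Against-One: one binary decision per pair of classes, with the class collecting the most votes winning. The binary rule here is a toy nearest-center comparison standing in for a trained binary SVM, and the names `one_against_one_predict`, `nearest_center`, and `CENTERS` are hypothetical. It is not the authors' implementation, and the tie-breaking rule is an assumption.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, classes, binary_classifier):
    """Predict a class for x by majority vote over all pairwise classifiers.

    binary_classifier(a, b, x) must return either a or b.
    With k classes, k * (k - 1) / 2 pairwise decisions are made.
    Ties are broken in favor of the class listed first (an assumption).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_classifier(a, b, x)] += 1
    return max(classes, key=lambda c: votes[c])


# Toy stand-in for a trained binary SVM: pick the class whose (hypothetical)
# center is nearer to the one-dimensional input x.
CENTERS = {"AAA": 3.0, "AA": 2.0, "A": 1.0}


def nearest_center(a, b, x):
    return a if abs(x - CENTERS[a]) <= abs(x - CENTERS[b]) else b


print(one_against_one_predict(1.9, list(CENTERS), nearest_center))  # AA
```

One-Against-All, by contrast, would train one classifier per class (k in total) and pick the class with the strongest decision value.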