• Title/Summary/Keyword: Logistic discriminant analysis (로지스틱판별분석)

A Prediction Model for the Development of Cataract Using Random Forests (Random Forests 기법을 이용한 백내장 예측모형 - 일개 대학병원 건강검진 수검자료에서 -)

  • Han, Eun-Jeong;Song, Ki-Jun;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.771-780
    • /
    • 2009
  • Cataract is the main cause of blindness and visual impairment; age-related cataract alone accounts for about half of the 32 million cases of blindness worldwide. As life expectancy rises and the elderly population grows, cataract cases are increasing as well, creating a serious economic and social problem nationwide. However, the incidence of cataract can be reduced dramatically through early diagnosis and prevention. In this study, we developed a prediction model of cataract for early diagnosis using hospital data on 3,237 subjects who first received a screening test and later visited the medical center for a cataract check-up between 1994 and 2005. To develop the prediction model, we used random forests and compared its predictive performance with other common discriminant models (logistic regression, discriminant analysis, decision tree, and naive Bayes) and with two popular ensemble models, bagging and arcing. Random forests achieved an accuracy of 67.16% and a sensitivity of 72.28%, and the main factors in the model included age, diabetes, WBC, platelet count, triglycerides, and BMI, among others. The results show that the model could correctly predict the presence of cataract about 70% of the time from screening test data alone, without any information from a direct eye examination by an ophthalmologist. We expect that our model can contribute to diagnosing cataract and help prevent it at an early stage.
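
The accuracy and sensitivity figures above follow the usual confusion-matrix definitions. Below is a minimal Python sketch of those definitions, assuming "positive" means cataract present. The counts in the usage lines are hypothetical and are not taken from the study:

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier; 'positive' means cataract present."""
    tp: int  # cataract cases correctly predicted as cataract
    fn: int  # cataract cases the model missed
    fp: int  # non-cataract subjects wrongly flagged as cataract
    tn: int  # non-cataract subjects correctly cleared


def accuracy(c: ConfusionCounts) -> float:
    """Share of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true cataract cases the model detects (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-cataract subjects the model correctly clears."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not from the study.
    counts = ConfusionCounts(tp=8, fn=2, fp=3, tn=7)
    print(f"accuracy={accuracy(counts):.2%} sensitivity={sensitivity(counts):.2%}")
```

Reporting sensitivity next to accuracy matters here because a screening model that misses cataract cases is costlier than one that raises a few false alarms.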

A Study on the Development of Forest Fire Occurrence Probability Model using Canadian Forest Fire Weather Index -Occurrence of Forest Fire in Kangwon Province- (캐나다 산불 기상지수를 이용한 산불발생확률모형 개발 -강원도 지역 산불발생을 중심으로-)

  • Park, Houng-Sek;Lee, Si-Young;Chae, Hee-Mun;Lee, Woo-Kyun
    • Journal of the Korean Society of Hazard Mitigation
    • /
    • v.9 no.3
    • /
    • pp.95-100
    • /
    • 2009
  • The fine fuel moisture code (FFMC), a main component of the fire weather index (FWI) in the Canadian Forest Fire Danger Rating System (CFFDRS), indicates the probability of ignition by estimating the dryness of fine fuels. Under this code, rising temperature and wind speed, together with decreasing precipitation and humidity, raise the forest fire danger rating. In this study, we analyzed five years of weather conditions in Kangwon province, calculated the FFMC, and examined how it can be applied. Very low humidity and little precipitation characterized the spring and fall fire seasons in Kangwon province. Over the five years, 75% of forest fires occurred during these seasons, and 90% of the fire-season fires occurred in spring. To develop a model of forest fire occurrence probability, we fitted a logistic regression function using forest fire occurrence data and classes of the 10-day mean FFMC. The accuracy of the developed model was 63.6%. To improve the model, more meteorological data covering all seasons are needed, along with further research linking meteorological conditions to forest fire occurrence.
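
The fire-occurrence model above is a logistic regression on classes of the 10-day mean FFMC. Below is a minimal sketch of how such a model turns an FFMC class into a probability and a yes/no prediction. The coefficients b0 and b1 and the 0.5 threshold are hypothetical placeholders, not the study's estimates:

```python
import math


def fire_probability(ffmc_class: float, b0: float, b1: float) -> float:
    """Logistic model P(fire) = 1 / (1 + exp(-(b0 + b1 * x))), x = 10-day mean FFMC class."""
    z = b0 + b1 * ffmc_class
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form for negative z that avoids overflow in math.exp(-z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_fire(ffmc_class: float, b0: float, b1: float, threshold: float = 0.5) -> bool:
    """Predict 'fire occurs' when the modeled probability reaches the threshold."""
    return fire_probability(ffmc_class, b0, b1) >= threshold


if __name__ == "__main__":
    # Hypothetical coefficients: a positive b1 means drier fine fuels (higher FFMC class) raise the risk.
    b0, b1 = -3.0, 0.8
    for x in range(1, 7):
        print(x, round(fire_probability(x, b0, b1), 3), predict_fire(x, b0, b1))
```

In practice b0 and b1 would be estimated by maximum likelihood from the fire-occurrence records. The reported accuracy then depends on the threshold chosen for turning probabilities into predictions.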

An Optimized Combination of π-fuzzy Logic and Support Vector Machine for Stock Market Prediction (주식 시장 예측을 위한 π-퍼지 논리와 SVM의 최적 결합)

  • Dao, Tuanhung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.43-58
    • /
    • 2014
  • As the use of trading systems has increased rapidly, many researchers have become interested in developing effective stock market prediction models using artificial intelligence techniques. Stock market prediction involves multifaceted interactions between market-controlling factors and unknown random processes. A successful stock prediction model achieves the most accurate result from the minimum input data with the least complex model. In this research, we develop a model combining π-fuzzy logic and a support vector machine (SVM), using a genetic algorithm to optimize the parameters of the SVM and the π-fuzzy functions, together with feature subset selection, to improve stock market prediction performance. To evaluate the proposed model, we compare its performance with that of other models on the same data, including logistic regression, multiple discriminant analysis, classification and regression tree, artificial neural network, SVM, and fuzzy SVM models. The results show that our model outperforms all the comparison models in both prediction accuracy and return on investment.
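
The abstract does not give the exact form of the π-fuzzy functions. As an illustration only, the sketch below assumes Zadeh's classic π-shaped membership function. It is built from two S-functions and controlled by a center and a bandwidth, which are the kind of parameters a genetic algorithm could tune:

```python
def s_function(x: float, a: float, c: float) -> float:
    """Zadeh's S-function: 0 up to a, 1 from c on, 0.5 at the midpoint (a + c) / 2."""
    m = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= m:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x <= c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0


def pi_membership(x: float, center: float, bandwidth: float) -> float:
    """Pi-shaped membership: 1 at center, 0.5 at center +/- bandwidth/2, 0 beyond +/- bandwidth."""
    if x <= center:
        # Rising half: S-function from center - bandwidth up to center.
        return s_function(x, center - bandwidth, center)
    # Falling half: mirror image of an S-function from center to center + bandwidth.
    return 1.0 - s_function(x, center, center + bandwidth)


if __name__ == "__main__":
    # Hypothetical fuzzy set "indicator is near 5" with bandwidth 2.
    for value in (2, 3, 4, 5, 6, 7, 8):
        print(value, pi_membership(value, center=5, bandwidth=2))
```

For example, pi_membership(6, 5, 2) is 0.5. That value marks the crossover point, half a bandwidth to the right of the center.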

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.173-190
    • /
    • 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from the majority class data. To validate the proposed approach, we applied it to data from H Bank on non-externally audited companies in Korea. We then compared the performance of classifiers trained on the proposed under-sampled data with that of classifiers trained on randomly sampled data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers such as logistic regression, discriminant analysis, decision trees, and support vector machines. They also show that the proposed approach reduces the risk of false negative errors, which incur higher misclassification costs.
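
The k-RNN step can be illustrated in a few lines. A point that few or no other points count among their k nearest neighbors is a candidate outlier. The sketch below makes these assumptions:

* It covers only this step; the OCSVM selection is not shown.
* It uses brute-force Euclidean distances.
* It applies a hypothetical rule that drops points with a reverse-kNN count below `min_count`. The abstract does not state the paper's actual threshold.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] by Euclidean distance, excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def reverse_knn_counts(points: List[Point], k: int) -> List[int]:
    """For each point, count how many other points include it in their k nearest neighbours."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts


def drop_krnn_outliers(points: List[Point], k: int, min_count: int = 1) -> List[int]:
    """Return indices of kept points; a reverse-kNN count below min_count marks an outlier (hypothetical rule)."""
    counts = reverse_knn_counts(points, k)
    return [i for i, c in enumerate(counts) if c >= min_count]


if __name__ == "__main__":
    # Toy majority-class sample: four clustered points and one isolated point.
    majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 12.0)]
    print(reverse_knn_counts(majority, k=2))
    print(drop_krnn_outliers(majority, k=2))
```

On these toy points with k=2, the isolated point (10, 12) is nobody's near neighbor. Its reverse-kNN count is therefore 0, and it is dropped before any further sampling.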

A Study on the Verification of Significance of Assessment Items for Selecting Start-ups: Focusing on Project Fostering Start-ups through Leading Universities (창업기업 선정평가지표 유의성 검증에 관한 연구: 창업선도대학육성사업을 중심으로)

  • Jung, Kyung Hee;Sung, Chang So
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.13 no.4
    • /
    • pp.13-22
    • /
    • 2018
  • In this study, we examined the accuracy of the assessment items used to select start-ups in a start-up support project and verified whether they are valid and appropriate given the selection criteria. Evaluation results for 973 start-ups that applied to the Project Fostering Start-ups through Leading Universities were collected, and logistic regression was performed using SPSS 18.0. The results are summarized as follows. First, differences in the characteristics of start-ups were identified according to whether they were selected. Second, the assessment items that significantly affected selection differed by year:
    • 2015: gender.
    • 2016: founder capability and business establishment.
    • 2017: global market performance and potential, and business start-up.
  Third, an analysis of overall selection accuracy over the three years confirmed that selection accuracy declined each year and was lower than non-selection accuracy. This means that the current assessment items are inaccurate for selection. It also means that the yearly changes to the items, made in response to a changing start-up environment, have lowered selection accuracy. This study is meaningful in highlighting the importance of assessment items and the need to improve them so that promising start-ups can be screened effectively, enhancing the efficiency of start-up support policies.

Artificial Intelligence Techniques for Predicting Online Peer-to-Peer(P2P) Loan Default (인공지능기법을 이용한 온라인 P2P 대출거래의 채무불이행 예측에 관한 실증연구)

  • Bae, Jae Kwon;Lee, Seung Yeon;Seo, Hee Jin
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.3
    • /
    • pp.207-224
    • /
    • 2018
  • In this article, an empirical study was conducted using a public dataset from Lending Club Corporation, the world's largest online peer-to-peer (P2P) lending platform. We explored predictor variables related to P2P loan default and found the following significant factors for loans funded on the Lending Club platform: housing situation, length of employment, average current balance, debt-to-income ratio, loan amount, loan purpose, interest rate, public records, number of finance trades, total credit/credit limit, number of delinquent accounts, number of mortgage accounts, and number of bank card accounts. We developed online P2P lending default prediction models using discriminant analysis, logistic regression, neural networks, and decision trees (i.e., CART and C5.0). To verify the feasibility and effectiveness of these models, borrower loan data and credit data were used. Empirical results indicated that neural networks consistently outperformed the other classifiers (discriminant analysis, logistic regression, CART, and C5.0) in P2P loan default prediction.

THE ANTERIOR-POSTERIOR AND VERTICAL RELATIONSHIP OF THE GROWING CHILDREN WITH CLASS III MALOCCLUSION BY LATERAL CEPHALOMETRIC MEASUREMENT (측모두부방사선 사진을 이용한 성장기 III급 부정교합아동의 전후방적, 수직적 악골관계에 대한 연구)

  • Yang, Ku-Ho;Choi, Nam-Ki;Kim, Seong-Nam
    • Journal of the Korean Academy of Pediatric Dentistry
    • /
    • v.30 no.2
    • /
    • pp.291-297
    • /
    • 2003
  • While making diagnoses and treatment plans for growing children who visited Chonnam National University Hospital for orthodontic treatment, the authors obtained 8 lateral cephalometric measurements of the antero-posterior and vertical relationships (APDI, WITS, ANB, SN-MP, ODI, PFH/AFH, Y-axis, and SUM). These were taken for children aged 7 to 9 with Class III malocclusion and compared with those of 73 elementary school children aged 7 to 9 in Gwangju with a proper profile and normal occlusion. The results were as follows. 1. Between normal occlusion and Class III malocclusion, ANB, SN-MP, ODI, and SUM showed statistically significant differences (p<0.05), whereas PFH/AFH and Y-axis did not. 2. In both the normal occlusion and Class III malocclusion groups, the measurements describing skeletal disorders of the antero-posterior relationship (APDI, WITS, ANB) and of the vertical relationship (SN-MP, ODI, PFH/AFH, Y-axis, SUM) were all significantly correlated (p<0.01), except for the correlation between Y-axis and SUM. 3. The Wald statistics of WITS, ANB, and APDI, which express skeletal disorders of the antero-posterior relationship, were 7.118, 5.148, and 0.741, respectively. The Wald statistics of ODI, Y-axis, PFH/AFH, SN-MP, and SUM were 28.348, 2.238, 1.376, 0.090, and 0.089, respectively. Therefore, WITS and ODI could be considered useful diagnostic measurements for Class III malocclusion.
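
For a single coefficient in a logistic regression, the Wald statistic is the squared ratio of the estimate to its standard error. It is compared with a chi-square distribution with one degree of freedom. Assuming the reported values are of that kind, the sketch below converts them to p-values using only the standard library:

```python
import math


def wald_statistic(beta: float, std_error: float) -> float:
    """Wald chi-square for one logistic-regression coefficient: (beta / SE) ** 2."""
    return (beta / std_error) ** 2


def wald_p_value(w: float) -> float:
    """Upper-tail p-value of a chi-square with 1 degree of freedom.

    For 1 df, P(chi2 > w) = erfc(sqrt(w / 2)), so no statistics library is needed.
    """
    return math.erfc(math.sqrt(w / 2.0))


if __name__ == "__main__":
    # Wald statistics reported in the abstract for the antero-posterior measurements.
    for name, w in [("WITS", 7.118), ("ANB", 5.148), ("APDI", 0.741)]:
        print(f"{name}: W={w:.3f}, p={wald_p_value(w):.4f}")
```

At the usual 5% level, the critical value for one degree of freedom is about 3.84. By that cut-off, WITS and ANB clear the bar, and APDI does not.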

Socio-Demographic Characteristics and Subjective Class Identification of 'Joongsancheung' (중산층의 사회인구학적 특성과 주관적 계층의식)

  • Jo, Dong-Gi
    • Korea Journal of Population Studies
    • /
    • v.29 no.3
    • /
    • pp.89-109
    • /
    • 2006
  • The 'Joongsancheung' (JSC), a term unique to Korea for the middle class, is defined as a stratum sharing common lifestyles and a certain level of life chances. It encompasses non-economic factors such as life chances, educational attainment, and occupational group, as well as economic factors. The operational definition of the JSC combines objective and subjective measures. The objective measures are the occupational status of the main breadwinner, family income, and the respondent's educational level; the subjective measures capture class identification. Data from a national survey of 1,515 respondents are analyzed to investigate the change in the size of the JSC and the major determinants of class identification. By objective measures, the results show no strong evidence of a significant change in the JSC during the recent decade, although subjective class identification seems to have decreased slightly. In addition, binary logistic regression analysis reveals that self-identification as JSC is heavily influenced by house ownership, along with subjective evaluation of one's own income and property ownership. This study demonstrates that the apparent class polarization in Korean society reflects not so much objective conditions as respondents' subjective perceptions of their circumstances. It suggests that housing problems, and the relative deprivation people feel regarding income and property, should be addressed to alleviate such class polarization in Korean society.

One-probe P300 based concealed information test with machine learning (기계학습을 이용한 단일 관련자극 P300기반 숨김정보검사)

  • Kim, Hyuk;Kim, Hyun-Taek
    • Korean Journal of Cognitive Science
    • /
    • v.35 no.1
    • /
    • pp.49-95
    • /
    • 2024
  • Polygraph examination, statement validity analysis, and the P300-based concealed information test are the three major examination tools used to determine a person's truthfulness and credibility in criminal procedure. Polygraph examination is the most common of these, but it has little admissibility as evidence because of its weak scientific basis. In the 1990s, to address this weakness, Farwell and Donchin proposed the P300-based concealed information test technique. The P300-based concealed information test has two strengths. First, it is easy to conduct together with a polygraph examination. Second, it has a substantial scientific basis. Nevertheless, the P300-based concealed information test is rarely used because of the number of probe stimuli it requires. A probe stimulus contains undisclosed information relevant to the crime or other situation under investigation. The traditional P300-based concealed information test protocol requires three or more probe stimuli. These are hard to obtain, however, because most crime-relevant information is disclosed during the investigation. In addition, the P300-based concealed information test uses the oddball paradigm, which creates an imbalance between the numbers of probe and irrelevant stimuli. This imbalance may cause a systematic underestimation of the P300 amplitude of irrelevant stimuli. To overcome these two limitations, a one-probe P300-based concealed information test protocol was explored with various machine learning algorithms. According to this study, the parameters of the modified one-probe protocol are as follows.
For female and male face stimuli, the recommended parameters are:
    • stimulus duration of 400 ms;
    • 60 stimulus repetitions;
    • the peak-to-peak method for analyzing P300 amplitude;
    • a guilty-condition cut-off of 90%;
    • an innocent-condition cut-off of 30%.
For two-syllable word stimuli, the recommended parameters are identical except for a stimulus duration of 300 ms. The study also confirmed that logistic regression (LR), linear discriminant analysis (LDA), and k-nearest neighbors (KNN) algorithms are viable methods for analyzing P300 amplitude. The one-probe P300-based concealed information test protocol with machine learning can help increase the use of the P300-based concealed information test. Together with the polygraph examination, it can also support the determination of a person's truthfulness and credibility in criminal procedure.
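
Below is a minimal sketch of the peak-to-peak amplitude and the two cut-offs. It rests on these assumptions:

* The cut-offs apply to the share of bootstrap resamples in which the probe amplitude exceeds the irrelevant amplitude. This is a common bootstrap approach in P300-based testing, assumed here rather than confirmed by the abstract.
* Waveforms are plain lists of samples.
* The window indices and toy trials are hypothetical.

```python
import random
from typing import List, Sequence


def peak_to_peak(wave: Sequence[float], start: int, end: int) -> float:
    """Most positive sample in [start, end) minus the most negative sample from that peak onward."""
    window = list(wave[start:end])
    peak_idx = start + window.index(max(window))
    trough = min(wave[peak_idx:])
    return wave[peak_idx] - trough


def average(trials: Sequence[Sequence[float]]) -> List[float]:
    """Point-wise mean across trials (all trials must have equal length)."""
    n = len(trials)
    return [sum(values) / n for values in zip(*trials)]


def bootstrap_proportion(probe, irrelevant, start: int, end: int,
                         iterations: int = 1000, seed: int = 0) -> float:
    """Share of bootstrap resamples whose averaged probe amplitude exceeds the irrelevant one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(iterations):
        p = average(rng.choices(probe, k=len(probe)))
        q = average(rng.choices(irrelevant, k=len(irrelevant)))
        if peak_to_peak(p, start, end) > peak_to_peak(q, start, end):
            wins += 1
    return wins / iterations


def decide(proportion: float, guilty_cutoff: float = 0.90, innocent_cutoff: float = 0.30) -> str:
    """Map a bootstrap proportion to a verdict using the two cut-offs from the abstract."""
    if proportion >= guilty_cutoff:
        return "guilty"
    if proportion <= innocent_cutoff:
        return "innocent"
    return "indeterminate"


if __name__ == "__main__":
    # Hypothetical toy trials: the probe waveform has a larger positive deflection.
    probe = [[0.0, 1.0, 5.0, 2.0, -3.0, 0.0]] * 3
    irrelevant = [[0.0, 1.0, 2.0, 1.0, -1.0, 0.0]] * 3
    share = bootstrap_proportion(probe, irrelevant, start=1, end=4, iterations=200)
    print(share, decide(share))
```

The gap between the two cut-offs leaves an "indeterminate" zone. In that zone, the sketch declines to call a subject guilty or innocent.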

A study on the prediction of Korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.123-139
    • /
    • 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, this market was short-lived, and bad debt began to increase after the 2009 global financial crisis owing to the recession in the real economy. NPLs have become a major investment in recent years, as investment capital from the domestic capital market has begun to enter the NPL market in earnest. The domestic NPL market has received considerable attention because of its recent overheating. Research on it nonetheless remains scarce, because capital market investment in the domestic NPL market has a short history. In addition, declining profitability and price volatility driven by the real estate business call for more scientific and systematic decision-making. In this study, we propose a prediction model that uses NPL market data to determine whether the benchmark yield will be achieved, in line with market demand. To build the model, we used about four years of Korean NPL data, from December 2013 to December 2017, covering 2,291 properties in total. From 11 variables describing the characteristics of the real estate, only those related to the dependent variable were selected as independent variables. For variable selection, one-to-one t-tests, stepwise logistic regression, and a decision tree were performed. Seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period). The dependent variable is a binary variable indicating whether the benchmark rate is reached.
This is because models predicting binary variables are more accurate than models predicting continuous variables, and the accuracy of these models is directly related to their effectiveness. In addition, the main concern of a special purpose company is whether to purchase a property, so knowing whether a certain level of return will be achieved is sufficient for that decision. The industry uses 12% as its standard rate of return. To check whether this is a meaningful reference value, we calculated the dependent variable at different threshold values and constructed and compared the corresponding predictive models. As a result, the predictive model built with the dependent variable based on the 12% standard rate of return had the best average hit ratio, at 64.60%. To propose an optimal prediction model based on the chosen dependent variable and the seven independent variables, we constructed and compared prediction models using five methodologies: discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model. To do this, 10 sets of training and testing data were extracted using 10-fold cross-validation. Models were built on each set, and the hit ratios of the sets were averaged to compare performance. The average hit ratios of the resulting models were:
    • discriminant analysis: 64.40%;
    • logistic regression: 65.12%;
    • decision tree: 63.54%;
    • artificial neural network: 67.40%;
    • genetic algorithm linear model: 60.51%.
These results confirm that the artificial neural network model performed best. This study shows that using the seven independent variables and an artificial neural network prediction model is effective for the future NPL market.
The proposed model predicts in advance whether a new property will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
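
The NPL study frames prediction as a binary target: whether the 12% benchmark return is reached. That target is evaluated by the average hit ratio over 10-fold cross-validation. Below is a minimal sketch of that evaluation loop. The interleaved fold split and the majority-class learner are hypothetical stand-ins for the paper's unspecified split and its five models:

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def benchmark_label(realized_return: float, benchmark: float = 0.12) -> int:
    """Binary target: 1 if the property reached the benchmark return (12% by default), else 0."""
    return 1 if realized_return >= benchmark else 0


def kfold_hit_ratio(X: List[Row], y: List[int],
                    train: Callable[[List[Row], List[int]], Model], k: int = 10) -> float:
    """Average hit ratio (share of correct predictions) over k folds, each held out once."""
    n = len(X)
    if n < k:
        raise ValueError("need at least k rows so that every fold is non-empty")
    # Interleaved split without shuffling: fold f holds rows f, f + k, f + 2k, ...
    folds = [list(range(f, n, k)) for f in range(k)]
    ratios = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(1 for i in test_idx if model(X[i]) == y[i])
        ratios.append(hits / len(test_idx))
    return sum(ratios) / k


def majority_class(X_train: List[Row], y_train: List[int]) -> Model:
    """Placeholder learner (hypothetical) that always predicts the most common training label."""
    guess = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda row: guess


if __name__ == "__main__":
    # Hypothetical realized returns for 20 properties, labelled against the 12% benchmark.
    returns = [0.15, 0.08, 0.13, 0.20, 0.05] * 4
    y = [benchmark_label(r) for r in returns]
    X = [[r] for r in returns]
    print(y, kfold_hit_ratio(X, y, majority_class))
```

Any of the compared methods could be plugged in as `train`, provided it returns a function that maps one feature row to a 0/1 prediction. A majority-class baseline is also a useful floor for judging hit ratios in the 60% range.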