• Title/Summary/Keyword: decision tree (결정나무)

A Study on the Factors of Normal Repayment of Financial Debt Delinquents (국내 연체경험자의 정상변제 요인에 관한 연구)

  • Sungmin Choi;Hoyoung Kim
    • Information Systems Review / v.23 no.1 / pp.69-91 / 2021
  • Credit bureaus in Korea commonly use past and present financial transaction information to calculate an individual's credit score. Compared with other rating factors, repayment history carries a larger weight in the credit score. Accordingly, even after overdue payments have been fully redeemed, the late payment history is reflected negatively in the credit score for a certain period of time. Individuals with debt delinquency can be classified into two groups: (1) those who have faithfully paid off their overdue debts (normal repayment), and (2) those who have not. Because creditworthiness differs between these two groups, relatively higher credit scores should be granted to the former. This study analyzes the factors of normal repayment among Korean financial debt delinquents, based on credit information on personal loans, overdue payments, and redemption from Korea Credit Information Services. The analysis identified the number of overdue payments and the type of personal loan and delinquency as significant variables affecting normal repayment; among the methodologies applied, neural network models showed the highest classification accuracy. The findings are expected to improve the performance of individual credit scoring models by identifying the factors that affect normal repayment by a financial debt delinquent.

Stock Price Direction Prediction Using Convolutional Neural Network: Emphasis on Correlation Feature Selection (합성곱 신경망을 이용한 주가방향 예측: 상관관계 속성선택 방법을 중심으로)

  • Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.22 no.4 / pp.21-39 / 2020
  • Recently, deep learning has shown high performance in various applications such as pattern analysis and image classification. Stock market forecasting, known as an especially difficult task in machine learning research, is an area where many researchers are verifying the effectiveness of deep learning techniques. This study proposes a Convolutional Neural Network (CNN) model to predict the direction of stock prices. We then used a feature selection method to improve the performance of the model, and compared the performance of machine learning classifiers against the CNN. The classifiers used in this study are Logistic Regression, Decision Tree, Neural Network, Support Vector Machine, AdaBoost, Bagging, and Random Forest. The results confirmed that the CNN showed higher performance than the other classifiers when feature selection was applied. The results also show that the CNN model effectively predicted the stock price direction by analyzing the embedded values of the financial data.
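
As a rough illustration of correlation-driven feature selection, the sketch below ranks candidate features by the absolute Pearson correlation with the price direction and keeps the top k. This is a simplified univariate filter, not necessarily the selection method used in the paper; the feature names (`momentum`, `volume`, `constant`) and all values are invented for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_features(features, target, k):
    """Return the names of the k features most correlated (in absolute value) with target."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data: three candidate features and an up/down label (1 = up).
features = {
    "momentum": [1.0, 2.0, 3.0, 4.0, 5.0],
    "volume": [2.0, 1.0, 2.0, 3.0, 1.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
}
direction = [0, 0, 1, 1, 1]
selected = select_features(features, direction, k=2)
print(selected)
```

A subset-based method would additionally penalize features that are strongly correlated with each other; this sketch ignores that redundancy.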

A study on algal bloom forecast system based on hydro-meteorological factors in the mainstream of Nakdong river using machine learning (머신러닝를 이용한 낙동강 본류 구간 수문-기상인자 조류 예보체계 연구)

  • Taewoo Lee;Soojun Kim;Junhyeong Lee;Kyunghun Kim;Hoyong Lee;Duckgil Kim
    • Journal of Wetlands Research / v.26 no.3 / pp.245-253 / 2024
  • Blue-green algal blooms, or harmful algal blooms, have a negative impact on the aquatic ecosystem and the purified water supply system through oxygen depletion in the water body, odor, and the secretion of toxic substances in the freshwater ecosystem. These blooms are expected to increase in intensity and frequency because of the longer residence time of algae in the water body since the construction of the Nakdong River weir, as well as the rise in surface temperature due to climate change. To respond to this expected increase, this study presents an algal bloom forecast system based on hydro-meteorological factors for preemptive response before an algal bloom warning is issued. Through polyserial correlation analysis, the preceding influence periods of temperature and discharge for each algal bloom forecast level were derived. Using decision tree classification, a machine learning technique, classification models for the algal bloom forecast levels were derived from the temperature and discharge of the preceding period. An algal bloom forecast system based on hydro-meteorological factors was then derived from the results of the decision tree classification models. The proposed forecast system can serve as basic research for preemptive response before blue-green algal blooms occur.

A Study on the Effect of Water Soluble Extractive upon Physical Properties of Wood (수용성(水溶性) 추출물(抽出物)이 목재(木材)의 물리적(物理的) 성질(性質)에 미치는 영향(影響))

  • Shim, Chong-Supp
    • Journal of the Korean Wood Science and Technology / v.10 no.3 / pp.13-44 / 1982
  • 1. It has long been said that soaking wood in water for a long time helps reduce defects such as checking, cupping, and bow caused by undue shrinking and swelling. There are, however, no actual data that definitely establish this, although there are some guesses that water-soluble extractives might affect the problem. Only a few studies have examined the effect of water-soluble extractives on the physical properties of wood and how it might relate to this problem. Determining whether soaking wood in water for a long time reduces defects due to undue shrinking and swelling, compared with unsoaked wood, could contribute greatly to the reasonable use of wood. To account for the effect of water-soluble extractives on the physical properties of wood, this study was carried out at the wood technology laboratory, School of Forestry, Yale University, under the competent guidance of Dr. F. F. Wangaard, with the following three species provided by the same laboratory: (1) Pinus strobus, (2) Quercus borealis, (3) Hymenaea courbaril.
    2. The physical properties investigated in this study are as follows: (a) equilibrium moisture content at different relative humidity conditions; (b) shrinkage value from green condition to different relative humidity conditions and to oven-dry condition; (c) swelling value from oven-dry condition to different relative humidity conditions; (d) specific gravity.
    3. To investigate the effect of water-soluble extractives on the physical properties of wood, the experiment was carried out with two differently treated sets of specimens, one treated in water and the other in sugar solution, and with control specimens.
    4. The quantity of water-soluble extractives of each species and the groups of chemical compounds in the liquid extracted from each species are shown in Table 36. The species differ somewhat in the quantity of extractives and in the groups of chemical compounds.
    5. In the case of equilibrium moisture content at different relative humidity conditions: (a) Except for the desorption case at 80% R.H.C. (relative humidity condition), there is a definite distinction between untreated and treated specimens: untreated specimens hold more water than treated specimens at the same R.H.C. (b) The specimens treated in sugar solution showed almost the same tendency as the untreated specimens. (c) Between species, there is no definite relation in equilibrium moisture content (E.M.C.); however, the E.M.C. of pine heartwood is lower than that of pine sapwood. This might result from the difference in wood anatomical structure.
    6. In the case of shrinkage: (a) The shrinkage value of the specimens treated in water is greater than that of the untreated specimens, except in the one case of pine heartwood at 80% R.H.C. (b) The shrinkage value of the specimens treated in sugar solution is less than that of the others and shows almost the same tendency as the untreated specimens. This suggests that the penetration of some sugar into the wood can decrease its shrinkage value. (c) Between species, the shrinkage value of pine heartwood is less than that of pine sapwood; the shrinkage value of oak is the largest, and that of Hymenaea is less than oak and more than pine. (d) The directional difference in shrinkage value can be seen in all species, as in all other species previously tested. (e) There is a definite relation between the amount of extractives and the difference in shrinkage value between treated and untreated specimens: the more extractives, the larger the difference.
    7. In the case of swelling: (a) The swelling value of the treated specimens is greater than that of the untreated specimens in all cases. (b) Comparing the tangential and radial directions, the swelling value in the tangential direction is larger than that in the radial direction within the same species. (c) Between species, oak has the largest swelling value and pine heartwood the smallest; there is also a tendency for species that shrink more to swell more and, conversely, for species that shrink less to swell less.
    8. In the case of specific gravity: (a) The specific gravity of the treated specimens is larger than that of the untreated specimens. This reversed value between treated and untreated specimens results from the volume of the specimen in oven-dry condition. (b) Between species, there are differences: the specific gravity of Hymenaea is the largest and that of pine sapwood is the smallest.
    9. From this investigation, it is concluded that soaking wood in plain water before use, without any special consideration, may bring more harmful results than using unsoaked wood. However, soaking wood in specially prepared solutions, such as salt water or water in which inorganic matter has been dissolved, can be beneficial for decreasing shrinkage and swelling, checking, shaking, bow, etc. If soaking wood in plain water does decrease defects, this might come from even shrinking and swelling in all dimensions.

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.139-152 / 2011
  • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as supporting systems that help users find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, meaning reviews from various authorities, and user attributes. Although academic research on recommender systems has increased significantly over the last ten years, more research is required for it to be applicable in real-world situations, because the research field is still wide and less mature than other fields. Accordingly, the existing articles on recommender systems need to be reviewed with a view to the next generation of recommender systems. However, given the nature of recommender system research, it is not easy to confine it to specific disciplines. We therefore reviewed all articles on recommender systems published from 2001 to 2010 in 37 journals selected from the top 125 journals of the MIS Journal Rankings. The literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate articles that were not actually related to recommender systems. Conference papers, master's and doctoral dissertations, textbooks, unpublished working papers, non-English publications, and news items were excluded as unfit for our research. We classified articles by year of publication, journal, recommendation field, and data mining technique. 
The recommendation fields and data mining techniques of 187 articles were reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results presented in this paper have several significant implications. First, based on previous publication rates, interest in recommender system research will grow significantly in the future. Second, 49 articles are related to movie recommendation, whereas image and TV program recommendation are covered in only 6 articles. This result is attributable to the easy availability of the MovieLens data set, so data sets for other fields need to be prepared. Third, social network analysis has recently been used in various applications, but studies on recommender systems using social network analysis are scarce. We expect that new recommendation approaches using social network analysis will be developed, and evaluating recommender system research using social network analysis will be an interesting area for further research. This review shows the trend of recommender system research by examining the published literature, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.

A Study of on the Method to Select Manufacturing Activities Sensitive to Regional Characteristics by Analyzing the Locational Hierarchy (입지계층분석을 활용한 산업단지 유치 업종 결정에 관한 연구)

  • So, Jin-Kwang;Lee, Hyeon-Joo;Kim, Sun-Woo
    • Land and Housing Review / v.2 no.4 / pp.559-568 / 2011
  • This study aims to list the manufacturing activities that are sensitive to regional characteristics by analyzing a locational hierarchy based on the urban rank-size rule. This locational hierarchy by manufacturing activity is expected to provide a basis for the proper supply of industrial complexes. Analyzing the locational hierarchy by manufacturing activity makes it possible to observe the characteristics of the locational distribution of each economic activity by tracking changes in manufacturing location. Consequently, it can be used to determine the appropriate manufacturing activities for the industrial complex of a particular region. Here, the locational hierarchy is analyzed at the level of basic local governments such as Gun (district level) and Si (city level), and manufacturing activities are categorized by the Korea Standard Industry Code (KSIC). The activities showing a growth pattern are Manufacture of Electronic Equipment (KSIC 26), Manufacture of Medical Precision Optical Instruments Watch (KSIC 27), Manufacture of Motor Vehicles (KSIC 30, 31), etc. With proper infrastructure, these activities can be located anywhere. The sectors showing a declining pattern in the locational hierarchy include Manufacture of Tobacco Products (KSIC 12), Manufacture of Wearing Apparel Fur Articles (KSIC 14), etc. The sectors scattered widely across the locational hierarchy are Manufacture of Food Products (KSIC 10), Manufacture of Coke Petroleum Products (KSIC 19), Manufacture of Chemical Products (KSIC 20), and Manufacture of Electronic Equipment (KSIC 26). These manufacturing activities can operate in regions with a sufficient supply of unskilled workers, regardless of infrastructure. 
The activities that tend to reconcentrate in larger cities are Manufacture of Textiles (KSIC 13), Manufacture of Wearing Apparel Clothing Fur Articles (KSIC 14), and Manufacture of Other Transport Equipment (KSIC 31). In most cases, these sectors tend to favor their existing agglomerated areas and concentrate around large cities. Therefore, it is inefficient to promote these sectors in small or medium-sized cities or underdeveloped regions. Development strategies for an industrial complex can gain greater competitiveness by taking such characteristics of the locational hierarchy into account.

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. In particular, research on recidivism prediction became more active in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening. In the same period, empirical studies on 'Recidivism Factors' also began in Korea. Although most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of prediction, it is important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of misclassifying people who will not reoffend as likely to reoffend is lower than the cost of misclassifying people who will reoffend as unlikely to do so, because the former only adds monitoring costs, while the latter incurs social and economic costs. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, which is recognized as a high-performance ensemble method in the field of data mining, was applied, and its results were compared with various prediction models such as LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the threshold is optimized to minimize the total misclassification cost, which is the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset. 
As a result, it was confirmed that the XGB model not only showed better prediction accuracy than other prediction models but also reduced the cost of misclassification most effectively.
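
The threshold-optimization step described above can be sketched as a simple sweep over candidate cut-offs. This is a minimal illustration, not the authors' implementation: the cost weights (`cost_fn`, `cost_fp`), the scores, and the labels are invented, and the cost is written as a weighted sum, which has the same minimizer as a weighted average when the weights and sample size are fixed.

```python
def best_threshold(probs, labels, cost_fn=5.0, cost_fp=1.0):
    """Return (threshold, cost) minimizing cost_fn * FN + cost_fp * FP.

    A case is predicted positive (will reoffend) when its score >= threshold.
    """
    # Every observed score is a candidate, plus one value above all scores
    # (which predicts everyone negative).
    candidates = sorted(set(probs)) + [max(probs) + 1.0]
    best_t, best_cost = None, float("inf")
    for t in candidates:
        false_neg = sum(1 for p, y in zip(probs, labels) if y == 1 and p < t)
        false_pos = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= t)
        cost = cost_fn * false_neg + cost_fp * false_pos
        if cost < best_cost:  # strict '<' keeps the lowest threshold on ties
            best_t, best_cost = t, cost
    return best_t, best_cost

# Hypothetical model scores and true outcomes (1 = reoffended).
probs = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 1]
threshold, cost = best_threshold(probs, labels)
print(threshold, cost)
```

Because a missed recidivist is weighted five times a false alarm here, the selected cut-off sits low enough to catch every positive case in the toy data, at the price of one false positive.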

Determination of Fire Severity and Deduction of Influence Factors Through Landsat-8 Satellite Image Analysis - A Case Study of Gangneung and Donghae Forest Fires - (Landsat-8 위성영상 분석을 통한 산불피해 심각도 판정 및 영향 인자 도출 - 강릉, 동해 산불을 사례로 -)

  • Soo-Dong Lee;Gyoung-Sik Park;Chung-Hyeon Oh;Bong-Gyo Cho;Byeong-Hyeok Yu
    • Korean Journal of Environment and Ecology / v.38 no.3 / pp.277-292 / 2024
  • To manage the large-scale forest fires concentrated in Gangwon-do and Gyeongsangbuk-do, where topographical heterogeneity is severe, a decision-making process based on efficient and rapid damage assessment using satellite images is essential. Accordingly, this study targets the large-scale forest fire that ignited in Gangneung and Donghae, Gangwon-do, on March 5, 2022, and was extinguished around 19:00 on March 8, estimating the fire severity using dNBR and deriving the environmental factors that affect the severity grade. As environmental factors, we quantified the normalized difference vegetation index (NDVI) representing vegetation or fuel type, the forest index that classifies tree species, the normalized difference water index (NDWI) representing moisture content, and the DEM for topography, and then analyzed their correlation with fire severity. In terms of fire severity, the largest share was 'Unburned' at 52.4%, followed by low severity at 42.9%, medium-low severity at 4.3%, and medium-high severity at 0.4%. Among the environmental factors, dNDVI and dNDWI showed a negative correlation with fire severity, and slope a positive correlation. Regarding vegetation, the differences among the coniferous, broad-leaved, and other groups in dNDVI, dNDWI, and slope, which were found to affect fire severity, were significant (p-value < 2.2e-16). In particular, the difference between coniferous and broad-leaved forests was clear, and it was confirmed that coniferous forest, including Pinus densiflora, the dominant species in the Gangwon-do region, as well as P. koraiensis, P. rigida, and P. thunbergii, suffered more damage than broad-leaved forest because of its higher fire severity.
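
For readers unfamiliar with dNBR, the sketch below shows the standard per-pixel calculation: the Normalized Burn Ratio is (NIR - SWIR2) / (NIR + SWIR2), which for Landsat-8 corresponds to bands 5 and 7, and dNBR is the pre-fire value minus the post-fire value. The class breaks in `SEVERITY_BREAKS` are commonly cited reference values assumed here for illustration; the abstract does not state which thresholds the authors used, and the reflectance values are invented.

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio for one pixel; returns 0.0 when both bands are zero."""
    total = nir + swir2
    return (nir - swir2) / total if total != 0 else 0.0

def dnbr(pre_nir, pre_swir2, post_nir, post_swir2):
    """Differenced NBR: pre-fire NBR minus post-fire NBR (higher = more severe)."""
    return nbr(pre_nir, pre_swir2) - nbr(post_nir, post_swir2)

# Assumed upper bounds for each class; not taken from the paper.
SEVERITY_BREAKS = [
    (0.10, "unburned"),
    (0.27, "low"),
    (0.44, "medium-low"),
    (0.66, "medium-high"),
]

def severity_class(value):
    """Map a dNBR value to a severity label; values below 0.10 count as unburned."""
    for upper, name in SEVERITY_BREAKS:
        if value < upper:
            return name
    return "high"

# Hypothetical surface reflectances for one pixel before and after the fire.
change = dnbr(pre_nir=0.40, pre_swir2=0.10, post_nir=0.20, post_swir2=0.20)
print(round(change, 3), severity_class(change))
```

In this simplified mapping, negative dNBR values (for example, post-fire regrowth) are folded into the unburned class rather than reported separately.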

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

  • Kim, Myeong-Kyun;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
  • This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies, and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. We approach the building of the predictive models from two perspectives. The first is the analysis period: we divide it into the periods before and after the IMF financial crisis and examine whether there is a difference between the two. The second is the prediction time: to predict when firms will increase capital by issuing new stocks, the prediction time is categorized as one year, two years, and three years later. A total of six prediction models are therefore developed and analyzed. We employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method; it builds trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression, and SVM, decision tree techniques are well suited to high-dimensional applications and have strong explanation capabilities. Well-known decision tree induction algorithms include CHAID, CART, QUEST, and C5.0. Among them, we use the C5.0 algorithm, which is the most recently developed and yields better performance than the other algorithms. We obtained data on rights issues and financial analysis from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices, and 8 productivity indices. 
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables from the financial analysis data were selected as the input variables of each model, and the rights issue status (issued or not issued) was defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% for model testing. The experimental results show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the IMF financial crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intention to conduct rights issues has become more evident. The results also show that stability-related indices have a major impact on conducting rights issues in short-term prediction. On the other hand, long-term prediction of rights issues is affected by financial analysis indices on profitability, stability, activity, and productivity. All the prediction models include the industry code as one of the significant variables, which means that companies in different industries show different patterns of rights issues. We conclude that it is desirable for stakeholders to take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, the differences in accuracy need to be compared using other data mining techniques such as neural networks, logistic regression, and SVM. 
Second, new prediction models need to be developed and evaluated that include variables that research on capital structure theory has identified as relevant to rights issues.
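
To make the decision tree idea concrete, the sketch below shows how an entropy-based tree learner might choose a cut-point on a single numeric index, by maximizing information gain over candidate thresholds. It is a generic illustration of the splitting principle behind the C4.5/C5.0 family, not the C5.0 algorithm as implemented in PASW Modeler; the `debt_ratio` values and `issued` labels are invented.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * log2(p)
    return result

def best_split(values, labels):
    """Return (threshold, gain) for the binary split 'value <= threshold'
    with the highest information gain; (None, 0.0) if no split helps."""
    base = entropy(labels)
    best = (None, 0.0)
    points = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(points, points[1:]):
        t = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical stability index (debt ratio, %) and rights-issue status (1 = issued).
debt_ratio = [40, 55, 60, 120, 150, 200]
issued = [0, 0, 0, 1, 1, 1]
threshold, gain = best_split(debt_ratio, issued)
print(threshold, gain)
```

A full tree learner would apply this search to every input variable at each node and recurse on the resulting subsets; rule sets such as those produced by the C5.0 node can then be read off the root-to-leaf paths.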

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products, because it calculates similarities based on direct connections and common features among customers. For this reason, hybrid techniques were designed that also use content-based filtering. Separately, efforts have been made to solve these problems by applying the structural characteristics of social networks: similarity between two customers is calculated indirectly through the similar customers placed between them. This means creating a customer network from purchasing data and calculating the similarity between two customers based on the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer will accept a recommendation. The centrality metrics of networks can be used to calculate these similarities. Different centrality metrics are important in that they may have different effects on recommendation performance. Furthermore, the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also for all customers or products. By treating a customer's purchase of an item as a link generated between the customer and the item in the network, the prediction of user acceptance of a recommendation is solved as a prediction of whether a new link will be created between them. 
Because classification models fit the purpose of solving the binary problem of whether a link is formed or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) were selected for this research. The performance evaluation used order data collected from an online shopping mall over four years and two months. The first three years and eight months of records were organized into the social network, and the following four months' records were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics differ for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across the models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance depending on the model. It ranks first, with numerically high performance, in logistic regression, artificial neural network, and decision tree. However, it records very low rankings, with low performance levels, in support vector machine and k-nearest neighbors. As the experimental results reveal, in a classification model, network centrality metrics over a subnetwork that connects two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the type of classification model. 
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Introducing closeness centrality could be considered to obtain higher performance for certain models.
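
Two of the four centrality metrics named above are simple enough to sketch directly. The example below computes degree centrality and closeness centrality on a small undirected graph using breadth-first search; the graph and node names are invented, and the code only illustrates the standard definitions, not the paper's feature construction over customer subnetworks.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def closeness_centrality(graph):
    """Reachable-node count divided by total shortest-path distance,
    computed by breadth-first search from each node."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical undirected customer network: a simple path A - B - C - D.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree["B"], closeness["A"], closeness["B"])
```

On this path, the interior nodes B and C score higher than the endpoints on both metrics. Betweenness and eigenvector centrality require more involved algorithms and are omitted here.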