Search | Korea Science

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
- Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
Recidivism prediction has been studied continuously since the early 1970s, and it has grown in importance as crimes committed by recidivists have steadily increased. Research became especially active in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening. In the same period, empirical studies on 'Recidivism Factors' also began in Korea. Most recidivism prediction studies have so far focused on the factors behind recidivism or on prediction accuracy. However, it is also important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of wrongly classifying a person who will not reoffend as a recidivist is lower than the cost of wrongly classifying a person who will reoffend as a non-recidivist: the former only adds monitoring costs, while the latter increases social and economic costs. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, which is recognized as a high-performance ensemble method in the field of data mining, was applied, and its results were compared with those of other prediction models such as LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, which is the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset.
The results confirmed that the XGB model not only showed better prediction accuracy than the other prediction models but also reduced the misclassification cost most effectively.
https://doi.org/10.13088/jiis.2019.25.1.127
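
The threshold-optimization step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the grid of candidate thresholds, the toy data, and the 5:1 cost weights are all assumptions made for illustration, and the scores could come from any classifier that outputs a probability of reoffending.

```python
from typing import Sequence, Tuple


def misclassification_cost(
    y_true: Sequence[int],
    scores: Sequence[float],
    threshold: float,
    fn_weight: float,
    fp_weight: float,
) -> float:
    """Weighted average of the false negative and false positive error rates."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    # A case is predicted positive (will reoffend) when its score reaches the threshold.
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fne = fn / positives if positives else 0.0
    fpe = fp / negatives if negatives else 0.0
    return (fn_weight * fne + fp_weight * fpe) / (fn_weight + fp_weight)


def best_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    fn_weight: float,
    fp_weight: float,
    steps: int = 100,
) -> Tuple[float, float]:
    """Grid-search thresholds in [0, 1]; return (threshold, cost) with the lowest cost."""
    candidates = [i / steps for i in range(steps + 1)]
    cost, threshold = min(
        (misclassification_cost(y_true, scores, t, fn_weight, fp_weight), t)
        for t in candidates
    )
    return threshold, cost


# Toy labels (1 = reoffended) and model scores, invented for illustration only.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]

# Assumed weights: a missed recidivist is treated as five times as costly as a false alarm.
threshold, cost = best_threshold(labels, scores, fn_weight=5.0, fp_weight=1.0)
print(f"threshold={threshold:.2f}, weighted cost={cost:.4f}")
```

Weighting false negatives more heavily pushes the selected threshold down, so more cases are flagged; reversing the weights pushes it up.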

The KMA Global Seasonal forecasting system (GloSea6) - Part 2: Climatological Mean Bias Characteristics (기상청 기후예측시스템(GloSea6) - Part 2: 기후모의 평균 오차 특성 분석)

Hyun, Yu-Kyung;Lee, Johan;Shin, Beomcheol;Choi, Yuna;Kim, Ji-Yeong;Lee, Sang-Min;Ji, Hee-Sook;Boo, Kyung-On;Lim, Somin;Kim, Hyeri;Ryu, Young;Park, Yeon-Hee;Park, Hyeong-Sik;Choo, Sung-Ho;Hyun, Seung-Hwon;Hwang, Seung-On
- Atmosphere / v.32 no.2 / pp.87-101 / 2022
This paper presents the performance improvement of the new KMA Climate Prediction System (GloSea6), which was built and tested in 2021, by assessing the bias distribution of basic variables from 24 years of GloSea6 hindcasts. With the upgrade from GloSea5 to GloSea6, performance is notably improved in many respects: (i) the negative bias of geopotential height over the tropical and mid-latitude troposphere and over the polar stratosphere in boreal summer; (ii) the cold bias of tropospheric temperature; (iii) the underestimation of mid-latitude jets; (iv) the dry bias in the lower troposphere; and (v) the cold tongue bias in the equatorial SST and the warm bias of the Southern Ocean. These improvements suggest the potential for improving major climate variability in GloSea6. The warm surface temperature bias over the northern hemisphere continents in summer is eliminated by using CDF-matched soil-moisture initial conditions. However, the cold bias over high-latitude snow-covered areas in winter still needs to be improved. The intensification of the westerly winds of the summer Asian monsoon and the weakening of the northwest Pacific high, which are considered major errors in the GloSea system, were not significantly improved. However, both an increased number of ensemble members and the use of initial conditions at the closest initial dates show the possibility of improving these biases. It is also noted that ensemble expansion mainly contributes to the improvement of annual variability over high latitudes and polar regions.
https://doi.org/10.14191/Atmos.2022.32.2.087

A Study of Cultural Migration of Pungmul-gut - Focusing on a Pungmul-pae's Activity in Toronto, Canada - (풍물굿의 해외 문화이주 현상에 관한 연구 - 캐나다 토론토의 풍물패 활동을 중심으로 -)

Lee, Yon-Shik
- (The) Research of the performance art and culture / no.41 / pp.353-380 / 2020
Samul nori/pungmul-gut is a symbol of ethnic identity for Koreans abroad. It is the representative diaspora musical genre, performed at many cultural events held by Koreans. At the same time, it is a global music appreciated not only by Koreans but also by foreigners. Musical communities in various countries exhibit cultural migration through the discourses of 'tradition/variation' and 'authenticity/hybridity' in the course of the acculturation and enculturation of samul nori/pungmul-gut. The pungmul-pae 'Bichoe June', active in Toronto, Canada, was organized by a foreign performer. For foreigners, pungmul-gut is easy to access as a genre of world music, and as a percussion ensemble it is easy to learn. 'Bichoe June' is a 'music community' consisting of Koreans and foreigners. The band tries to preserve the traditionality and authenticity of Korean music. There is no variation or hybridity in its music, since the members still learn the authentic music through various available textbooks and internet sites. Through the participation of both Koreans and foreigners, the band stimulates the globalization of pungmul-gut. The enculturation of pungmul-gut is exhibited in two performances held by the band: one was hosted by a progressive Canadian group and the other by a conservative Korean community. The former understood pungmul-gut as the music of the common people. The latter accepted it as representative traditional music but found the 'noisy' music hard to enjoy. In other words, whether pungmul-gut is received positively or negatively depends on the ideological nature of the listeners rather than on their ethnicity.

The Melodic Structure of Sangnyeongsan in Gwanak-yeongsanhoesang - Focused on the Relationship between Piri Melody and Daegeum yeoneum - (관악영산회상 중 상령산의 선율 구조 - 피리 선율과 대금 연음의 관계를 중심으로 -)

Yim, Hyun-Taek
- (The) Research of the performance art and culture / no.39 / pp.701-748 / 2019
Gwanak-yeongsanhoesang, also called Samhyeon-yeongsanhoesang or Pyojeongmanbangjigok, is played by the Samhyeonyukgak instrumental ensemble or by a large-scale wind ensemble with Sogeum and Ajaeng added. This study aims to analyze the structure and form of the Piri melody, which carries the main melody of Sangnyeongsan in Gwanak-yeongsanhoesang, and to examine the relationship between the Piri melody and Daegeum yeoneum by identifying the structure and function of yeoneum. In Sangnyeongsan of Gwanak-yeongsanhoesang, the criterion for grouping the phrases of the Piri melody is yeoneum. In particular, Daegeum yeoneum finishes the phrase of the Piri, which plays the main melody, by ornamenting or extending it, and presents the motives or motivic elements of the next phrase while the Piri rests. Among the various shapes of the minimum melodic fragment of the Piri, types a, b, g, and i are the important motivic elements that constitute a phrase of the Piri melody. The main motive, a-type (仲→無), contrasts with b-type (林→潢), which forms a strong tension by transposing 2 degrees upward. In addition, a-type gradually descends towards the end of the music by changing to g-type (仲→林) or to i-type (太→林), which is 3 degrees below; this is related to the gradually descending cadence of Korean traditional music. A phrase of the Piri melody of Sangnyeongsan in Gwanak-yeongsanhoesang consists of a combination of types a, b, g, i, and the cadence (x-type), and each phrase is structured as a repetition of tension and relaxation. Comparing Piri phrases of similar types, each phrase has a logical variation structure built through methods such as omission and addition of notes and crossing of melodies. The minimum melodic fragments of Daegeum yeoneum can be divided into a front-yeoneum of types a and b and a back-yeoneum of types x1 to x3.
The x-types ornament Jungnyeo (仲), the cadence tone of the Piri melody, or simply serve as an extending back-yeoneum, while types a and b function as a front-yeoneum that prepares the beginning of the next phrase of the Piri melody. The minimum melodic fragments of Daegeum yeoneum mostly combine in the shape of back-yeoneum + front-yeoneum. In addition, the front-yeoneum of types a and b can appear independently without a back-yeoneum, and the x3 type takes the shape of a back-yeoneum without a front-yeoneum. Comparing Daegeum yeoneum of similar types, it can be seen that Daegeum yeoneum, like the Piri melody, is also built on a variation structure of omission and addition of notes.

Identifying sources of heavy metal contamination in stream sediments using machine learning classifiers (기계학습 분류모델을 이용한 하천퇴적물의 중금속 오염원 식별)

Min Jeong Ban;Sangwook Shin;Dong Hoon Lee;Jeong-Gyu Kim;Hosik Lee;Young Kim;Jeong-Hun Park;ShunHwa Lee;Seon-Young Kim;Joo-Hyon Kang
- Journal of Wetlands Research / v.25 no.4 / pp.306-314 / 2023
Stream sediments are an important component of water quality management because they are receptors of various pollutants, such as heavy metals and organic matter emitted from upland sources, and can act as secondary pollution sources that adversely affect the water environment. To manage stream sediments effectively, the primary sources of sediment contamination must be identified and source-specific control strategies developed. We evaluated the performance of machine learning models in identifying primary sources of sediment contamination based on the physico-chemical properties of stream sediments. A total of 356 stream sediment data sets covering 18 quality parameters, including 10 heavy metal species (Cd, Cu, Pb, Ni, As, Zn, Cr, Hg, Li, and Al), 3 soil parameters (clay, silt, and sand fractions), and 5 water quality parameters (water content, loss on ignition, total organic carbon, total nitrogen, and total phosphorus), were collected near abandoned metal mines and industrial complexes across the four major river basins in Korea. Two machine learning algorithms, linear discriminant analysis (LDA) and support vector machine (SVM) classifiers, were used to classify the sediments into four cases that combine sampling period and location (i.e., mine in dry season, mine in wet season, industrial complex in dry season, and industrial complex in wet season). Both models performed well in the classification, with SVM outperforming LDA; the accuracy values of LDA and SVM were 79.5% and 88.1%, respectively. An SVM ensemble model was used for multi-label classification of multiple contamination sources, including land uses in the upland areas within a 1 km radius of the sampling sites. The results showed that the multi-label classifier performed comparably to the single-label SVM in classifying mines and industrial complexes, but was less accurate in classifying dominant land uses (50~60%).
The poor performance of the multi-label SVM is likely due to overfitting caused by the small size of the data set relative to the complexity of the model. A larger data set might improve the performance of the machine learning models in identifying contamination sources.
https://doi.org/10.17663/JWR.2023.25.4.306

Estimation of Chlorophyll-a Concentration in Nakdong River Using Machine Learning-Based Satellite Data and Water Quality, Hydrological, and Meteorological Factors (머신러닝 기반 위성영상과 수질·수문·기상 인자를 활용한 낙동강의 Chlorophyll-a 농도 추정)

Soryeon Park;Sanghun Son;Jaegu Bae;Doi Lee;Dongju Seo;Jinsoo Kim
- Korean Journal of Remote Sensing / v.39 no.5_1 / pp.655-667 / 2023
Algal bloom outbreaks are frequently reported around the world, and serious water pollution problems arise every year in Korea. The aquatic ecosystem needs to be protected through continuous management and rapid response. Many studies use satellite images to estimate the concentration of chlorophyll-a (Chl-a), an indicator of algal bloom occurrence. However, because spectral characteristics and atmospheric correction errors vary with the water system, Chl-a is difficult to calculate accurately, and machine learning models have recently been used instead. It is also necessary to consider the factors that affect algal blooms in addition to the satellite spectral indices. Therefore, this study constructed a dataset that combines water quality, hydrological, and meteorological factors with Sentinel-2 images. Two representative ensemble models, random forest and extreme gradient boosting (XGBoost), were used to predict the concentration of Chl-a at eight weirs located on the Nakdong River over the past five years. The R-squared score (R²), root mean square error (RMSE), and mean absolute error (MAE) were used as model evaluation indicators; for XGBoost, R² was 0.80, RMSE was 6.612, and MAE was 4.457. Shapley additive explanations (SHAP) analysis showed that water quality factors such as suspended solids, biochemical oxygen demand, and dissolved oxygen, as well as the band ratio using red edge bands, were of high importance in both models. The variety of input data was confirmed to help improve model performance, and the approach appears applicable to algal bloom detection both in Korea and abroad.
https://doi.org/10.7780/kjrs.2023.39.5.1.15
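
For readers unfamiliar with the three evaluation indicators named in the abstract above, the following sketch computes them from paired observed and predicted values. It assumes the usual definitions (R² as the coefficient of determination, 1 minus the ratio of residual to total sum of squares); the abstract does not state which definition the authors used, and the sample values are invented for illustration.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    observed: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Return R², RMSE, and MAE for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    # Residual sum of squares and total sum of squares around the observed mean.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    # R² is undefined when the observations have no variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "rmse": rmse, "mae": mae}


# Invented Chl-a style values (same unit for both lists), for illustration only.
observed_values = [10.0, 20.0, 30.0, 40.0]
predicted_values = [12.0, 18.0, 33.0, 39.0]
print(regression_metrics(observed_values, predicted_values))
```

RMSE and MAE are expressed in the unit of the target variable, while R² is unitless, which is why the three are usually reported together.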

'Yongyudam of Hamyang', the Significance and Value as a Traditional Scenic Place ('함양 용유담(咸陽龍遊潭)', 전래명승으로서의 의의와 가치 구명)

Rho, Jae-hyun
- Korean Journal of Heritage: History & Science / v.47 no.1 / pp.82-101 / 2014
The purpose of this study was to survey and analyze the origin story and legends associated with Yongyudam (龍遊潭, Dragon Creek), its scenic and spatial description in climbing writings (遊山記, Yusangi), its geographical and geological features, and its surrounding remains and letters chiseled on the rocks, through field study and a study of related literature, in order to identify its significance and value and to justify the preservation of the Yongyudam scenic site. The conclusions of this study are as follows. Just as the traditional scenic place 'Geumdae-Jiri (金臺智異)' representing Hamyang-gun (咸陽郡) depicts Mount Cheonwangbong and 'Yongyudong Village (龍遊洞)', ancient maps and literature have positioned Yongyudam as the center of Eomcheon-river Creek as well as the representative scenic site of Yongyudong Village. The core images in the spatial awareness of Yongyudam described in our ancestors' climbing writings on Mount Jirisan are 'geographical and scenic peculiarity associated with swimming dragons', 'potholes in various shapes and sizes scattered on rocks', 'loud sound generated by swirling from shoals', and 'the scenic metaphor from the dragon legend', which have shaped the scenic features of the Yongyudam scenic site. In addition, significant scenic metaphors from legends such as 'Nine Dragons and Ascetic Majeog' and 'Kasaya Fish', as well as 'the Holy Place of Dragon God', the rain-calling god, have been handed down not only as the very nature of the Yongyudam scenic site but also as a catalyst deepening its mystic character and sense of place. Meanwhile, Jangguso Place (杖屨所, Place of Scholars) in the vicinity of Yongyudam was a place of rest and amusement for scholars from Yeongnam Province, such as Kim Il-son, Cho Sik, Jung Yeo-chang, and Kang Dae-su, where they experienced and recognized Mount Jirisan as a scenic living place. The letters carved on the rocks at Jangguso Place are memorial tributes and monumental signs.
Around Yongyudam there are three stairs, letters chiseled on the rocks, and a water rock artificially built to collect clean water, which are traditional scenic remains that reveal its territoriality as a ritual place. In addition, the letters on the rock at Yongyudong-mun (龍遊洞門), discovered for the first time in this study, are a sign promoting Yongyudam as a place of splendid landscape. The laconic inscription 'It is the Greatest Water in Jirisan Mount (方丈第一山水)' on a rock expresses pride in Yongyudam as the representative scenic place of Mount Jirisan. Besides these, standing rocks such as Simjindae Rock, Yeong-gwidae Rock, and Ganghwadae Rock show signs of having been used as places of amusement and gathering by scholars of the past, which adds significance to Yongyudam. This study verified that Yongyudam in Mount Jirisan is not simply 'a scenic place in the tangible reality' but has also been handed down without interruption as a traditional scenic attraction spiritualized by overlapping historical and cultural values. Yongyudam, as a combined heritage in itself, is a product of its sense of place as well as a unique ensemble of cultural scenic attractions inherited through a long history and grounded in natural scenery. Not only the place value but also the geographical, geological, historical, and cultural values of Yongyudam are the essence of a traditional scenic attraction, and they should not be disparaged or damaged by any political or economic interests or logic.
https://doi.org/10.22755/kjchs.2014.47.1.82

Types and Characteristics of Traditional Music Performance of the 1920s - Focused on the mixed performances type in the western-style genre - (1920년대 전통음악공연의 형태와 특징 - 서양식 장르와의 혼성공연형태를 중심으로 -)

Keum, Yong-woong
- (The) Research of the performance art and culture / no.35 / pp.61-92 / 2017
During the Japanese colonial era, traditional music performances gradually diminished and weakened under the particular conditions of colonization. Meanwhile, from the enlightenment period onward, Western-genre performances became increasingly active under the influence of Western civilization, which was spreading steadily throughout society. In that situation, traditional music tended to be presented in mixed performances alongside Western genres rather than in independent performances. Mostly it was paired with Western music, but it was also performed together with other genres such as plays, lectures, movies, dances, or magic. Such mixed performances became even more active in the 1920s and came to be established as one form of traditional music performance. Therefore, research on mixed performances of Western genres and traditional music is meaningful for examining forms of traditional music performance that have not been studied in the history of modern Korean music and for understanding the general trends of traditional music performance in the Japanese colonial era. However, such research has hardly been conducted in concrete terms. Accordingly, this author examined the background to the growth of mixed performances of Western genres and traditional music in the 1920s, and their formal characteristics, mainly through newspaper articles of the time. As for the background, mixed performances of Western genres and traditional music became more active from the 1920s than before. The causes may include the increase from the 1920s in groups hosting or sponsoring such performances and the dramatic increase in such performances in general.
Moreover, the increased performances took the form of mixed performances mainly to satisfy the public's needs, which were diversifying with the spread of Western civilization. As for the formal characteristics, this researcher classified the Western genres performed with traditional music and examined the characteristics of the mixed performances for each type of Western genre. First, in mixed performances of Western music and traditional music, Western music generally accounted for a significant portion of the program; there were some ensembles of Western and traditional instruments, which were rare at the time; and the two genres were performed separately by piece or by date. Second, in mixed performances of drama and traditional music, the traditional music either took part directly in the drama, in a form similar to theater, or was performed independently as an interlude during scene changes, and it served to attract or entertain the audience. Third, in mixed performances of lectures and traditional music, the traditional music was played before or after the lecture to set the atmosphere and provide entertainment, and it was performed to attract an audience. Fourth, in mixed performances of movies and traditional music, the traditional music sometimes took part directly in the movie or was performed independently, and it was characteristically performed as entertainment after the movie was shown.

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

Kim, Hyung Su;Hong, Seung Woo
- Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
Most industries have recently become aware of the importance of customer lifetime value as they are exposed to a competitive environment. As a result, preventing customer churn is becoming a more important business issue than securing new customers. This is because retaining existing customers is far more economical than securing new ones; the acquisition cost of a new customer is known to be five to six times higher than the cost of retaining an existing customer. Companies that effectively prevent customer churn and improve customer retention rates are also known to see positive effects not only on profitability but also on brand image through improved customer satisfaction. Customer churn prediction, which had been conducted as a sub-research area of CRM, has recently become more important as a big data-based performance marketing theme owing to the development of business machine learning technology. Until now, research on customer churn prediction has been carried out actively in sectors such as the mobile telecommunication industry, the financial industry, the distribution industry, and the game industry, which are highly competitive and where churn management is urgent. These churn prediction studies focused on improving the performance of the churn prediction model itself, for example by simply comparing the performance of various models, exploring features that are effective in forecasting churn, or developing new ensemble techniques. They were limited in practical use because most of them treated the entire customer base as a single group when developing a predictive model. In short, the main purpose of the existing research was to improve the performance of the predictive model itself, and research on improving the overall customer churn prediction process was relatively lacking.
In fact, customers have different behavioral characteristics owing to heterogeneous transaction patterns, and their churn rates differ as a result, so it is unreasonable to treat the entire customer base as a single group. It is therefore desirable to segment customers according to classification criteria such as loyalty and to operate an appropriate churn prediction model for each segment in order to predict churn effectively in heterogeneous industries. Some studies do subdivide customers using clustering techniques and apply a churn prediction model to each customer group. Although this process can produce better predictions than a single prediction model for the entire customer population, there is still room for improvement, because clustering is a mechanical, exploratory grouping technique that calculates distances based on inputs and does not reflect an organization's strategic intent, such as loyalty. This study proposes a segment-based customer churn prediction process (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation) based on two-dimensional customer loyalty, on the assumption that successful churn management is better achieved by improving the overall process than by improving the performance of the model itself. CCP/2DL is a series of churn prediction steps that segments customers along two dimensions of loyalty, quantitative and qualitative, conducts a secondary grouping of the customer segments according to churn patterns, and then independently applies heterogeneous churn prediction models to each churn pattern group. To assess the relative performance of the proposed process, it was compared with the two most commonly applied processes, the General churn prediction process and the Clustering-based churn prediction process.
The General churn prediction process used in this study is the most commonly used churn prediction method, in which a single machine learning model is applied to the entire set of customers to be predicted as one group. The Clustering-based churn prediction process first segments customers using clustering techniques and then implements a churn prediction model for each individual group. In an evaluation conducted in cooperation with a global NGO, the proposed CCP/2DL outperformed the other churn prediction methodologies. This churn prediction process is not only effective in predicting churn but can also serve as a strategic basis for obtaining a variety of customer insights and carrying out other related performance marketing activities.
https://doi.org/10.13088/jiis.2020.26.4.111
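
The first stage of the process described in the abstract above, segmenting customers along a quantitative and a qualitative loyalty dimension and then inspecting churn by segment, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how either loyalty dimension is measured or where the segments are cut, so the field names, the 2x2 high/low grid, the cut points, and the sample records are all hypothetical. The later stages (grouping segments by churn pattern and fitting a separate model per group) are not shown.

```python
from collections import defaultdict
from typing import Dict, List


def loyalty_segment(
    quantitative: float, qualitative: float, quant_cut: float, qual_cut: float
) -> str:
    """Assign one of four segments from two loyalty scores (hypothetical 2x2 grid)."""
    quant_level = "high" if quantitative >= quant_cut else "low"
    qual_level = "high" if qualitative >= qual_cut else "low"
    return f"quant-{quant_level}/qual-{qual_level}"


def churn_rate_by_segment(
    customers: List[dict], quant_cut: float, qual_cut: float
) -> Dict[str, float]:
    """Return the observed churn rate within each loyalty segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [customers, churned customers]
    for customer in customers:
        segment = loyalty_segment(
            customer["quantitative"], customer["qualitative"], quant_cut, qual_cut
        )
        counts[segment][0] += 1
        counts[segment][1] += int(customer["churned"])
    return {segment: churned / total for segment, (total, churned) in counts.items()}


# Hypothetical records: a monetary-style score, an engagement-style score, and a churn flag.
sample_customers = [
    {"quantitative": 120.0, "qualitative": 0.9, "churned": False},
    {"quantitative": 150.0, "qualitative": 0.8, "churned": False},
    {"quantitative": 130.0, "qualitative": 0.2, "churned": True},
    {"quantitative": 20.0, "qualitative": 0.7, "churned": False},
    {"quantitative": 10.0, "qualitative": 0.1, "churned": True},
    {"quantitative": 15.0, "qualitative": 0.3, "churned": True},
    {"quantitative": 30.0, "qualitative": 0.2, "churned": False},
]

rates = churn_rate_by_segment(sample_customers, quant_cut=100.0, qual_cut=0.5)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: churn rate {rate:.2f}")
```

Unlike distance-based clustering, the cut points here are chosen by the analyst, which is one way a segmentation can reflect strategic intent.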

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
- Journal of Intelligence and Information Systems / v.27 no.1 / pp.83-102 / 2021
The government recently announced various policies for developing the big-data and artificial intelligence fields, giving the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea and is strongly committed to backing export companies through various systems. Nevertheless, there are still few cases of realized business models based on big-data analyses. In this situation, this paper aims to develop a new business model that can be applied to the ex-ante prediction of the likelihood of a credit guarantee insurance accident. We use internal data from KSURE, which supports export companies in Korea, and apply machine learning models. We then compare the performance of predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, many researchers have tried to find better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The prediction of financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), which is based on multiple discriminant analysis and is still widely used in both research and practice. Altman proposes a score model that uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduces a logit model to address some limitations of previous models. Furthermore, Elmer and Borowski (1988) develop and examine a rule-based, automated system that conducts the financial analysis of savings and loans.
Since the 1980s, researchers in Korea have also studied the prediction of financial distress or bankruptcy. Kim (1987) analyzes financial ratios and develops a prediction model. Han et al. (1995, 1996, 1997, 2003, 2005, 2006) construct prediction models using various techniques, including artificial neural networks. Yang (1996) introduces multiple discriminant analysis and a logit model. Kim and Kim (2001) use artificial neural network techniques for the ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely with diverse models such as Random Forest or SVM. One major distinction of our research from previous research is that we examine the predicted probability of default for each sample case, rather than only investigating the classification accuracy of each model for the entire sample. Most predictive models in this paper show a classification accuracy of about 70% on the entire sample. Specifically, the LightGBM model shows the highest accuracy, 71.1%, and the Logit model the lowest, 69%. However, we find that these results are open to multiple interpretations. In the business context, more emphasis has to be placed on minimizing type 2 error, which causes more harmful operating losses for the guaranty company. Thus, we also compare classification accuracy after splitting the predicted probability of default into ten equal intervals. Examined by interval, the Logit model has the highest accuracy, 100%, for the 0~10% interval of the predicted probability of default, but a relatively low accuracy of 61.5% for the 90~100% interval.
On the other hand, Random Forest, XGBoost, LightGBM, and DNN give more desirable results, since they show a higher level of accuracy for both the 0~10% and 90~100% intervals of the predicted probability of default, although their accuracy is lower around a predicted probability of 50%. As for the distribution of samples across the predicted probability of default, both the LightGBM and XGBoost models place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy with a small number of cases, LightGBM or XGBoost could be the more desirable model, since they classify a large number of cases into the two extreme intervals of the predicted probability of default, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance, followed by Random Forest and LightGBM, while logistic regression shows the worst performance. However, each predictive model has a comparative advantage under different evaluation standards. For instance, the Random Forest model shows almost 100% accuracy for samples that are expected to have a high probability of default. Collectively, more comprehensive ensemble models can be constructed that contain multiple machine learning classifiers and use majority voting to maximize overall performance.
https://doi.org/10.13088/jiis.2021.27.1.083
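
The interval-wise comparison described in the abstract above, where classification accuracy is reported separately for each tenth of the predicted probability of default, can be sketched as follows. This is an illustrative reading, not the authors' code: the 0.5 decision cut-off, the function name, and the toy data are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def accuracy_by_probability_bin(
    y_true: Sequence[int],
    probabilities: Sequence[float],
    n_bins: int = 10,
    cutoff: float = 0.5,
) -> List[Tuple[int, Optional[float]]]:
    """Return (sample count, accuracy) for each equal-width probability interval.

    Accuracy is None for intervals that contain no samples.
    """
    totals = [0] * n_bins
    correct = [0] * n_bins
    for label, prob in zip(y_true, probabilities):
        # Interval index; a probability of exactly 1.0 falls into the last interval.
        index = min(int(prob * n_bins), n_bins - 1)
        predicted = 1 if prob >= cutoff else 0  # assumed decision rule
        totals[index] += 1
        correct[index] += int(predicted == label)
    return [
        (totals[i], correct[i] / totals[i] if totals[i] else None)
        for i in range(n_bins)
    ]


# Toy labels (1 = default) and predicted default probabilities, invented for illustration.
toy_labels = [0, 0, 1, 1, 1, 0, 1]
toy_probs = [0.03, 0.05, 0.95, 0.97, 0.52, 0.55, 0.48]

for i, (count, accuracy) in enumerate(accuracy_by_probability_bin(toy_labels, toy_probs)):
    shown = "n/a" if accuracy is None else f"{accuracy:.2f}"
    print(f"{i * 10:>3}~{(i + 1) * 10}%: n={count}, accuracy={shown}")
```

Reporting the sample count next to each interval's accuracy matters here, because a model can look accurate in an extreme interval simply by placing very few cases there.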

Search Result 1,361, Processing Time 0.035 seconds
