• Title/Summary/Keyword: probability calculation

Search Result 459

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.105-129
    • /
    • 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risk. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial-ratio indices. Unlike most prior studies, which used default events as the basis for learning about default risk, this study calculated default risk from each company's market capitalization and stock price volatility based on the Merton model. This resolved the data-imbalance problem caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and also captured the differences in default risk that exist among ordinary (non-defaulting) companies. Because learning was conducted only with corporate information that is also available for unlisted companies, the default risk of unlisted companies without stock price information can be derived appropriately. This makes it possible to provide stable default risk assessment services to unlisted companies, such as small and medium-sized enterprises and startups, whose default risk is difficult to determine with traditional credit rating models. Although corporate default risk prediction using machine learning has been studied actively in recent years, most studies make predictions with a single model, so model bias remains an issue. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is used very widely in the market and sensitivity to differences in default risk is high; strict standards are likewise required for the calculation method. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Business Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced the bias of individual models by using a stacking ensemble technique that synthesizes various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various items of corporate information while retaining the advantage of machine learning-based default risk prediction models, namely short calculation time. To produce the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to generate forecasts. To compare predictive power, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had performed best among the single models. Next, to check for statistically significant differences between the Stacking Ensemble model and each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank-sum test was used to check whether the two forecasts making up each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP and CNN models. In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based default risk prediction, given that traditional credit rating models can also be included as sub-models in calculating the final default probability. The Stacking Ensemble technique proposed in this study can also help designs meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical adoption by overcoming the limitations of existing machine learning-based models.
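The stacking procedure described above (sub-models trained on splits of the training data, their out-of-fold forecasts fed to a meta-model) can be sketched as follows. This is a minimal illustration with scikit-learn: the placeholder features, the Merton-style risk target, the choice of sub-models, and all hyperparameters are assumptions made for the example, not the paper's exact setup.

```python
# Minimal stacking-ensemble sketch (illustrative only; the paper's exact
# sub-models, features, and tuning are not reproduced here).
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 160))      # placeholder for the 160 financial columns
y = rng.uniform(0, 1, size=1000)      # placeholder Merton-based default risk

sub_models = {
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
    "mlp": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
}

# Out-of-fold forecasts from each sub-model (7 splits, as in the paper)
# become the input features of the meta-model.
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=7) for model in sub_models.values()
])
meta_model = LinearRegression().fit(meta_features, y)

# At prediction time, refit sub-models on the full training data and
# feed their forecasts to the meta-model.
for model in sub_models.values():
    model.fit(X, y)

def predict_default_risk(X_new):
    stacked = np.column_stack([m.predict(X_new) for m in sub_models.values()])
    return meta_model.predict(stacked)
```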

Sensitivity Experiment of Surface Reflectance to Error-inducing Variables Based on the GEMS Satellite Observations (GEMS 위성관측에 기반한 지면반사도 산출 시에 오차 유발 변수에 대한 민감도 실험)

  • Shin, Hee-Woo;Yoo, Jung-Moon
    • Journal of the Korean earth science society
    • /
    • v.39 no.1
    • /
    • pp.53-66
    • /
    • 2018
  • The information on surface reflectance ($R_{sfc}$) is important for the heat balance and for environmental and climate monitoring. The sensitivity of $R_{sfc}$ to error-inducing variables in the Geostationary Environment Monitoring Spectrometer (GEMS) retrieval from geostationary-orbit satellite observations at 300-500 nm was investigated, utilizing polar-orbit satellite data from the MODerate resolution Imaging Spectroradiometer (MODIS) and the Ozone Monitoring Instrument (OMI), together with radiative transfer model (RTM) experiments. The variables considered in this study are cloud, Rayleigh scattering, aerosol, ozone, and surface type. Cloud detection in high-resolution MODIS pixels ($1 km \times 1 km$) was compared with that in GEMS-scale pixels ($8 km \times 7 km$). The GEMS detection was consistent (~79%) with the MODIS result; however, the detection probability in partially cloudy (${\leq}40%$) GEMS pixels decreased due to other effects (i.e., aerosol and surface type). The Rayleigh-scattering effect in RGB images was noticeable over the ocean, based on the RTM calculation. The reflectance at the top of the atmosphere ($R_{toa}$) increased with aerosol amount for $R_{sfc}$<0.2, but decreased for $R_{sfc}{\geq}0.2$. The $R_{sfc}$ errors due to aerosol increased with wavelength in the UV, but were constant or slightly decreasing in the visible. The ozone absorption was most sensitive at 328 nm in the UV region (328-354 nm); the $R_{sfc}$ error was +0.1 for a negative total ozone anomaly (-100 DU) under the condition $R_{sfc}=0.15$. This study can be useful for estimating $R_{sfc}$ uncertainties in the GEMS retrieval.
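The sensitivity of the retrieved surface reflectance to these variables can be pictured with the standard Lambertian atmospheric-correction relation between top-of-atmosphere and surface reflectance. This is a generic simplification, not the paper's RTM formulation; the path reflectance $R_{atm}$, two-way transmittance $T$, and atmospheric spherical albedo $S$ are symbols introduced here only for illustration.

```latex
% Simplified Lambertian coupling of TOA and surface reflectance (illustrative).
R_{toa} = R_{atm} + \frac{T \, R_{sfc}}{1 - S \, R_{sfc}}
\qquad\Longrightarrow\qquad
R_{sfc} = \frac{R_{toa} - R_{atm}}{T + S\,(R_{toa} - R_{atm})}
```

Errors in the assumed cloud, aerosol, or ozone state enter through $R_{atm}$, $T$, and $S$, and propagate into the retrieved $R_{sfc}$ according to the second expression.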

Decision Making on the Non surgical, Surgical Treatment on Chronic Adult Periodontitis (만성 성인성 치주염 치료시 비외과적, 외과적 방법에 대한 의사결정)

  • Song, Si-Eun;Li, Seung-Won;Cho, Kyoo-Sung;Chai, Jung-Kiu;Kim, Chong-Kwan
    • Journal of Periodontal and Implant Science
    • /
    • v.28 no.4
    • /
    • pp.645-660
    • /
    • 1998
  • The purpose of this study was to construct and verify a decision-making process, based on patient-oriented utilitarianism, for the treatment of chronic adult periodontitis. Fifty subjects were chosen at Yonsei Dental Hospital and another fifty at Severance Dental Hospital according to the selection criteria, and fifty-four patients agreed to participate. The NS group (N=32) was treated with scaling and root planing without any surgical intervention; the S group (N=22) was treated with flap operation. During active treatment and healing, all patients in both groups were educated about the importance of oral hygiene and monitored at every hospital visit. When periodontal treatment was needed according to the diagnostic results, some patients received professional tooth cleaning and scaling once every 3 months according to an individually designed oral hygiene protocol. Probing depth was recorded at baseline and 18 months after treatment. A questionnaire comprising six kinds of questions (hygienic easiness, hypersensitivity, post-treatment comfort, complication, functional comfort, compliance) was given to each patient to obtain a subjective evaluation of the results of therapy. The decision tree for the treatment of adult periodontal disease was built from the results of the two kinds of periodontal treatment and the patients' subjective evaluations. The optimal path was calculated by using the success rates of the results as probabilities, with utilities assigned according to relative value and to economic value in the insurance system. The success rate of achieving the diagnostic goal of periodontal treatment, defined as a remaining pocket depth of 3 mm or less without bleeding on probing (BOP), was $0.83{\pm}0.12$ for non-surgical treatment and $0.82{\pm}0.14$ for surgical treatment, without any statistically significant difference; the rate of moderate success, with probing pocket depth of 4 mm or more, was 0.17 for both. The utilities of the non-surgical treatment results were 100 for outcomes with probing pocket depth of 3 mm or less, 80 for outcomes with 4 mm or more, and 0 for extraction; the utilities for surgical treatment were the same except that outcomes with 4 mm or more were assigned 75. The pooled questionnaire results were 60% satisfied and 40% not satisfied in the non-surgical group, and 33% and 67%, respectively, in the surgical group. The utilities for the four satisfaction levels were 100, 75, 60, and 50, on the assumption that patients' satisfaction levels follow a normal distribution. The optimal path of periodontal treatment was rolled back by assigning the utility at each terminal node together with the success rate and the distribution of patients' satisfaction levels. Both calculations indicated non-surgical treatment. Therefore, non-surgical treatment may be the optimal path in this decision tree if the goal of periodontal treatment is a remaining probing pocket depth of 3 mm or less for adult chronic periodontitis and if the utilitarian philosophy of maximizing the expected utility for the patient is advocated.
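A minimal Python sketch of the expected-utility roll-back described above, using the success rates and utilities reported in the abstract. The weighting between the clinical-outcome utilities and the patient-satisfaction utilities, and the collapse of the four satisfaction levels to two, are assumptions made only to keep the example short; they are not the paper's exact scheme.

```python
# Illustrative roll-back of the two treatment branches using the figures
# reported in the abstract. The weighting between clinical outcome and
# patient satisfaction is an assumption made for this sketch.

def expected_value(outcomes):
    """outcomes: list of (probability, utility) pairs."""
    return sum(p * u for p, u in outcomes)

# Clinical-outcome branch (pocket depth <= 3 mm vs >= 4 mm)
clinical = {
    "non_surgical": expected_value([(0.83, 100), (0.17, 80)]),
    "surgical":     expected_value([(0.82, 100), (0.17, 75)]),
}

# Patient-satisfaction branch (satisfied vs not satisfied); the two
# utility values here simplify the four-level scale of the paper.
satisfaction = {
    "non_surgical": expected_value([(0.60, 100), (0.40, 60)]),
    "surgical":     expected_value([(0.33, 100), (0.67, 60)]),
}

WEIGHT_CLINICAL = 0.7   # assumed weight, for illustration only

for treatment in ("non_surgical", "surgical"):
    total = (WEIGHT_CLINICAL * clinical[treatment]
             + (1 - WEIGHT_CLINICAL) * satisfaction[treatment])
    print(f"{treatment}: expected utility = {total:.1f}")
```

Under these assumed weights the non-surgical branch scores higher, consistent with the abstract's conclusion.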


Estimation of the Moisture Maximizing Rate based on the Moisture Inflow Direction : A Case Study of Typhoon Rusa in Gangneung Region (수분유입방향을 고려한 강릉지역 태풍 루사의 수분최대화비 산정)

  • Kim, Moon-Hyun;Jung, Il-Won;Im, Eun-Soon;Kwon, Won-Tae
    • Journal of Korea Water Resources Association
    • /
    • v.40 no.9
    • /
    • pp.697-707
    • /
    • 2007
  • In this study, we estimated the PMP (Probable Maximum Precipitation) and its variation for typhoon Rusa, which caused the greatest damage of any typhoon in Korea. In particular, we analyzed the moisture maximizing rate under the meteorological conditions and orographic characteristics of the Gangneung region, where the typhoon struck. The PMP is calculated using the moisture maximizing rate, i.e., the ratio of the maximum persisting 12-hour 1000 hPa dew point to the representative persisting 12-hour 1000 hPa dew point. The former is influenced by the moisture inflow regions, which are determined from the surface wind direction, the 850 hPa moisture flux, and the streamline; this is the critical difference from the previous study. The latter is calculated using the statistics program FARD2002 provided by the NIDP (National Institute for Disaster Prevention). In this program, the dew point is estimated by a frequency analysis for a 50-year return period at the 5% significance level, applying the Extreme Type I (Gumbel) distribution with parameters estimated by the method of moments. This study then compares, for a small basin ($3.76 km^2$), the PMP obtained by the new method with the existing result based on storm transposition and DAD (Depth-Area-Duration) analysis. Consequently, the moisture maximizing rate calculated for the moisture inflow regions determined from the meteorological fields is higher than that of the previous study by 0.20 to 0.40, and the precipitation increases by 16 to 31% when this rate is applied.
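The frequency-analysis step, fitting the Extreme Type I (Gumbel) distribution by the method of moments and reading off the 50-year value as FARD2002 does, can be sketched in Python as follows; the dew-point sample and the function name are fabricated solely to make the example runnable.

```python
# Gumbel (Extreme Type I) fit by the method of moments and the 50-year
# return level, mirroring the frequency-analysis step described above.
import numpy as np

EULER_GAMMA = 0.5772156649

def gumbel_mom_return_level(sample, return_period_years):
    sample = np.asarray(sample, dtype=float)
    beta = sample.std(ddof=1) * np.sqrt(6.0) / np.pi      # scale parameter
    mu = sample.mean() - EULER_GAMMA * beta               # location parameter
    p_non_exceed = 1.0 - 1.0 / return_period_years
    return mu - beta * np.log(-np.log(p_non_exceed))

# Hypothetical annual-maximum persisting 12-hour 1000 hPa dew points (deg C)
dew_points = [22.1, 23.4, 21.8, 24.0, 22.9, 23.7, 21.5, 24.3, 22.6, 23.1]
print(gumbel_mom_return_level(dew_points, 50))   # 50-year dew point estimate
```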

Comparison and Decision of Exposure Coefficient for Calculation of Snow Load on Greenhouse Structure (온실의 적설하중 산정을 위한 노출계수의 비교 및 결정)

  • Jung, Seung-Hyeon;Yoon, Jae-Sub;Lee, Jong-Won;Lee, Hyun-Woo
    • Journal of Bio-Environment Control
    • /
    • v.24 no.3
    • /
    • pp.226-234
    • /
    • 2015
  • To provide the data necessary to determine exposure coefficients used for calculating the snow load acting on a greenhouse, we compared the exposure coefficients in the greenhouse structure design standards for various countries. We determined the exposure coefficient for each region and tried to improve on the method used to decide it. Our results are as follows: After comparing the exposure coefficients in the standards of various countries, we could determine that the main factors affecting the exposure coefficient were terrain roughness, wind speed, and whether a windbreak was present. On comparing national standards, the exposure coefficients could be divided into three groups: exposure coefficients of 0.8(0.9) for areas with strong winds, 1.0(1.1) for partially exposed areas, and 1.2 for areas with dense windbreaks. After analyzing the exposure coefficients for 94 areas in South Korea according to the ISO4355 standard, all of the areas had two coefficients (1.0 and 0.8), except Daegwallyeong (0.5) and Yeosu (0.6), which had one coefficient each. In South Korea, the probability of snow is greater inland than in coastal areas and there are fewer days with a maximum wind velocity > $5m{\cdot}s^{-1}$ inland. When determining the exposure coefficients in South Korea, we can subdivide the country into three regions: coastal areas with strong winds have an exposure coefficient of 0.8; inland areas have a coefficient of 1.0; and areas with dense windbreaks have an exposure coefficient of 1.2. Further research that considers the number of days with a wind velocity > $5m{\cdot}s^{-1}$ as the threshold wind speed is needed before we can make specific recommendations for the exposure coefficient for different regions.
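A minimal sketch of the proposed three-group coefficient assignment, assuming the region has already been classified; the function name, the example ground snow load, and the boolean inputs are illustrative, not part of any design standard.

```python
# Minimal sketch of the proposed three-group exposure coefficient for
# snow-load calculation on greenhouses; inputs and values are illustrative.
def exposure_coefficient(is_coastal_windy: bool, has_dense_windbreak: bool) -> float:
    if has_dense_windbreak:
        return 1.2   # areas sheltered by dense windbreaks
    if is_coastal_windy:
        return 0.8   # coastal areas with strong winds
    return 1.0       # inland areas

# The roof snow load is then the ground snow load scaled by this coefficient
# (together with the other shape/thermal factors of the applicable code).
ground_snow_load = 0.65          # kN/m^2, hypothetical design value
roof_snow_load = exposure_coefficient(True, False) * ground_snow_load
print(roof_snow_load)
```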

Features of sample concepts in the probability and statistics chapters of Korean mathematics textbooks of grades 1-12 (초.중.고등학교 확률과 통계 단원에 나타난 표본개념에 대한 분석)

  • Lee, Young-Ha;Shin, Sou-Yeong
    • Journal of Educational Research in Mathematics
    • /
    • v.21 no.4
    • /
    • pp.327-344
    • /
    • 2011
  • This study is a first step toward improving high school students' capability for statistical inference, such as obtaining and interpreting the confidence interval for the population mean that is currently taught in high school. We suggest five underlying concepts, 'discretion of contingency and inevitability', 'discretion of induction and deduction', 'likelihood principle', 'variability of a statistic', and 'statistical model', that are necessary to appreciate statistical inference as a reliable arguing tool in spite of its occasionally erroneous conclusions. Assuming that these five concepts should develop gradually over the school years, we analyzed Korean mathematics textbooks of grades 1-12 and found the following. Regarding the right choice of solution method for a given problem, no elementary textbook, and only a few high school textbooks, describe the difference between contingent and inevitable circumstances. Formal definitions of population and sample are not introduced until the high school grades, so no development of critical thinking about the reliability of inductive reasoning could be observed. On the contrary, strong emphasis is placed on calculations with the sample data without any inference about the population based on the sample. Instead of the representative properties of a random sample, more emphasis is placed on how to obtain a random sample; as a result, the fact that the random variability of a statistic calculated from the sample is inherited from the randomness of the sample could be neither noticed nor explained. No comparative descriptions of statistical inference against mathematical (deductive) reasoning were found, and few explanations of the likelihood principle and its probabilistic applications appropriate to students' cognitive development were found. It was also hard to find any explanation of the random variability of a statistic or of the existence of its sampling distribution. This is worth explaining because, although obtaining the sampling distribution of a particular statistic such as the sample mean is a very difficult job, merely noticing its existence can drastically change the understanding of statistical inference.
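The 'variability of a statistic' concept that the authors find under-explained can be illustrated with a short simulation: repeated samples from one population yield different sample means, so the sample mean itself has a sampling distribution. The population parameters and sample size below are arbitrary.

```python
# Simulation illustrating that a statistic (here the sample mean) inherits
# randomness from the sample, and therefore has its own sampling distribution.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=170, scale=8, size=100_000)   # hypothetical population

sample_means = [rng.choice(population, size=30, replace=False).mean()
                for _ in range(2_000)]

print("mean of sample means:", np.mean(sample_means))   # close to 170
print("std of sample means:", np.std(sample_means))     # close to 8 / sqrt(30)
```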


Calculation of future rainfall scenarios to consider the impact of climate change in Seoul City's hydraulic facility design standards (서울시 수리시설 설계기준의 기후변화 영향 고려를 위한 미래강우시나리오 산정)

  • Yoon, Sun-Kwon;Lee, Taesam;Seong, Kiyoung;Ahn, Yujin
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.6
    • /
    • pp.419-431
    • /
    • 2021
  • In Seoul, it has been confirmed that the duration of rainfall events is shortening while the frequency and intensity of heavy rains are increasing with the changing climate. In addition, because of the high population density and urbanization of most areas, floods frequently occur in flood-prone districts as impermeable areas increase. The Seoul City government is pursuing various structural and non-structural measures to resolve these flood-prone areas, and a disaster prevention performance target has been set in consideration of the climate change impact on future precipitation; this study was conducted to reduce overall flood damage in Seoul over the long term. In this study, 29 GCMs with the RCP4.5 and RCP8.5 scenarios were used for spatial and temporal disaggregation, and three research periods were considered: short-term (2006-2040, P1), mid-term (2041-2070, P2), and long-term (2071-2100, P3). For spatial downscaling, daily GCM data were processed through quantile mapping based on rainfall at the Seoul station operated by the Korea Meteorological Administration; for temporal downscaling, daily data were disaggregated to hourly data through k-nearest-neighbor resampling and a nonparametric temporal disaggregation technique using genetic algorithms. Through temporal downscaling, 100 detailed scenarios were calculated for each GCM scenario, the IDF curves were derived from the resulting total of 2,900 detailed scenarios, and these were averaged to estimate the change in future extreme rainfall. As a result, it was confirmed that the probable rainfall for a 100-year return period and 1-hour duration increases by 8 to 16% under the RCP4.5 scenario and by 7 to 26% under the RCP8.5 scenario. Based on these results, the design rainfall needed to prepare for future climate change in Seoul was estimated, and it can be used to establish purpose-specific water-related disaster prevention policies.
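A minimal sketch of the quantile-mapping step used for spatial downscaling: each GCM daily value is mapped to the observed value with the same non-exceedance probability. The rainfall series are synthetic, and the study's full chain (29 GCMs, two RCPs, genetic-algorithm-based hourly disaggregation, IDF fitting) is not reproduced here.

```python
# Empirical quantile mapping: map each GCM daily value to the observed value
# having the same non-exceedance probability in the historical period.
import numpy as np

def quantile_mapping(gcm_hist, obs_hist, gcm_future):
    gcm_hist = np.sort(np.asarray(gcm_hist, dtype=float))
    obs_hist = np.sort(np.asarray(obs_hist, dtype=float))
    obs_probs = np.linspace(0, 1, len(obs_hist))
    # Non-exceedance probability of each future GCM value in the GCM climate,
    # then the observed value at that same probability.
    future_probs = np.searchsorted(gcm_hist, gcm_future) / len(gcm_hist)
    future_probs = np.clip(future_probs, 0, 1)
    return np.interp(future_probs, obs_probs, obs_hist)

# Hypothetical daily rainfall series (mm/day)
rng = np.random.default_rng(1)
gcm_hist = rng.gamma(0.6, 8.0, size=3650)
obs_hist = rng.gamma(0.5, 12.0, size=3650)
gcm_future = rng.gamma(0.6, 9.0, size=3650)

corrected = quantile_mapping(gcm_hist, obs_hist, gcm_future)
print(corrected[:5])
```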

Conjunction Assessments of the Satellites Transported by KSLV-II and Preparation of the Countermeasure for Possible Events in Timeline (누리호 탑재 위성들의 충돌위험의 예측 및 향후 상황의 대응을 위한 분석)

  • Shawn Seunghwan Choi;Peter Joonghyung Ryu;John Kim;Lowell Kim;Chris Sheen;Yongil Kim;Jaejin Lee;Sunghwan Choi;Jae Wook Song;Hae-Dong Kim;Misoon Mah;Douglas Deok-Soo Kim
    • Journal of Space Technology and Applications
    • /
    • v.3 no.2
    • /
    • pp.118-143
    • /
    • 2023
  • Space is becoming more commercialized. Despite its delayed start, space activity in Korea is attracting growing nationwide support from both investors and the government. On May 25, 2023, KSLV-II, also called Nuri, successfully transported and inserted seven satellites into a sun-synchronous orbit at 550 km altitude. However, Starlink operates over 4,000 satellites around this altitude for its commercial activities, so it is necessary to constantly monitor the collision risks of these satellites against resident space objects, including Starlink. Here we report quantitative research on these conjunctions, particularly between the Nuri satellites and Starlink. Our calculation shows that, on average, three times every day, a Nuri satellite encounters a Starlink satellite within 1 km with a probability of collision higher than 1.0E-5. A comparative study with KOMPSAT-5, also called Arirang-5, shows that its distribution of closest-approach distances differs significantly from those of the Nuri satellites. We also report a quantitative analysis of the collision-avoidance maneuver cost of Starlink satellites and a strategy for Korea, as a late starter, to accelerate and position itself among the leading space-faring countries. We used the AstroOne program for the analyses and compared its output with that of Socrates Plus from Celestrak. Two-line element (TLE) data were used for the computation.
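A minimal conjunction-screening sketch of the kind of closest-approach calculation reported above. In practice the positions would come from TLE propagation (for example with the sgp4 package) and the collision probability from covariance information; here two synthetic near-circular 550 km orbits are used so the example runs on its own, and the 1 km screening threshold mirrors the one quoted in the abstract.

```python
# Conjunction screening sketch: given position time series of two objects
# (in practice propagated from TLEs), find the time and distance of closest
# approach and flag passes under a screening threshold.
import numpy as np

def closest_approach(pos_a, pos_b, times):
    """pos_a, pos_b: (N, 3) arrays of ECI positions in km; times: (N,) array."""
    sep = np.linalg.norm(pos_a - pos_b, axis=1)
    i = int(np.argmin(sep))
    return times[i], sep[i]

# Synthetic example: two near-circular 550 km orbits with a small phase and
# cross-track offset (purely illustrative, not Nuri/Starlink ephemerides).
R = 6378.0 + 550.0                        # orbit radius, km
t = np.linspace(0, 5_700, 5_000)          # ~one orbital period, seconds
omega = 2 * np.pi / 5_700                 # mean motion, rad/s

sat_a = np.column_stack([R * np.cos(omega * t),
                         R * np.sin(omega * t),
                         np.zeros_like(t)])
sat_b = np.column_stack([R * np.cos(omega * t + 0.0001),
                         R * np.sin(omega * t + 0.0001),
                         np.full_like(t, 0.5)])

t_ca, d_ca = closest_approach(sat_a, sat_b, t)
print(f"closest approach: {d_ca:.3f} km at t = {t_ca:.0f} s",
      "-> screen further" if d_ca < 1.0 else "")
```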

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.141-156
    • /
    • 2013
  • Social media is a representative form of Web 2.0 that shapes changes in users' information behavior by allowing users to produce their own content without any expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data play a significant role in the emerging Big Data arena, and a variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of Information Retrieval and Text Mining because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. Most previous studies, however, have adopted broad-brush and limited approaches, which has made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing big stream datasets of Twitter in real time. The system offers term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to track changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing the candidates' names and election-related terms on Twitter in Korea (http://www.twitter.com/) for one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and effectively detects societal trends. The system also retrieves the list of terms that co-occur with given query terms. We compared the results of term co-occurrence retrieval for the influential candidates' names 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate, such as 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park', 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon', and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows the topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of pattern, a rising tendency or a falling tendency, depending on the change of the probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compared topic trends with related news articles and found that Twitter can track issues faster than other media such as newspapers. The user network in Twitter differs from those of other social media because of the distinctive way relationships are formed in Twitter: users make relationships by exchanging mentions. We visualized and analyzed the mention-based networks of 136,754 users, using the three candidates' names 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. The results show that Twitter users mention all candidates' names regardless of their own political tendencies. This case study shows that Twitter can be an effective tool to detect and predict dynamic changes in social issues, and that mention-based user networks reveal aspects of user behavior through a type of network uniquely found in Twitter.
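A minimal sketch of the topic-modeling component: fit an LDA model on a handful of invented English tweets and list the top terms per topic. The real system processed about 1.7 million Korean tweets and additionally tracked topic trends over time and mention-based user networks, none of which is reproduced here.

```python
# Minimal LDA topic-modeling sketch on invented example tweets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "presidential election public opinion poll results",
    "single candidacy agreement between candidates",
    "voting time extension debate",
    "election proclamation in support of candidate",
    "opinion poll shows tight presidential race",
    "candidates discuss single candidacy agreement",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```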