• Title/Summary/Keyword: nonparametric model

Impact of the Crossed-Structures Installed in Streams and Prediction of Fish Abundance in the Seomjin River System, Korea (하천에 설치된 횡구조물의 영향 및 섬진강 수계의 어류 풍부도 예측)

  • Moon, Woon Ki;Noh, Da Hye;Yoo, Jae Sang;Lim, O Young;Kim, Myoung Chul;Kim, Ji Hye;Lee, Jeong Min;Kim, Jai Ku
    • Ecology and Resilient Infrastructure / v.9 no.2 / pp.100-106 / 2022
  • The relationships of river length and weir density to the number of fish species observed were analyzed for 210 local rivers in the Seomjin River system (SJR). A nonlinear exponential relationship between river length and the number of fish species was observed. The model coefficient was 0.03 and the coefficient of determination (R²) was 0.59, meaning that about 59.0% of the total variance was explained by the river length variable. The values predicted by the model differed from the observed numbers of species: about 110 local rivers (about 52.4%) had lower values than predicted. The average index of weir density (IWD) in the SJR was about 2.7/km, which was significantly higher than that of other river basins. A nonparametric two-dimensional Kolmogorov-Smirnov (2-DKS) analysis based on the IWD gave a threshold value affecting fish diversity of about 2.5/km (Dmax=0.048, p<0.05). Above this threshold, the number of fish species is expected to decrease. In fact, the ratio of the expected species to the observed species fell below 70% when the IWD was higher than the threshold value. To maintain aquatic ecological connectivity in the future, the IWD needs to be managed below the threshold value.

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most previous studies, which used the default event as the basis for learning about default risk, this study calculated default risk from the market capitalization and stock price volatility of each company based on the Merton model. This addressed the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and made it possible to reflect the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risks of unlisted companies without stock price information can be appropriately derived. The approach can therefore provide stable default risk assessment services to companies whose default risk is difficult to determine with traditional credit rating models, such as small and medium-sized companies and startups. Although predicting corporate default risk with machine learning has been actively studied recently, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for calculating default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the calculation methods.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and changes in future market conditions. This study reduced the bias of individual models by using stacking ensemble techniques that synthesize various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information while keeping the short calculation time that is an advantage of machine learning-based default risk prediction models. To produce the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained on the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which performed best among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, the nonparametric Wilcoxon rank sum test was used to check whether the two model forecasts that make up each pair differed significantly. The analysis showed that the forecasts of the Stacking Ensemble model differed significantly from those of the MLP model and the CNN model.
In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The Stacking Ensemble techniques proposed in this study can also help in designing models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will serve as a resource for increasing practical use by overcoming and improving on the limitations of existing machine learning-based models.
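
The seven-way split described in the abstract above can be sketched as follows. This is a minimal illustration, assuming the usual out-of-fold scheme for stacking (each row's forecast comes from a sub-model that did not see that row); the abstract does not spell out the exact procedure. The names `k_fold_indices`, `MeanModel`, and `out_of_fold_predictions` are hypothetical, and the toy sub-model stands in for the real learners.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds=7):
    """Split range(n_samples) into n_folds contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MeanModel:
    """Toy sub-model (hypothetical): always predicts the training-target mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def out_of_fold_predictions(model_factory, X, y, n_folds=7):
    """Return one forecast per row, each from a model that never saw that row."""
    oof = [None] * len(X)
    for held_out in k_fold_indices(len(X), n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = model_factory().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        for i, pred in zip(held_out, model.predict([X[i] for i in held_out])):
            oof[i] = pred
    return oof


# Illustrative data only: 14 rows, so each of the 7 folds holds 2 rows.
X = [[float(i)] for i in range(14)]
y = [float(i) for i in range(14)]
oof = out_of_fold_predictions(MeanModel, X, y)
```

In a stacking setup, the `oof` column from each sub-model would become one input feature for the meta-model, which keeps the meta-model from being trained on forecasts that leaked the target.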

Estimating Willingness to Pay for the Tap Water Quality Improvement in Busan Using Nonparametric Approach (비모수추정법에 의한 부산시 가정용수 수질개선에 대한 지불의사액 추정)

  • Pyo, Hee-Dong;Park, Cheol-Hyung;Choo, Jae-Wook
    • Journal of Korea Water Resources Association / v.44 no.2 / pp.125-134 / 2011
  • This paper estimates willingness to pay (WTP) for residential water quality improvement in Busan using a nonparametric approach. The nonparametric approach has several significant advantages over parametric methods: no probability distribution needs to be assumed, so there is no need to assume or statistically test goodness of fit, model specification, or heteroscedasticity. To ensure the reliability and validity of the contingent valuation method, a survey of 665 respondents, selected by stratified random sampling, was conducted by personal interview. The mean WTP for residential water quality improvement in Busan was estimated at 3,190 won to 3,331 won per month per household, and the median WTP at 1,750 won. Provided that the sample is broadly representative of Busan's population, the annual aggregated benefit of residential water improvement for all Busan households is approximately 50.2 billion won based on mean WTP or 27.5 billion won based on median WTP.
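
As a rough consistency check on the aggregate figures above, the sketch below backs out the household count they imply. It assumes the aggregate is simply monthly WTP × 12 months × number of households; the abstract does not state the household count or the exact aggregation rule, so `implied_households` is a hypothetical helper rather than the authors' calculation.

```python
def implied_households(annual_total_won, monthly_wtp_won):
    """Household count implied by an annual aggregate and a monthly WTP.

    Assumes aggregate = monthly WTP * 12 * households (an assumption,
    not a formula given in the abstract).
    """
    return annual_total_won / (monthly_wtp_won * 12)


# Figures taken from the abstract: lower mean WTP and median WTP.
from_mean = implied_households(50.2e9, 3190)
from_median = implied_households(27.5e9, 1750)
relative_gap = abs(from_mean - from_median) / from_median

print(f"implied households (mean WTP):   {from_mean:,.0f}")
print(f"implied households (median WTP): {from_median:,.0f}")
print(f"relative gap: {relative_gap:.2%}")
```

Under this assumption both figures point to roughly 1.31 million households, which suggests the 50.2 billion won aggregate corresponds to the lower mean estimate of 3,190 won.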

A Test for Nonlinear Causality and Its Application to Money, Production and Prices (통화(通貨)·생산(生産)·물가(物價)의 비선형인과관계(非線型因果關係) 검정(檢定))

  • Baek, Ehung-gi
    • KDI Journal of Economic Policy / v.13 no.4 / pp.117-140 / 1991
  • The purpose of this paper is primarily to introduce a nonparametric statistical tool developed by Baek and Brock to detect a unidirectional causal ordering between two economic variables, and to apply it to macroeconomic relationships among money, production, and prices. The tool can be applied to any other causal structure, for instance, defense spending and economic performance, or a stock market index and market interest rates. A key building block of the test for nonlinear Granger causality used in this paper is the correlation. The main emphasis is on nonlinear rather than linear causal structure, because the conventional F-test already provides high power against linear causal relationships. The nonlinear causality test is derived from the asymptotic normality of the test statistic. The size of the test is reported for some parameters. When the test is applied to a model of money, production, and prices, some evidence of nonlinear causality is found using the corrected size of the test. For instance, nonlinear causal relationships between production and prices are demonstrated in both directions; these relationships were missed by the conventional F-test. Similar results between money and prices are obtained for high lag variables.

The Effect of PM10 on Respiratory-related Admission in Seoul (서울지역의 미세먼지가 호흡기계 질환으로 인한 병원입원에 미치는 영향)

  • Seo, Ju-Hee;Ha, Eun-Hee;Lee, Bo-Eun;Park, Hye-Sook;Kim, Ho;Hong, Yun-Chul;Yi, Ok-Hee
    • Journal of Korean Society for Atmospheric Environment / v.22 no.5 / pp.564-573 / 2006
  • This study examined the effect of particulate matter less than 10 ${\mu}m$ in diameter ($PM_{10}$) on respiratory-related admission in Seoul in 1999. Daily counts of respiratory-related admissions were analyzed with a generalized additive model, adjusting for air temperature, humidity, and day of the week as confounders in a nonparametric approach. The results show associations between $PM_{10}$ and asthma, acute upper respiratory disease, acute lower respiratory disease, pneumonia, and chronic respiratory disease. In those younger than 15 years, the relative risks were 1.30 (95% CI=1.14$\sim$1.50) for pneumonia and 1.18 (95% CI=1.01$\sim$1.37) for acute lower respiratory disease. In those aged 15 to 64 years, the relative risks were 1.85 (95% CI=1.22$\sim$2.81) for acute lower respiratory disease, 1.28 (95% CI=1.04$\sim$1.57) for asthma, 1.25 (95% CI=1.01$\sim$1.54) for pneumonia, and 1.19 (95% CI=1.01$\sim$1.41) for acute upper respiratory disease. In those older than 65 years, the relative risks were 1.54 (95% CI=1.15$\sim$2.08) for asthma and 1.38 (95% CI=1.06$\sim$1.80) for chronic respiratory disease. The study showed that $PM_{10}$ considerably affected daily counts of respiratory-related admissions in Seoul in 1999. Statistically significant associations were mostly found in the adult group aged 15 to 64 years. High relative risks were also found in the elderly.

Using noise filtering and sufficient dimension reduction method on unstructured economic data (노이즈 필터링과 충분차원축소를 이용한 비정형 경제 데이터 활용에 대한 연구)

  • Jae Keun Yoo;Yujin Park;Beomseok Seo
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.119-138 / 2024
  • Text indicators are increasingly valuable in economic forecasting, but are often hindered by noise and high dimensionality. This study aims to explore post-processing techniques, specifically noise filtering and dimensionality reduction, to normalize text indicators and enhance their utility through empirical analysis. Predictive target variables for the empirical analysis include monthly leading index cyclical variations, BSI (business survey index) All industry sales performance, BSI All industry sales outlook, as well as quarterly real GDP SA (seasonally adjusted) growth rate and real GDP YoY (year-on-year) growth rate. This study explores the Hodrick and Prescott filter, which is widely used in econometrics for noise filtering, and employs sufficient dimension reduction, a nonparametric dimensionality reduction methodology, in conjunction with unstructured text data. The analysis results reveal that noise filtering of text indicators significantly improves predictive accuracy for both monthly and quarterly variables, particularly when the dataset is large. Moreover, this study demonstrated that applying dimensionality reduction further enhances predictive performance. These findings imply that post-processing techniques, such as noise filtering and dimensionality reduction, are crucial for enhancing the utility of text indicators and can contribute to improving the accuracy of economic forecasts.

Analysis of promising countries for export using parametric and non-parametric methods based on ERGM: Focusing on the case of information communication and home appliance industries (ERGM 기반의 모수적 및 비모수적 방법을 활용한 수출 유망국가 분석: 정보통신 및 가전 산업 사례를 중심으로)

  • Jun, Seung-pyo;Seo, Jinny;Yoo, Jae-Young
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.175-196 / 2022
  • The information and communication and home appliance industries, which have been among South Korea's main industries, are gradually losing export share as their export competitiveness weakens. This study objectively analyzed export competitiveness and suggested export-promising countries in order to help South Korea's information communication and home appliance industries improve exports. Network properties, centrality, and structural hole analysis were examined as part of the network analysis to evaluate export competitiveness. To select promising export countries, we proposed a new variable that takes into account the characteristics of an already established International Trade Network (ITN), that is, the Global Value Chain (GVC), in addition to the existing economic factors. The conditional log-odds for individual links derived from the Exponential Random Graph Model (ERGM) in the analysis of the cross-border trade network were used as a proxy variable for export potential. Taking the ERGM link probabilities into account, a parametric approach and a non-parametric approach were each used to recommend export-promising countries. In the parametric method, a regression model was developed to predict the export value of South Korea's information and communication and home appliance industries by adding the link-specific network characteristics derived from the ERGM to the existing economic factors. In the non-parametric approach, an anomaly detection algorithm based on clustering was used, and promising export countries were proposed by finding outliers that deviate from two peers. According to the results, the export network of the industry was structurally characterized by high transferability.
According to the centrality analysis, South Korea's influence on exports was weak relative to its size, and the structural hole analysis showed that export efficiency was weak. Under the model proposed by this study for recommending promising exporting countries, the parametric analysis identified Iran, Ireland, North Macedonia, Angola, and Pakistan as promising, while the nonparametric analysis identified Qatar, Luxembourg, Ireland, North Macedonia, and Pakistan. The two models thus differed on some countries. The results revealed that the export competitiveness of South Korea's information and communication and home appliance industries in the GVC was not high relative to the size of exports, which suggests that exports could decline further. This study is also meaningful in that it proposed a method for finding promising export countries by considering GVC networks with other countries as a way to increase export competitiveness. From a policy point of view, the study showed that the international trade network of the information communication and home appliance industries has an important mutual relationship and that, although transferability is high, it may not be easily expanded to a three-party relationship. It was also confirmed that South Korea's export competitiveness or status was lower than its export size ranking. This paper suggested that, to improve the low out-degree centrality, exports to Italy or Poland, which had significantly higher in-degrees, need to be increased. We also argued that, to improve out-closeness centrality, exports to countries with particularly high in-closeness need to be increased. In particular, Morocco, UAE, Argentina, Russia, and Canada were identified as export destinations deserving attention.
This study also provided practical implications for companies expecting to expand exports. The results suggest that such companies need to pay attention to countries whose potential for export expansion is high relative to their existing export volume. In particular, for companies that export daily necessities, countries that merit attention on the basis of population are presented, and for companies that export high-end or durable products, countries with high GDP or purchasing power but relatively low exports are presented. Since the process and results of this study can easily be extended and applied to other industries, services that use these results are also expected to be developed in the public sector.

The Relationship Between Son Preference and Fertility (남아 선호와 출산력간의 관계)

  • 이성용
    • Korea journal of population studies / v.26 no.1 / pp.31-57 / 2003
  • This study examines (1) whether the value of a son (for example, old-age security and succession of the family lineage), which causes son preference in a traditional society, can be explained at the individual level, and (2) whether women without a son in a son-preference country continue childbearing until they have at least one son or give up the desire for a son at a certain level. To this end, the 1974 Korean National Fertility Survey data are analyzed with quadratic hazard models controlling for unobserved heterogeneity. Unlike in an ordinary regression model, in a hazard model even omitted variables that affect hazard rates and are uncorrelated with the included independent variables can distort the parameter estimates. Therefore, the nonparametric maximum likelihood estimator (NPMLE) of a mixing distribution developed by Heckman and Singer is used to control for unobserved heterogeneity. According to the statistical results, the value of a son that causes son preference is determined at the societal level, not at the individual level. Korean women without a son did not continue childbearing endlessly throughout their childbearing years until they had a son. In general, they gave up the desire for a son after bearing six daughters in succession. Thus, 30-40 years ago, the number of daughters at which women without a son gave up the desire for a son was six, which is about the level of the total fertility rate during the 1960s. These days, many women have only two or three daughters and no son. This means that the level at which the desire for a son is given up, one factor representing the strength of son preference, has become lower. If the strength of son preference had not become much weaker, the fertility rate in Korea could not have fallen below the replacement level.

Analysis of the Efficiency of Gyeonggi-do Senior Welfare Centers by DEA Model (DEA를 이용한 경기도 노인복지관 효율성 분석)

  • Kim, Keum Hwan;Pak, Ae Kyung;Ryu, Seo Hyun;Lee, Nam Sik
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.8 no.3 / pp.165-177 / 2013
  • The purpose of this study was to examine the efficiency of senior welfare centers and the causes of differences in efficiency among them, and to investigate the factors that influence those differences and the size of their influence. The study reviewed which methods would be effective for assessing the efficiency of senior welfare centers given their circumstances, and conducted post-hoc analyses using data envelopment analysis (DEA) and the modified DEA/AP model, which are useful tools for evaluating relative efficiency. Twenty senior welfare centers located in Gyeonggi-do were selected, and their yearly operating data for 2009 were used. The evaluation data released by the Gyeonggi Welfare Foundation were analyzed by DEA, which is a nonparametric method, and significant results were obtained on the regional operating efficiency of social welfare centers in 14 metropolitan cities and provinces, the causes and degree of their inefficiency, and the areas that could serve as references. Because data for the counties were used in this study, it is not quite possible to produce accurate results on the relative efficiency of senior welfare centers. The study is nevertheless significant in that it suggested how to evaluate the overall operating efficiency of senior welfare centers in the counties, including the degree of their operating inefficiency, the improvements that should be made, and the possible reference groups, and in that it provided information on the usefulness of the DEA model.

Management Efficiency of Chestnut-Cultivating Households in Chungnam Province (충남지역 밤나무 재배 임가의 경영 효율성 분석)

  • Won, Hyun-Kyu;Jeon, Jun-Heon;Yoo, Byoung-Il;Lee, Seong-Youn;Lee, Jung-Min;Ji, Dong-Hyun
    • Journal of Korean Society of Forest Science / v.102 no.3 / pp.390-397 / 2013
  • Using data envelopment analysis (DEA), a nonparametric estimation method, this study evaluates the management efficiency of chestnut tree cultivators in areas of Chungcheongnam-do such as Cheong-yang, Gong-ju, and Bu-yeo. The analysis is based on the inputs and outputs of 20 forestry households surveyed in the 2012 survey titled 'A Study on Current Level and Condition of Chestnut Cultivation and Management', which was conducted from March 2012 to October 2012. The inputs are management cost, harvesting cost, material cost, non-operating expenses, and cultivation area, while the only output is gross margin. The study analyzes technical efficiency, pure technical efficiency, and scale efficiency using the CCR and BCC models among DEA methods. Based on the results, it also provides improvement methods for the forestry households found to be inefficient. To verify the DEA results, the study additionally compares them with those of the chestnut management standard diagnostic table. The average technical efficiency was 0.667, indicating inefficiency in general. Given that the average pure technical efficiency was 0.944 and the average scale efficiency was 0.703, it can be inferred that the inefficiency lies in scale, not in cultivation techniques. As for forestry households with an efficiency score of 1, 6 households scored 1 in technical efficiency and 13 households scored 1 in pure technical efficiency. Meanwhile, 6 households scored 1 in all three aspects. In the comparison with the scores from the chestnut management standard diagnostic table, 5 households scored over 80, of which 3 households scored 1 in technical efficiency.
The results of this study and the chestnut management standard diagnostic table also agreed, with both identifying the same households as having the highest and the lowest scores. This means that management efficiency evaluation using DEA can be applied in the field alongside the chestnut management standard diagnostic table.
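
The decomposition behind the three efficiency scores above can be sketched as follows. In standard DEA, scale efficiency is the ratio of the CCR (constant returns) technical efficiency to the BCC (variable returns) pure technical efficiency. The function names and the per-household scores below are hypothetical and are not taken from the study.

```python
def scale_efficiency(te_ccr, pte_bcc):
    """Scale efficiency = CCR technical efficiency / BCC pure technical efficiency."""
    if not 0 < te_ccr <= pte_bcc <= 1:
        raise ValueError("expected 0 < te_ccr <= pte_bcc <= 1")
    return te_ccr / pte_bcc


def main_inefficiency_source(te_ccr, pte_bcc):
    """Say whether scale or technique contributes more to inefficiency.

    A tie between the two scores is reported as "technique".
    """
    if te_ccr == 1:
        return "efficient"
    se = scale_efficiency(te_ccr, pte_bcc)
    return "scale" if se < pte_bcc else "technique"


# Hypothetical households: (name, CCR technical efficiency, BCC pure technical efficiency).
households = [("A", 1.00, 1.00), ("B", 0.60, 0.80), ("C", 0.45, 0.50)]
for name, te, pte in households:
    se = scale_efficiency(te, pte)
    print(name, f"SE={se:.3f}", main_inefficiency_source(te, pte))
```

The reported averages are roughly consistent with this identity: 0.944 × 0.703 ≈ 0.664, close to the average technical efficiency of 0.667. The match need not be exact, because the identity holds per household rather than for averages.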