• Title/Summary/Keyword: curve estimation


Dynamic forecasts of bankruptcy with Recurrent Neural Network model (A Study on the Dynamic Change of Accounting Information in a Corporate Bankruptcy Prediction Model Using RNN (Recurrent Neural Network))

  • Kwon, Hyukkun; Lee, Dongkyu; Shin, Minsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.139-153
    • /
    • 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become correspondingly more important; corporate bankruptcy is therefore regarded as one of the major research topics in business management, and many studies are also in progress in industry. Previous studies attempted to improve prediction accuracy and to resolve the overfitting problem with various statistical methodologies such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM). More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so it is difficult to predict bankruptcy using information from only a single point in time. Nevertheless, although such static approaches ignore the time effect and therefore yield biased results, dynamic models have not been studied much; a static model may thus be unsuitable for predicting bankruptcy, and a dynamic model offers the possibility of improving the prediction model. In this paper, we propose the RNN (Recurrent Neural Network), one of the deep learning methodologies, which learns time-series data and is known to perform well. For the estimation of the bankruptcy prediction model and the comparison of forecasting performance, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016. To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We then defined bankruptcy as delisting due to sluggish earnings, which we confirmed at KIND, a corporate stock information website. Variables were selected from previous papers: the first set consists of Z-score variables, which are traditional variables for predicting bankruptcy, and the second set is a dynamic variable set. For the first variable set we selected 240 normal companies and 226 bankrupt companies; for the second, 229 normal companies and 226 bankrupt companies. We created a model that reflects dynamic changes in time-series financial data, and by comparing the suggested model with existing bankruptcy prediction models we found that the suggested model could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model known as logistic regression (GLM), the Support Vector Machine (SVM), and the Artificial Neural Network (ANN) as benchmark models. The experiment showed that the RNN's performance was better than that of the comparative models: its accuracy was high for both sets of variables, and its Area Under the Curve (AUC) value was also high. The hit-ratio table likewise shows that the RNN predicted poor companies to be bankrupt at a higher rate than the other comparative models. A limitation of this paper is that an overfitting problem occurs during RNN learning, but we expect that it can be solved by selecting more learning data and appropriate variables. From these results, it is expected that this research will contribute to the development of bankruptcy prediction by proposing a new dynamic model.
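The abstract describes a recurrent network that reads a firm's financial ratios year by year, but gives no architectural details. The NumPy sketch below is a minimal illustration of that general mechanism only, assuming a plain (vanilla) RNN cell, hypothetical dimensions, and untrained random weights; none of the sizes, weights, or inputs are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_bankruptcy_score(x_seq, W_xh, W_hh, b_h, w_out, b_out):
    """Run a vanilla RNN over a sequence of yearly feature vectors and
    return a bankruptcy probability from the final hidden state."""
    h = np.zeros(W_hh.shape[0])                  # initial hidden state
    for x_t in x_seq:                            # one step per fiscal year
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    logit = w_out @ h + b_out                    # scalar score
    return 1.0 / (1.0 + np.exp(-logit))          # sigmoid -> probability

# Hypothetical setup: 5 financial ratios observed over 3 years,
# 8 hidden units, randomly initialised (untrained) weights.
n_features, n_hidden = 5, 8
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_features))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=n_hidden)
b_out = 0.0

x_seq = rng.normal(size=(3, n_features))         # stand-in for Z-score-type ratios
print(rnn_bankruptcy_score(x_seq, W_xh, W_hh, b_h, w_out, b_out))
```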

DEVELOPMENT OF STATEWIDE TRUCK TRAFFIC FORECASTING METHOD BY USING LIMITED O-D SURVEY DATA (Development of a Statewide Truck Traffic Forecasting Method Using Limited O-D Survey Data)

  • 박만배
    • Proceedings of the KOR-KST Conference
    • /
    • 1995.02a
    • /
    • pp.101-113
    • /
    • 1995
  • The objective of this research is to test the feasibility of developing a statewide truck traffic forecasting methodology for Wisconsin by using Origin-Destination surveys, traffic counts, classification counts, and other data that are routinely collected by the Wisconsin Department of Transportation (WisDOT). Development of a feasible model will permit estimation of future truck traffic for every major link in the network. This will provide the basis for improved estimation of future pavement deterioration. Pavement damage rises exponentially as axle weight increases, and trucks are responsible for most of the traffic-induced damage to pavement. Consequently, forecasts of truck traffic are critical to pavement management systems. The Pavement Management Decision Supporting System (PMDSS) prepared by WisDOT in May 1990 combines pavement inventory and performance data with a knowledge base consisting of rules for evaluation, problem identification, and rehabilitation recommendation. Without a reasonable truck traffic forecasting methodology, PMDSS is not able to project pavement performance trends in order to make assessments and recommendations for future years. However, none of WisDOT's existing forecasting methodologies has been designed specifically for predicting truck movements on a statewide highway network. For this research, the Origin-Destination survey data available from WisDOT, including two stateline areas, one county, and five cities, are analyzed and zone-to-zone truck trip tables are developed. The resulting Origin-Destination Trip Length Frequency (OD TLF) distributions by trip type are applied to the Gravity Model (GM) for comparison with comparable TLFs from the GM. The gravity model is calibrated to obtain friction factor curves for the three trip types: Internal-Internal (I-I), Internal-External (I-E), and External-External (E-E). Both "macro-scale" calibration and "micro-scale" calibration are performed. The comparison of the statewide GM TLF with the OD TLF for the macro-scale calibration does not provide suitable results because the available OD survey data do not represent an unbiased sample of statewide truck trips. For the "micro-scale" calibration, "partial" GM trip tables that correspond to the OD survey trip tables are extracted from the full statewide GM trip table. These "partial" GM trip tables are then merged and a partial GM TLF is created. The GM friction factor curves are adjusted until the partial GM TLF matches the OD TLF. Three friction factor curves, one for each trip type, resulting from the micro-scale calibration produce a reasonable GM truck trip model. A key methodological issue for GM calibration involves the use of multiple friction factor curves versus a single friction factor curve for each trip type in order to estimate truck trips with reasonable accuracy. A single friction factor curve for each of the three trip types was found to reproduce the OD TLFs from the calibration database. Given the very limited trip generation data available for this research, additional refinement of the gravity model using multiple friction factor curves for each trip type was not warranted. In traditional urban transportation planning studies, the zonal trip productions and attractions and region-wide OD TLFs are available. For this research, however, the information available for the development of the GM model is limited to Ground Counts (GC) and a limited set of OD TLFs.
The GM is calibrated using the limited OD data, but the OD data are not adequate to obtain good estimates of truck trip productions and attractions. Consequently, zonal productions and attractions are estimated using zonal population as a first approximation. Then, Selected Link based (SELINK) analyses are used to adjust the productions and attractions and possibly recalibrate the GM. The SELINK adjustment process involves identifying the origins and destinations of all truck trips that are assigned to a specified "selected link" as the result of a standard traffic assignment. A link adjustment factor is computed as the ratio of the actual volume for the link (ground count) to the total assigned volume. This link adjustment factor is then applied to all of the origin and destination zones of the trips using that "selected link". Selected link based analyses are conducted by using both 16 selected links and 32 selected links. The SELINK analysis using 32 selected links provides the least %RMSE in the screenline volume analysis. In addition, the stability of the GM truck estimating model is preserved by using 32 selected links with three SELINK adjustments; that is, the GM remains calibrated despite substantial changes in the input productions and attractions. The coverage of zones provided by 32 selected links is satisfactory. Increasing the number of repetitions beyond four is not reasonable because the stability of the GM model in reproducing the OD TLF reaches its limits. The total volume of truck traffic captured by 32 selected links is 107% of total trip productions. More importantly, SELINK adjustment factors for all of the zones can be computed. Evaluation of the travel demand model resulting from the SELINK adjustments is conducted by using screenline volume analysis, functional class and route-specific volume analysis, area-specific volume analysis, production and attraction analysis, and Vehicle Miles of Travel (VMT) analysis. Screenline volume analysis using four screenlines with 28 check points is used to evaluate the adequacy of the overall model. The total trucks crossing the screenlines are compared to the ground count totals. LV/GC ratios of 0.958 by using 32 selected links and 1.001 by using 16 selected links are obtained. The %RMSE for the four screenlines is inversely proportional to the average ground count totals by screenline. The magnitude of the %RMSE for the four screenlines resulting from the fourth and last GM run, using 32 and 16 selected links, is 22% and 31% respectively. These results are similar to the overall %RMSE achieved for the 32 and 16 selected links themselves, 19% and 33% respectively. This implies that the SELINK analysis results are reasonable for all sections of the state. Functional class and route-specific volume analysis is possible by using the available 154 classification count check points. The truck traffic crossing the Interstate highways (ISH) with 37 check points, the US highways (USH) with 50 check points, and the State highways (STH) with 67 check points is compared to the actual ground count totals. The magnitude of the overall link volume to ground count ratio by route does not show any specific pattern of over- or underestimation. However, the %RMSE for the ISH shows the smallest value while that for the STH shows the largest value. This pattern is consistent with the screenline analysis and the overall relationship between %RMSE and ground count volume groups.
Area-specific volume analysis provides another broad statewide measure of the performance of the overall model. The truck traffic in the North area with 26 check points, the West area with 36 check points, the East area with 29 check points, and the South area with 64 check points is compared to the actual ground count totals. The four areas show similar results. No specific patterns in the LV/GC ratio by area are found. In addition, the %RMSE is computed for each of the four areas. The %RMSEs for the North, West, East, and South areas are 92%, 49%, 27%, and 35% respectively, whereas the average ground counts are 481, 1383, 1532, and 3154 respectively. As for the screenline and volume range analyses, the %RMSE is inversely related to average link volume. The SELINK adjustments of productions and attractions resulted in a very substantial reduction in the total in-state zonal productions and attractions. The initial in-state zonal trip generation model can now be revised with a new trip production rate (total adjusted productions/total population) and a new trip attraction rate. Revised zonal production and attraction adjustment factors can then be developed that reflect only the impact of the SELINK adjustments that cause increases or decreases from the revised zonal estimates of productions and attractions. Analysis of the revised production adjustment factors is conducted by plotting the factors on the state map. The east area of the state, including the counties of Brown, Outagamie, Shawano, Winnebago, Fond du Lac, and Marathon, shows comparatively large values of the revised adjustment factors. Overall, both small and large values of the revised adjustment factors are scattered around Wisconsin. This suggests that more independent variables beyond just population are needed for the development of the heavy truck trip generation model. More independent variables, including zonal employment data (office employees and manufacturing employees) by industry type, zonal private trucks owned, and zonal income data, which are not currently available, should be considered. A plot of the frequency distribution of the in-state zones as a function of the revised production and attraction adjustment factors shows the overall adjustment resulting from the SELINK analysis process. Overall, the revised SELINK adjustments show that the productions for many zones are reduced by a factor of 0.5 to 0.8, while the productions for relatively few zones are increased by factors from 1.1 to 4, with most of these factors in the 3.0 range. No obvious explanation for the frequency distribution could be found. The revised SELINK adjustments overall appear to be reasonable. The heavy truck VMT analysis is conducted by comparing the 1990 heavy truck VMT forecasted by the GM truck forecasting model, 2.975 billion, with the WisDOT computed data. This gives an estimate that is 18.3% less than the WisDOT computation of 3.642 billion VMT. The WisDOT estimates are based on sampling the link volumes for USH, STH, and CTH, which implies potential error in sampling the average link volume. The WisDOT estimate of heavy truck VMT cannot be tabulated by the three trip types, I-I, I-E (E-I), and E-E. In contrast, the GM forecasting model shows that the proportion of E-E VMT out of total VMT is 21.24%. In addition, tabulation of heavy truck VMT by route functional class shows that the proportion of truck traffic traversing the freeways and expressways is 76.5%.
Only 14.1% of total freeway truck traffic is I-I trips, while 80% of total collector truck traffic is I-I trips. This implies that freeways are traversed mainly by I-E and E-E truck traffic, while collectors are used mainly by I-I truck traffic. Other tabulations, such as average heavy truck speed by trip type, average travel distance by trip type, and the VMT distribution by trip type, route functional class, and travel speed, are useful information for highway planners to understand the characteristics of statewide heavy truck trip patterns. Heavy truck volumes for the target year 2010 are forecasted by using the GM truck forecasting model. Four scenarios are used. For better forecasting, ground count-based segment adjustment factors are developed and applied. ISH 90 & 94 and USH 41 are used as example routes. The forecasting results using the ground count-based segment adjustment factors are satisfactory for long-range planning purposes, but additional ground counts would be useful for USH 41. Sensitivity analysis provides estimates of the impacts of the alternative growth rates, including information about changes in the trip types using key routes. The network-based GM can easily model scenarios with different rates of growth in rural versus urban areas, small versus large cities, and in-state zones versus external stations.
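The abstract defines the SELINK link adjustment factor as the ratio of the ground count on a selected link to the total assigned volume, and distributes truck trips with a gravity model. The sketch below illustrates those two pieces only, assuming the textbook singly constrained gravity form $T_{ij} = P_i A_j F_{ij} / \sum_k A_k F_{ik}$; the zone counts, friction factors, and volumes are invented for illustration, and the sketch does not reproduce the WisDOT calibration procedure.

```python
import numpy as np

def gravity_trip_table(productions, attractions, friction):
    """Singly constrained gravity model:
    T_ij = P_i * A_j * F_ij / sum_k(A_k * F_ik)."""
    weights = attractions[None, :] * friction                 # A_j * F_ij
    return productions[:, None] * weights / weights.sum(axis=1, keepdims=True)

def selink_adjustment_factor(ground_count, assigned_volume):
    """SELINK link adjustment factor: actual (counted) volume on the
    selected link divided by the volume the model assigns to it."""
    return ground_count / assigned_volume

# Hypothetical 3-zone example.
P = np.array([500.0, 300.0, 200.0])        # zonal truck trip productions
A = np.array([400.0, 400.0, 200.0])        # zonal truck trip attractions
F = np.array([[1.0, 0.6, 0.3],             # friction factors by zone pair
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

T = gravity_trip_table(P, A, F)
print(T.round(1))                                   # zone-to-zone truck trips
print(selink_adjustment_factor(1200.0, 1120.0))     # e.g. GC = 1200, assigned = 1120
```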


Studies on the Derivation of the Instantaneous Unit Hydrograph for Small Watersheds of Main River Systems in Korea (A Study on the Derivation of the Instantaneous Unit Hydrograph for Small Watersheds of Main River Systems in Korea (I))

  • 이순혁
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.19 no.1
    • /
    • pp.4296-4311
    • /
    • 1977
  • This study was conducted to derive an accurate and reliable Instantaneous Unit Hydrograph (IUH) that can be used for the estimation and control of floods, for the development of agricultural water resources, and for the rational design of hydraulic structures. Eight small watersheds were selected as study basins from the Han, Geum, Nakdong, Yeongsan and Inchon River systems, which may be considered the main river systems in Korea. The areas of the small watersheds are within the range of 85 to 470 $\textrm{km}^2$. The aim is to derive an accurate Instantaneous Unit Hydrograph under the condition of short-duration heavy rain with uniform rainfall intensity, using basic and reliable data such as rainfall records, pluviographs, and records of river stages for the main river systems mentioned above. The relations between the measurable unitgraph and watershed characteristics such as watershed area, A, river length, L, and centroid distance of the watershed area, $L_{ca}$, were investigated. In particular, this study laid emphasis on the derivation and application of the Instantaneous Unit Hydrograph (IUH) by applying Nash's conceptual model and by using an electronic computer. The IUH by Nash's conceptual model and the IUH by flood routing, which can be applied to ungaged small watersheds, were derived and each compared with the observed unitgraph; the IUH for each small watershed can be solved by using an electronic computer. The results of these studies are summarized as follows:
1. A distribution of uniform rainfall intensity appears in the analysis of the temporal rainfall pattern of the selected heavy rainfall events.
2. The mean value of the recession constant, $K_1$, is 0.931 over all watersheds observed.
3. The time to peak discharge, $T_p$, occurs at the position of 0.02 $T_b$, the base length of the hydrograph, which is lower than the value in larger watersheds.
4. The peak discharge, $Q_p$, in relation to the watershed area, A, and effective rainfall, R, is found to be $Q_p = \frac{0.895}{A^{0.145}}AR$, with a highly significant correlation coefficient of 0.927 between peak discharge, $Q_p$, and effective rainfall, R. A design chart for the peak discharge (refer to Fig. 15) with watershed area and effective rainfall was established by the author.
5. The mean slopes of the main streams are within the range of 1.46 to 13.6 meters per kilometer, indicating higher slopes in the small watersheds than in larger watersheds. The lengths of the main streams are within the range of 9.4 to 41.75 kilometers, which can be regarded as short distances. It is remarkable that the time of flood concentration was more rapid in the small watersheds than in the other, larger watersheds.
6. The length of the main stream, L, in relation to the watershed area, A, is found to be $L = 2.044A^{0.48}$, with a highly significant correlation coefficient of 0.968.
7. The watershed lag, $L_g$, in hours, in relation to the watershed area, A, and the length of the main stream, L, was derived as $L_g = 3.228A^{0.904}L^{-1.293}$ with high significance. On the other hand, it was found that the watershed lag could also be expressed as $L_g = 0.247\left(\frac{LL_{ca}}{\sqrt{S}}\right)^{0.604}$ in terms of the product of the main stream length and the centroid length of the basin, $LL_{ca}$, which can be regarded as a measure of the shape and size of the watershed, together with the slope, rather than the watershed area, A. However, the latter showed a lower correlation than the former in the significance test.
Therefore, it can be concluded that the watershed lag, $L_g$, is more closely related to such watershed characteristics as watershed area and length of main stream in the small watersheds. An empirical formula for the peak discharge per unit area, $q_p$ (㎥/sec/$\textrm{km}^2$), was derived as $q_p = 10^{-0.389-0.0424L_g}$ with high significance, r = 0.91. This indicates that the peak discharge per unit area of the unitgraph is inversely proportional to the watershed lag time.
8. The base length of the unitgraph, $T_b$, in relation to the watershed lag, $L_g$, was expressed as $T_b = 1.14 + 0.564\left(\frac{L_g}{24}\right)$, which was defined with high significance.
9. For the derivation of the IUH by applying the linear conceptual model, the storage constant, K, in terms of the length of the main stream, L, and the slope, S, was adopted as $K = 0.1197\left(\frac{L}{\sqrt{S}}\right)$ with a highly significant correlation coefficient of 0.90. The Gamma function argument, N, derived from such watershed characteristics as watershed area, A, river length, L, centroid distance of the basin, $L_{ca}$, and slope, S, was found to be $N = 49.2A^{1.481}L^{-2.202}L_{ca}^{-1.297}S^{-0.112}$ with high significance, having an F value of 4.83 through analysis of variance.
10. According to the linear conceptual model, the formulas established for the time distribution, peak discharge, and time to peak discharge of the Instantaneous Unit Hydrograph, when the unit effective rainfall of the unitgraph and the dimension of the watershed area are taken as 10 mm and $\textrm{km}^2$ respectively, are as follows:
Time distribution of IUH: $u(0,t) = \frac{2.78A}{K\,\Gamma(N)}e^{-t/K}(t/K)^{N-1}$ (㎥/sec)
Peak discharge of IUH: $u(0,t)_{max} = \frac{2.78A}{K\,\Gamma(N)}e^{-(N-1)}(N-1)^{N-1}$ (㎥/sec)
Time to peak discharge of IUH: $t_p = (N-1)K$ (hrs)
11. Through mathematical analysis of the recession curve of the hydrograph, it was confirmed that the empirical formula for the Gamma function argument, N, is connected with the recession constant, $K_1$, the peak discharge, $Q_p$, and the time to peak discharge, $t_p$, as $\frac{K'}{t_p} = \frac{1}{N-1} - \frac{\ln(t/t_p)}{\ln(Q/Q_p)}$, where $K' = \frac{1}{\ln K_1}$.
12. Linking the two empirical formulas for the storage constant, K, and the Gamma function argument, N, the derivation of the unit hydrograph for ungaged small watersheds can be established with the following formulas for the time distribution and peak discharge of the IUH:
Time distribution of IUH: $u(0,t) = 23.2AL^{-1}S^{1/2}F(N,K,t)$ (㎥/sec), where $F(N,K,t) = \frac{e^{-t/K}(t/K)^{N-1}}{\Gamma(N)}$
Peak discharge of IUH: $u(0,t)_{max} = 23.2AL^{-1}S^{1/2}F(N)$ (㎥/sec), where $F(N) = \frac{e^{-(N-1)}(N-1)^{N-1}}{\Gamma(N)}$
13. The base length of the Time-Area Diagram for the IUH was given by $C = 0.778\left(\frac{LL_{ca}}{\sqrt{S}}\right)^{0.423}$ with a correlation coefficient of 0.85, indicating its relation to the length of the main stream, L, the centroid distance of the basin, $L_{ca}$, and the slope, S.
14. The relative errors in the peak discharge of the IUH by the linear conceptual model and the IUH by routing were 2.5 and 16.9 percent respectively with respect to the peak of the observed unitgraph. It was therefore confirmed that, in the small watersheds, the IUH obtained from the linear conceptual model approaches the observed unitgraph more closely than that obtained by flood routing.
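Using the IUH formulas reported in the abstract (the time distribution $u(0,t) = \frac{2.78A}{K\,\Gamma(N)}e^{-t/K}(t/K)^{N-1}$, the time to peak $t_p = (N-1)K$, and the empirical relations for $K$ and $N$), the sketch below evaluates the IUH peak for a hypothetical watershed. The values of A, L, $L_{ca}$, and S are invented for illustration and do not correspond to any of the paper's eight study basins.

```python
import math

def nash_iuh(t, area_km2, K, N):
    """Nash IUH ordinate u(0, t) = 2.78*A / (K*Gamma(N)) * e^(-t/K) * (t/K)^(N-1),
    as given in the abstract (A in km^2, K and t in hours; result in m^3/sec
    per 10 mm of effective rainfall)."""
    return 2.78 * area_km2 / (K * math.gamma(N)) * math.exp(-t / K) * (t / K) ** (N - 1)

# Hypothetical watershed: A = 150 km^2, L = 20 km, Lca = 9 km, S = 5 m/km,
# with K and N taken from the empirical relations reported in the abstract.
A, L, Lca, S = 150.0, 20.0, 9.0, 5.0
K = 0.1197 * (L / math.sqrt(S))
N = 49.2 * A**1.481 * L**-2.202 * Lca**-1.297 * S**-0.112

t_peak = (N - 1) * K                        # time to peak discharge, hours
print(f"K = {K:.2f} h, N = {N:.2f}, t_p = {t_peak:.2f} h")
print(f"u(0, t_p) = {nash_iuh(t_peak, A, K, N):.1f} m^3/s")
```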
