• Title/Summary/Keyword: In-Sample Prediction

Default Prediction for Real Estate Companies with Imbalanced Dataset

  • Dong, Yuan-Xiang;Xiao, Zhi;Xiao, Xue
    • Journal of Information Processing Systems / v.10 no.2 / pp.314-333 / 2014
  • When analyzing default predictions for real estate companies, the number of non-defaulted cases always greatly exceeds the number of defaulted ones, which creates a two-class imbalance problem. This imbalance weakens the ability of prediction models to identify the defaulted samples. To avoid this sample selection bias and improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. Logistic regression, support vector machine (SVM) classification, and neural network (NN) classification trained on the imbalanced dataset serve as benchmarks for single prediction models trained on a dataset balanced by the minority sample generation approach. Instead of prediction-oriented tests and overall accuracy, the true positive rate (TPR), true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models on the imbalanced dataset. We describe an empirical experiment using a sample of 14 defaulted and 315 non-defaulted listed real estate companies in China and report that, in most cases, single prediction models trained on the balanced dataset produced better results than those trained on the imbalanced dataset.
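The four measures named in the abstract can all be read off a confusion matrix. The sketch below is a minimal illustration, not the authors' code: it treats defaulted firms as the positive class and assumes "F-score" means the usual F1 score (β = 1). The confusion-matrix counts in the usage line are hypothetical; only the 14/315 class split comes from the abstract.

```python
import math

def imbalance_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute TPR, TNR, G-mean and F-score (F1) from confusion-matrix counts.

    Defaulted firms are treated as the positive (minority) class.
    """
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall on defaulted firms
    tnr = tn / (tn + fp) if tn + fp else 0.0        # recall on non-defaulted firms
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(tpr * tnr)                   # geometric mean of the two recalls
    f_score = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"TPR": tpr, "TNR": tnr, "G-mean": g_mean,
            "F-score": f_score, "accuracy": accuracy}

# Hypothetical case: a model that labels all 329 firms "non-default".
print(imbalance_metrics(tp=0, fn=14, tn=315, fp=0))
```

In this degenerate case, overall accuracy is still about 95.7% while TPR, G-mean, and F-score are all zero. This illustrates why the abstract prefers these measures to overall accuracy on imbalanced data.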

The Predictive Ability of Accruals with Respect to Future Cash Flows : In-sample versus Out-of-Sample Prediction (발생액의 미래 현금흐름 예측력 : 표본 내 예측 대 표본 외 예측)

  • Oh, Won-Sun;Kim, Dong-Chool
    • Management & Information Systems Review / v.28 no.3 / pp.69-98 / 2009
  • This study investigates the in-sample and out-of-sample predictive ability of accruals and accrual components with respect to future cash flows, using the models developed by Barth et al. (2001). The tests use data collected fromda62 Korean KOSPI and KOSDAQ listed firms for ccr4-2007. The in-sample prediction results are similar to those of Barth et al. (2001): their accrual components model predicts future cash flows better than the other three models (the NI-only model, the CF-only model, and the NI-total accruals model). That is, in in-sample prediction, accrual components other than amortization carry additional information about future cash flows. The out-of-sample tests, however, yield different results. Among the four models, the model containing only operating cash flows (the CF-only model) shows the best out-of-sample predictive ability for future cash flows, while the accrual components model of Barth et al. (2001) shows the worst. The results are robust to sensitivity analyses. In conclusion, we find no evidence that accruals and accrual components have predictive ability for future cash flows in out-of-sample tests. These results are consistent with those of Lev et al. (2005) and inconsistent with the belief held by accounting standard-setting organizations such as the FASB and KASB.
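The in-sample versus out-of-sample distinction in the abstract rests on one idea. In-sample error is measured on the same observations used to estimate the model. Out-of-sample error is measured on later observations the model has not seen. The sketch below is a simplified illustration of that split, not the Barth et al. (2001) specification. It assumes a single predictor fitted by ordinary least squares and a chronological split at a hypothetical index `n_train`. The data series in the usage lines are invented.

```python
def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(xs, ys, a, b):
    """Mean squared prediction error of y = a + b*x on the given data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def in_vs_out_of_sample(xs, ys, n_train):
    """Estimate on the first n_train observations, then score the estimation
    window (in-sample) and the later holdout period (out-of-sample)."""
    a, b = fit_ols(xs[:n_train], ys[:n_train])
    in_sample = mse(xs[:n_train], ys[:n_train], a, b)
    out_of_sample = mse(xs[n_train:], ys[n_train:], a, b)
    return in_sample, out_of_sample

# Hypothetical series: x = current cash flow, y = next-period cash flow.
cf_now = [1.0, 1.2, 0.9, 1.4, 1.1, 1.5, 1.3, 1.6]
cf_next = [1.1, 0.9, 1.3, 1.0, 1.6, 1.2, 1.7, 1.4]
ins, oos = in_vs_out_of_sample(cf_now, cf_next, n_train=5)
print(f"in-sample MSE {ins:.4f}, out-of-sample MSE {oos:.4f}")
```

A model with more regressors, such as one with separate accrual components, can always fit the estimation window at least as well. That advantage need not carry over to the holdout period, and this is the pattern the abstract reports.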

Severity-based Software Quality Prediction using Class Imbalanced Data

  • Hong, Euy-Seok;Park, Mi-Kyeong
    • Journal of the Korea Society of Computer and Information / v.21 no.4 / pp.73-80 / 2016
  • Most fault prediction models suffer from class imbalance because training data usually contain many more non-faulty modules than faulty ones. This imbalanced distribution makes it difficult for the models to learn from the minority class. The imbalance is even more severe in severity-based fault prediction, because high-severity fault modules are a smaller subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method suffers from typical overfitting problems and that the SpreadSubSample method does not improve the prediction performance of the models. Unlike these two methods, SMOTE performs well in terms of AUC and FNR values. In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
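SMOTE creates synthetic minority-class samples. It interpolates between an existing minority sample and one of its nearest minority-class neighbours. The sketch below is a minimal pure-Python illustration of that idea. It is not the Weka filter used in the paper. Like several common implementations, it picks the base sample at random rather than iterating over every minority sample a fixed number of times. The function name `smote` and its parameters are illustrative choices.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples, SMOTE-style.

    Each synthetic point is placed at a random position on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (squared Euclidean distance).
    `minority` is a list of equal-length numeric tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority-class neighbours of the base sample
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-feature fault-module metrics for four faulty modules.
faulty = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(faulty, n_new=3, k=2, seed=1))
```

Because every synthetic point lies between real minority samples, SMOTE widens the minority region without simply duplicating rows. Plain duplication is one plausible source of the overfitting the abstract attributes to Resample.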

Bayes Prediction for Small Area Estimation

  • Lee, Sang-Eun
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.407-416 / 2001
  • Sample surveys are usually designed and analyzed to produce estimates for large areas or populations. As a result, sample sizes for small areas are often too small to give adequate precision. Several small area estimation methods have been proposed in recent years to address this problem. Here, we compare a simple Bayesian approach with Bayesian prediction for small area estimation based on a linear regression model. The performance of the proposed method was evaluated using unemployed population data from the Economically Active Population (EAP) Survey.
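A core idea behind Bayesian small area estimation is to combine two estimates. One is an area's noisy direct survey estimate. The other is a regression-based ("synthetic") prediction. The weight on each depends on how precise the direct estimate is. The sketch below shows the textbook normal-normal posterior mean with known variances. It is a generic illustration and not necessarily the exact estimator compared in the paper. The parameter names and example values are hypothetical.

```python
def shrinkage_estimate(direct_mean, synthetic_pred, n_i, sigma_v2, sigma_e2):
    """Posterior mean of a small-area mean under a normal model with known variances.

    Model (assumed for illustration):
        theta_i ~ N(synthetic_pred, sigma_v2)        # regression prior for the area
        direct_mean | theta_i ~ N(theta_i, sigma_e2 / n_i)

    direct_mean    : sample mean of the area's own survey observations
    synthetic_pred : regression prediction for the area
    n_i            : number of sampled units in the area
    sigma_v2       : between-area variance of the area effect
    sigma_e2       : within-area (unit-level) error variance
    """
    gamma = sigma_v2 / (sigma_v2 + sigma_e2 / n_i)  # weight on the direct estimate
    return gamma * direct_mean + (1 - gamma) * synthetic_pred

# Hypothetical area: direct estimate 10.0 from 4 units, regression prediction 6.0.
print(shrinkage_estimate(10.0, 6.0, n_i=4, sigma_v2=1.0, sigma_e2=4.0))
```

With few sampled units, `gamma` is small and the estimate leans on the regression prediction. As `n_i` grows, it moves toward the area's own direct estimate.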

A Study on the Distress Prediction in the Fishery Industry (수산기업의 부실화 요인 및 예측에 관한 연구)

  • Lee, Yun-Won;Jang, Chang-Ik;Hong, Jae-Beom
    • Proceedings of the Fisheries Business Administration Society of Korea Conference / 2007.12a / pp.167-184 / 2007
  • The objectives of this paper are to identify the causes of corporate distress in the fishery industry and to develop a distress prediction model based on financial information. In this study, corporate distress is defined as either economic failure or technical insolvency. Economic failure occurs through the reduction, shutdown, or change of a business, and technical insolvency results from a company's failure to pay its financial debts. The 33 distressed firms from 1991 to 2003 consisted of 14 economic-failure companies, 15 technical-insolvency companies, and 4 companies that met both definitions. The distress prediction analysis of fishery companies was conducted according to each distress definition, in two steps. The first step was a univariate analysis to check the predictive power of individual financial variables, using t-tests to identify differences in financial variables between the distressed and non-distressed groups. The second step was to develop a distress prediction model with logistic regression. The variables that showed significant differences in the univariate analysis were selected as predictor variables, and the financial ratios used in the logistic regression model were selected by backward elimination. To test the stability of the distress prediction model, the whole sample was divided into three sub-samples: period 1 (1990-1993), period 2 (1994-1997), and period 3 (1998-2002). The final model built from the whole sample was applied to each of the three sub-samples. The logistic analysis showed that growth, profitability, and stability ratios had a significant effect on distress. Somewhat different results were found in the sub-samples by distress type (economic failure and technical insolvency): growth and profitability were important for predicting economic failure, while profitability and activity were important for predicting technical insolvency.
This suggests that profitability is the key factor for fishery companies.
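The univariate step compares each financial ratio between distressed and non-distressed firms with a t-test. The abstract does not say whether the equal-variance or unequal-variance form was used. The sketch below assumes Welch's unequal-variance version and returns only the statistic and its degrees of freedom. A p-value would need a t-distribution CDF, which the Python standard library does not provide. The example ratio values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1 denominator)
    na, nb = len(group_a), len(group_b)
    se2 = va / na + vb / nb                        # squared standard error of the difference
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical return-on-assets ratios (%) for distressed vs. non-distressed firms.
distressed = [-4.1, -2.5, -6.0, -1.2, -3.3]
healthy = [3.2, 1.8, 4.5, 2.9, 0.7, 3.8]
print(welch_t(distressed, healthy))
```

Ratios whose statistic is large in absolute value at the chosen significance level would be carried forward as candidate predictors for the logistic regression step.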

A Study on the Distress Prediction in the Fishery Industry (수산기업의 부실화 요인과 그 예측에 관한 연구)

  • Jang, Chang-Ick;Lee, Yun-Weon;Hong, Jae-Bum
    • The Journal of Fisheries Business Administration / v.39 no.2 / pp.61-79 / 2008
  • The objectives of this paper are to identify the causes of corporate distress in the fishery industry and to develop a distress prediction model based on financial information. In this study, corporate distress is defined as either economic failure or technical insolvency. Economic failure occurs through the reduction, shutdown, or change of a business, and technical insolvency results from a company's failure to pay its financial debts. The 33 distressed firms from 1991 to 2003 consisted of 14 economic-failure companies, 15 technical-insolvency companies, and 4 companies that met both definitions. The distress prediction analysis of fishery companies was conducted according to each distress definition, in two steps. The first step was a univariate analysis to check the predictive power of individual financial variables, using t-tests to identify differences in financial variables between the distressed and non-distressed groups. The second step was to develop a distress prediction model with logistic regression. The variables that showed significant differences in the univariate analysis were selected as predictor variables, and the financial ratios used in the logistic regression model were selected by backward elimination. To test the stability of the distress prediction model, the whole sample was divided into three sub-samples: period 1 (1990-1993), period 2 (1994-1997), and period 3 (1998-2002). The final model built from the whole sample was applied to each of the three sub-samples. The logistic analysis showed that growth, profitability, and stability ratios had a significant effect on distress. Somewhat different results were found in the sub-samples by distress type (economic failure and technical insolvency): growth and profitability were important for predicting economic failure, while profitability and activity were important for predicting technical insolvency.
This suggests that profitability is the key factor for fishery companies.
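The second step fits a logistic regression that maps financial ratios to a probability of distress. The sketch below is a simplified stand-in for a statistics package's maximum-likelihood fit. It runs plain batch gradient descent on the average log-loss and omits the backward-elimination variable selection described in the abstract. The learning rate, epoch count, and one-ratio example data are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, row):
    """P(distress = 1 | row) for weights w = [intercept, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.01, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    X: list of feature rows (financial ratios), y: list of 0/1 distress labels.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label   # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical profitability ratio (%) for 3 distressed (1) and 3 healthy (0) firms.
X = [[-10.0], [-5.0], [-8.0], [6.0], [9.0], [12.0]]
y = [1, 1, 1, 0, 0, 0]
w = fit_logistic(X, y)
print(w, predict_proba(w, [-7.0]), predict_proba(w, [10.0]))
```

The stability check in the abstract would correspond to fitting `w` once on the whole sample and then calling `predict_proba` on each sub-period's firms without refitting.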

Characteristics and Prediction of Shear Strength for Unsaturated Residual Soil (풍화잔적토의 불포화전단강도 예측 및 특성연구)

  • 이인모;성상규;양일순
    • Proceedings of the Korean Geotechnical Society Conference / 2000.11a / pp.377-384 / 2000
  • The characteristics and a prediction model of the shear strength of unsaturated residual soils were studied. To investigate the influence of the initial water content on shear strength, unsaturated triaxial tests were carried out at varying initial water contents, and the applicability of existing prediction models for unsaturated shear strength was examined. The soil-water characteristic curve and the shear strength of the unsaturated soil were shown to vary with the initial water content. A sample compacted at a lower initial water content needs a higher suction to reach the same degree of saturation, while its shear strength is lower. To apply existing prediction models of unsaturated shear strength to granite residual soils, a correction coefficient, α, was added to the internal friction angle, φ'.
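The abstract does not name the existing prediction models or say exactly where α enters. As one widely cited example, the sketch below implements the normalized-water-content form of Vanapalli et al. (1996):

τ = c' + (σ − u_a) tan φ' + (u_a − u_w) Θ tan φ', with Θ = (θ − θ_r)/(θ_s − θ_r).

Placing α as a multiplier on φ' in the suction term only is a hypothetical choice made here for illustration. With α = 1, the function reduces to the uncorrected model.

```python
import math

def unsaturated_shear_strength(c_eff, net_normal, suction, phi_eff_deg,
                               theta, theta_r, theta_s, alpha=1.0):
    """Shear strength of an unsaturated soil (same stress units as inputs).

    c_eff       : effective cohesion c'
    net_normal  : net normal stress (sigma - u_a)
    suction     : matric suction (u_a - u_w)
    phi_eff_deg : effective internal friction angle phi' in degrees
    theta, theta_r, theta_s : volumetric, residual and saturated water contents
    alpha       : HYPOTHETICAL correction on phi' in the suction term (1.0 = none)
    """
    theta_norm = (theta - theta_r) / (theta_s - theta_r)  # normalized water content
    saturated_part = c_eff + net_normal * math.tan(math.radians(phi_eff_deg))
    suction_part = suction * theta_norm * math.tan(math.radians(alpha * phi_eff_deg))
    return saturated_part + suction_part

# Hypothetical inputs in kPa: c' = 10, net stress 100, suction 50, phi' = 30 deg.
print(unsaturated_shear_strength(10.0, 100.0, 50.0, 30.0, 0.25, 0.05, 0.40))
print(unsaturated_shear_strength(10.0, 100.0, 50.0, 30.0, 0.25, 0.05, 0.40, alpha=0.8))
```

The suction contribution shrinks as the soil dries toward residual water content (Θ → 0). This is one reason a single friction-angle term may need adjustment for soils compacted at different initial water contents.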

Estimating the Important Components in Three Different Sample Types of Soybean by Near Infrared Reflectance Spectroscopy

  • Lee, Ho-Sun;Kim, Jung-Bong;Lee, Young-Yi;Lee, Sok-Young;Gwag, Jae-Gyun;Baek, Hyung-Jin;Kim, Chung-Kon;Yoon, Mun-Sup
    • KOREAN JOURNAL OF CROP SCIENCE / v.56 no.1 / pp.88-93 / 2011
  • This experiment was carried out to find a suitable sample type for more accurate, non-destructive estimation of the protein, total amino acid, and total isoflavone contents of soybean by near infrared reflectance spectroscopy (NIRS), comparing three sample types: single seeds, whole seeds, and milled seed powder. For all three components analyzed by NIRS, the coefficient of determination in calibration ($R^2$) and the coefficient of determination in cross-validation (1-VR) were highest for the milled powder sample type, followed by single seeds, and lowest for whole seeds. The coefficient of determination in calibration for single seeds was moderately low ($R^2$ 0.70-0.84), while the calibration equation developed from NIRS data scanned on whole seeds showed the lowest accuracy and reliability of the three sample groups. The scatter plot of NIRS data versus reference data showed the widest data cloud for whole seeds, in contrast to the milled powder type, which showed a narrower data cloud. Comparing the NIRS results for total isoflavone, total amino acids, and protein across the three sample types, the powder samples gave the most accurate predictions. However, given that NIRS is valued for non-destructive and fast prediction, the results indicate that using single bean samples without grinding is a promising strategy for estimating soybean components with NIRS.
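The abstract contrasts two fit statistics. $R^2$ in calibration measures the fit on the calibration set. The cross-validation statistic (1-VR) measures how well each sample is predicted when it is left out of the fit. The sketch below is a simplified illustration. It uses a single predictor with ordinary least squares instead of the multi-wavelength regression NIRS software typically fits. It also assumes that 1-VR can be approximated as 1 − PRESS/SS_tot under leave-one-out cross-validation. The example data are hypothetical.

```python
def _fit(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def _ss_tot(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)

def calibration_r2(xs, ys):
    """R^2 of the calibration line on the samples it was fitted to."""
    a, b = _fit(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / _ss_tot(ys)

def loo_cv_r2(xs, ys):
    """Leave-one-out cross-validated R^2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        a, b = _fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])  # refit without sample i
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1 - press / _ss_tot(ys)

# Hypothetical spectral score (x) vs. reference protein content (y, %).
xs = [0.31, 0.35, 0.38, 0.42, 0.45, 0.50]
ys = [36.2, 37.9, 38.1, 40.3, 40.9, 43.0]
print(calibration_r2(xs, ys), loo_cv_r2(xs, ys))
```

For least squares, the cross-validated value can never exceed the calibration value. A large gap between the two is a warning sign that the calibration will not transfer well to new samples.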

Scalable Extension of HEVC for Flexible High-Quality Digital Video Content Services

  • Lee, Hahyun;Kang, Jung Won;Lee, Jinho;Choi, Jin Soo;Kim, Jinwoong;Sim, Donggyu
    • ETRI Journal / v.35 no.6 / pp.990-1000 / 2013
  • This paper describes a scalable extension of High Efficiency Video Coding (HEVC) that provides flexible, high-quality digital video content services. The proposed scalable codec is based on a multi-loop decoding architecture that supports inter-layer sample prediction and inter-layer motion parameter prediction. Inter-layer sample prediction is enabled by inserting the reconstructed picture of the reference layer (RL) into the decoded picture buffer of the enhancement layer (EL). To reduce motion parameter redundancy between layers, the motion parameters of the RL are used as one of the candidates in merge mode and motion vector prediction in the EL. The proposed scalable extension supports scalability with minimal changes to HEVC and provides average Bjøntegaard delta bitrate gains of about 24% for spatial scalability and about 21% for SNR scalability compared with simulcast coding with HEVC.

Evaluating Distress Prediction Models for Food Service Franchise Industry (외식프랜차이즈기업 부실예측모형 예측력 평가)

  • KIM, Si-Joong
    • Journal of Distribution Science / v.17 no.11 / pp.73-79 / 2019
  • Purpose: The purpose of this study was to compare the predictive power of distress prediction models based on discriminant analysis and logit analysis for the food service franchise industry in Korea. Research design, data and methodology: Forty-six food service franchise companies with high sales volume in 2017 were selected as the sample for analysis. Fourteen financial ratios were calculated from the 2017 statements of financial position and income statements of these forty-six companies. The fourteen financial ratios were used as sample data and analyzed by t-test, and seven statistically significant independent variables were chosen. The distress prediction models were built using logit analysis and multiple discriminant analysis. Results: Differences in the mean values of the fourteen financial ratios were tested by t-test to extract the ratios that distinguish top-level from failing food service franchise companies. The univariate tests showed that the variables distinguishing top-level from failing food service franchise companies are income to stockholders' equity, operating income to sales, current ratio, net income to assets, cash flows from operating activities, growth rate of operating income, and total assets turnover. The statistical significance of these seven financial ratios was also confirmed by logit analysis and discriminant analysis.
Conclusions: The prediction accuracy was 84.8% for the discriminant analysis model and 89.1% for the logit analysis model, indicating that the logit analysis method has higher distress predictability than the discriminant analysis method. Previous studies report distress prediction accuracy of 75% to 85% for discriminant and logit analysis. The discriminant model in this study (84.8%) falls within that range and the logit model (89.1%) exceeds it, so the predictive capacity found in this study is considered high.
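The abstract does not state the denominator behind its accuracy figures. One quick consistency check is to assume accuracy is the share of all 46 sampled companies classified correctly (an assumption). The sketch below then searches for the whole-number counts that round to each reported percentage.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Classification accuracy in percent, rounded to one decimal place."""
    return round(100 * correct / total, 1)

# Which whole-number counts of correctly classified firms (out of 46)
# round to the reported accuracies?
for reported in (84.8, 89.1):
    matches = [c for c in range(47) if accuracy_pct(c, 46) == reported]
    print(reported, matches)
# prints:
# 84.8 [39]
# 89.1 [41]
```

Under that assumption, the two models differ by two firms: 39 correct for discriminant analysis versus 41 for logit analysis. This is worth keeping in mind when comparing the methods on a sample of this size.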