• Title/Summary/Keyword: Statistical Model Validation

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has long been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction matters greatly to financial institutions. Many researchers have addressed bankruptcy prediction over the past three decades. The current research uses ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain more accurate predictions than individual models, and ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is one of the most commonly used methods for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection keeps critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining, but few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest, progressively accepting better solutions to the problem. It searches by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset that serves as input data for the bagging model. The chromosome is encoded as a binary string representing the instance subset. In this phase, the population size was set to 100 and the maximum number of generations to 150. The crossover rate and mutation rate were set to 0.7 and 0.1, respectively. The prediction accuracy of the model was used as the fitness function of GA: an SVM model is trained on the training data set using the selected instance subset, and its prediction accuracy over the test data set is used as the fitness value in order to avoid overfitting. In the second phase, the optimal instance subset selected in the first phase was used as input data for the bagging model. SVM was used as the base classifier for the bagging ensemble, and majority voting was used as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1,832 externally non-audited firms, comprising bankrupt firms (916 cases) and non-bankrupt firms (916 cases). Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. The whole data set was separated into three subsets: training, test, and validation. The proposed model was compared with several comparative models, including the simple individual SVM model, the simple bagging model, and the instance selection based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
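
The bagging and evaluation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the function names are hypothetical, training of the base classifier (SVM in the paper) is left out, and the McNemar test uses the common continuity-corrected chi-square form, which the abstract does not specify.

```python
import math
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bagging replicate)."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Combine one predicted label per base classifier into a single label."""
    return Counter(predictions).most_common(1)[0][0]


def mcnemar(b, c):
    """Continuity-corrected McNemar test on the discordant pairs.

    b: cases model A classified correctly and model B did not
    c: cases model B classified correctly and model A did not
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With hypothetical discordant counts `b=10` and `c=25`, `mcnemar(10, 25)` returns a statistic of 5.6 and a p-value below 0.05, which would indicate a significant difference between the two models at the 5% level.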

Development of Nutrition Quotient for Korean adults: item selection and validation of factor structure (한국 성인을 위한 영양지수 개발과 타당도 검증)

  • Lee, Jung-Sug;Kim, Hye-Young;Hwang, Ji-Yun;Kwon, Sehyug;Chung, Hae Rang;Kwak, Tong-Kyung;Kang, Myung-Hee;Choi, Young-Sun
    • Journal of Nutrition and Health / v.51 no.4 / pp.340-356 / 2018
  • Purpose: This study was conducted to develop a nutrition quotient (NQ) to assess overall dietary quality and food behaviors of Korean adults. Methods: The NQ was developed in three steps: item generation, item reduction, and validation. Candidate items of the NQ checklist were derived from a systematic literature review, expert in-depth interviews, statistical analyses of the Korea National Health and Nutrition Examination Survey (2010 ~ 2013) data, and national nutrition policies and recommendations. A total of 368 adults (19 ~ 64 years) participated in a one-day dietary record survey and responded to 43 items in the food behavior checklist. Pearson's correlation coefficients between responses to the checklist items and nutritional intake status of the adults were calculated. Item reduction was performed, and 24 items were selected for a nationwide survey. A total of 1,053 nationwide adult subjects completed the checklist questionnaires. Exploratory and confirmatory factor analyses were performed to develop a final NQ model. Results: Twenty-one checklist items were used as the final NQ items. The checklist items were composed of four factors: nutrition balance (seven items), food diversity (three items), moderation for the amount of food intake (six items), and dietary behavior (five items). The four-factor structure accounted for 41.8% of the total variance. Indicator tests of the NQ model suggested an adequate model fit (GFI = 0.9693, adjusted GFI = 0.9617, RMR = 0.0054, SRMR = 0.0897, p < 0.05), and item loadings were significant for all subscales. Standardized path coefficients were used as weights of the items. The NQ and four-factor scores were calculated according to the obtained weights of the questionnaire items. Conclusion: The NQ for adults would be a useful tool for assessing adult dietary quality and food behavior. Further investigations of the adult NQ are needed to reflect changes in food behavior, environment, and prevalence of chronic diseases.

Statistical Analysis of Protein Content in Wheat Germplasm Based on Near-infrared Reflectance Spectroscopy (밀 유전자원의 근적외선분광분석 예측모델에 의한 단백질 함량 변이분석)

  • Oh, Sejong;Choi, Yu Mi;Yoon, Hyemyeong;Lee, Sukyeung;Yoo, Eunae;Hyun, Do Yoon;Shin, Myoung-Jae;Lee, Myung Chul;Chae, Byungsoo
    • Korean Journal of Crop Science / v.64 no.4 / pp.353-365 / 2019
  • A near-infrared reflectance spectroscopy (NIRS) prediction model was developed to establish a rapid analysis system for wheat germplasm and to provide statistical information on the characteristics of protein content. The variability index value (VIV) of calibration resources was 0.80, the average protein content was 13.2%, and the content range was from 7.0% to 13.2%. After measuring the near-infrared spectra of calibration resources, the NIRS prediction model was developed through a regression analysis between protein content and spectra data, and then optimized by excluding outliers. The standard error of calibration, R², and slope of the optimized model were 0.132, 0.997, and 1.000, respectively; for external validation, the standard error was 0.191, R² was 0.994, and the slope was 1.013. Based on these results, the developed NIRS model could be applied to the rapid analysis of protein in wheat. The distribution of NIRS protein content of 6,794 resources was analyzed using a normal distribution analysis. The VIV was 0.79, the average protein was 12.1%, and the content ranges of resources accounting for 42.1% and 68% of the total accessions were 10-13% and 9.5-14.6%, respectively. The total resources were classified into breeding line (3,128), landrace (2,705), and variety (961). The VIV in breeding line was 0.80, the protein average was 11.8%, and the contents of 68% of total resources ranged from 9.2% to 14.5%. The VIV in landrace was 0.76, the protein average was 12.1%, and the content range of 68% of total accessions was 9.8-14.4%. The VIV in variety was 0.80, the protein average was 12.8%, and the accessions representing 68% of total resources ranged from 10.2% to 15.4%. These results should be helpful to experts in wheat breeding.
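
The "68% of accessions" ranges reported in this abstract correspond roughly to the mean plus or minus one standard deviation of a normal distribution. The sketch below, in Python, checks that reading for the full collection (mean 12.1%, range 9.5-14.6%). The standard deviation is not given in the abstract; it is an assumption back-calculated here as half the width of the reported range, and the function name is hypothetical.

```python
from statistics import NormalDist


def share_between(mean, sd, low, high):
    """Fraction of a normal(mean, sd) population lying in [low, high]."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Mean taken from the abstract; the SD is an assumption inferred from the
# reported 68% range (9.5-14.6%), i.e. half of its width.
mean_protein = 12.1
assumed_sd = (14.6 - 9.5) / 2

coverage = share_between(mean_protein, assumed_sd, 9.5, 14.6)
print(f"share of accessions in 9.5-14.6%: {coverage:.3f}")
```

Under these assumptions the computed share comes out close to 0.68, consistent with the abstract's description of the range.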

Discrimination of geographical origin for soybeans using ED-XRF (ED-XRF (Energy Dispersive X-ray Fluorescence spectrometer)를 이용한 콩 원산지 판별)

  • Lee, Ji-Hye;Kang, Dong-Jin;Jang, Eun-Hee;Hur, Suel-Hye;Shin, Byeung-Kon;Han, Guk-Tak;Lee, Seong-Hun
    • Korean Journal of Food Science and Technology / v.52 no.2 / pp.125-129 / 2020
  • In this study, we developed a method for determining the geographic origin of soybeans by combining energy dispersive X-ray fluorescence spectrometry with statistical analysis. In 2018, 197 soybean samples (100 Korean domestic samples and 97 foreign samples) were collected for the construction of a geographic origin model. The mineral concentrations of 26 elements were measured and determined via the fundamental parameters approach. One-way analysis of variance, t-test, and canonical discriminant analysis were employed to reveal five elements (P, Ni, Br, Zn, and Mn) that could be used for the determination of geographic origins. The sensitivity, specificity, and efficiency of the method were 91.0%, 95.9%, and 93.4%, respectively. Validation with 60 samples collected in 2019 showed a prediction rate of 93.3% for Korean domestic soybeans and 100.0% for foreign soybeans. In conclusion, the combination of energy dispersive X-ray fluorescence spectrometry and chemometrics could be used to effectively determine the geographic origin of soybeans.
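
The three rates reported in this abstract can be reproduced from a 2x2 confusion matrix. The Python sketch below is illustrative only: the abstract gives the sample sizes (100 domestic, 97 foreign) but not the confusion matrix, so the counts of correctly classified samples (91 and 93) are assumptions chosen to match the reported percentages, as is treating Korean domestic soybeans as the positive class.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, efficiency) as fractions."""
    sensitivity = tp / (tp + fn)                  # positives correctly identified
    specificity = tn / (tn + fp)                  # negatives correctly identified
    efficiency = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    return sensitivity, specificity, efficiency


# Assumed counts: 91 of 100 domestic and 93 of 97 foreign samples correct.
sens, spec, eff = classification_rates(tp=91, fn=9, tn=93, fp=4)
print(f"{sens:.1%}, {spec:.1%}, {eff:.1%}")
```

With these assumed counts the function yields 91.0%, 95.9%, and 93.4%, matching the figures in the abstract.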

Development of a surrogate model based on temperature for estimation of evapotranspiration and its use for drought index applicability assessment (증발산 산정을 위한 온도기반의 대체모형 개발 및 가뭄지수 적용성 평가)

  • Kim, Ho-Jun;Kim, Kyoungwook;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.54 no.11 / pp.969-983 / 2021
  • Evapotranspiration, one of the hydrometeorological components, is an important variable for water resource planning and management and is primarily used as input data for hydrological models such as water balance models. The FAO56 PM method has been recommended as a standard approach for estimating reference evapotranspiration with relatively high accuracy. However, it is often challenging to apply because it requires many hydrometeorological variables. For this reason, the Hargreaves equation has been widely adopted to estimate reference evapotranspiration. In this study, a set of parameters of the Hargreaves equation was calibrated with relatively long-term data within a Bayesian framework. Statistical indices (CC, RMSE, IoA) were used to validate the model. RMSE for monthly results reduced from 7.94 ~ 24.91 mm/month to 7.94 ~ 24.91 mm/month for the validation period. The results confirmed that the accuracy was significantly improved compared to the existing Hargreaves equation. Further, the evaporative demand drought index (EDDI) based on the evaporative demand (E0) was proposed. To confirm the effectiveness of the EDDI, this study evaluated the estimated EDDI for the recent drought events from 2014 to 2015 and 2018, along with precipitation and SPI. In the evaluation of the Han-river watershed in 2018, the weekly EDDI increased to more than 2, confirming that EDDI more effectively detects the onset of drought caused by heatwaves. EDDI can be used as a drought index alongside SPI, particularly for monitoring heatwave-driven flash droughts.
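
As a rough illustration of the temperature-based approach in this abstract, the Python sketch below implements the standard Hargreaves equation and the three validation indices. It is not the authors' calibrated model: the default coefficients (0.0023, 17.8, 0.5) are the conventional values rather than the Bayesian estimates from the study, reading CC as Pearson's correlation coefficient and IoA as Willmott's index of agreement is an assumption, and all function names are hypothetical.

```python
import math


def hargreaves_et0(tmax, tmin, ra, a=0.0023, b=17.8, c=0.5):
    """Reference evapotranspiration (mm/day) from the Hargreaves equation.

    tmax, tmin: daily maximum / minimum air temperature (deg C)
    ra: extraterrestrial radiation as equivalent evaporation (mm/day)
    a, b, c: coefficients; defaults are the conventional values, which a
             calibration step such as the one in the abstract would replace.
    """
    tmean = (tmax + tmin) / 2
    return a * ra * (tmean + b) * (tmax - tmin) ** c


def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def pearson_cc(obs, sim):
    """Pearson correlation coefficient."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def index_of_agreement(obs, sim):
    """Willmott's index of agreement (1.0 means perfect agreement)."""
    mo = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1 - num / den
```

In a validation setting, `obs` would hold the benchmark values (for example FAO56 PM estimates) and `sim` the Hargreaves estimates for the same dates.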

Accuracy of artificial intelligence-assisted landmark identification in serial lateral cephalograms of Class III patients who underwent orthodontic treatment and two-jaw orthognathic surgery

  • Hong, Mihee;Kim, Inhwan;Cho, Jin-Hyoung;Kang, Kyung-Hwa;Kim, Minji;Kim, Su-Jung;Kim, Yoon-Ji;Sung, Sang-Jin;Kim, Young Ho;Lim, Sung-Hoon;Kim, Namkug;Baek, Seung-Hak
    • The Korean Journal of Orthodontics / v.52 no.4 / pp.287-297 / 2022
  • Objective: To investigate the pattern of accuracy change in artificial intelligence-assisted landmark identification (LI) using a convolutional neural network (CNN) algorithm in serial lateral cephalograms (Lat-cephs) of Class III (C-III) patients who underwent two-jaw orthognathic surgery. Methods: A total of 3,188 Lat-cephs of C-III patients were allocated into the training and validation sets (3,004 Lat-cephs of 751 patients) and test set (184 Lat-cephs of 46 patients; subdivided into the genioplasty and non-genioplasty groups, n = 23 per group) for LI. Each C-III patient in the test set had four Lat-cephs: initial (T0), pre-surgery (T1, presence of orthodontic brackets [OBs]), post-surgery (T2, presence of OBs and surgical plates and screws [S-PS]), and debonding (T3, presence of S-PS and fixed retainers [FR]). After mean errors of 20 landmarks between the human gold standard and the CNN model were calculated, statistical analysis was performed. Results: The total mean error was 1.17 mm without significant difference among the four time-points (T0, 1.20 mm; T1, 1.14 mm; T2, 1.18 mm; T3, 1.15 mm). In comparison of two time-points ([T0, T1] vs. [T2, T3]), ANS, A point, and B point showed an increase in error (p < 0.01, 0.05, 0.01, respectively), while Mx6D and Md6D showed a decrease in error (all p < 0.01). No difference in errors existed at B point, Pogonion, Menton, Md1C, and Md1R between the genioplasty and non-genioplasty groups. Conclusions: The CNN model can be used for LI in serial Lat-cephs despite the presence of OB, S-PS, FR, genioplasty, and bone remodeling.
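
The mean landmark error in this abstract can be illustrated with a short Python sketch. It assumes the error for each landmark is the Euclidean distance between the human gold-standard position and the CNN prediction in millimetres, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def mean_landmark_error(reference, predicted):
    """Mean Euclidean distance between paired (x, y) landmark coordinates."""
    errors = [math.dist(r, p) for r, p in zip(reference, predicted)]
    return sum(errors) / len(errors)


# Consistency check on the reported figures: with the same number of
# cephalograms at each time-point, the overall mean is the average of the
# four time-point means given in the abstract.
timepoint_means = {"T0": 1.20, "T1": 1.14, "T2": 1.18, "T3": 1.15}
overall = sum(timepoint_means.values()) / len(timepoint_means)
print(f"overall mean error: {overall:.4f} mm")
```

The average of the four time-point means is 1.1675 mm, in line with the reported total mean error of 1.17 mm.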

Study of the Derive of Core Habitats for Kirengeshoma koreana Nakai Using HSI and MaxEnt (HSI와 MaxEnt를 통한 나도승마 핵심서식지 발굴 연구)

  • Sun-Ryoung Kim;Rae-Ha Jang;Jae-Hwa Tho;Min-Han Kim;Seung-Woon Choi;Young-Jun Yoon
    • Korean Journal of Environment and Ecology / v.37 no.6 / pp.450-463 / 2023
  • The objective of this study is to derive the core habitat of Kirengeshoma koreana Nakai using Habitat Suitability Index (HSI) and Maximum Entropy (MaxEnt) models. Expert-based models have been criticized for their subjective criteria, while statistical models face difficulties in on-site validation and integration of expert opinions. To address these limitations, both models were employed, and their outcomes were overlaid to derive the core habitat. Five variables were identified through a comprehensive literature review and spatial analysis based on occurrence coordinates: vegetation zone, forest type, crown density, annual precipitation, and effective soil depth. Through surveys involving six experts, importance rankings and SI (Suitability Index) scores were established for each variable and then used to create an HSI map. The MaxEnt model was run with the same variables, and the resulting map was merged with the HSI map to construct the definitive core habitat map. Of 16 observed locations of K. koreana, 15 were situated within the identified core habitat. Furthermore, Mt. Yeongchwi, an area historically known to host K. koreana but where its presence has not been verified recently, was found to lack a core habitat. These findings suggest that the developed models exhibit a high degree of accuracy and effectively reflect the current ecological landscape.

Reactive and Proactive Aggression, the Validation of the Reactive-Proactive Questionnaire (RPQ): Focusing on ESEM and Rasch (반응적 공격성과 주도적 공격성, Reactive-Proactive Questionnaire(RPQ) 타당화 연구: ESEM과 Rasch를 중심으로)

  • Seonyoung Park;Jonghan Sea
    • Korean Journal of Culture and Social Issue / v.30 no.2 / pp.159-192 / 2024
  • The purpose of this study is to validate the Reactive-Proactive Aggression Questionnaire (RPQ), a tool for measuring reactive-proactive aggression, in the context of South Korea. A thorough translation was conducted in collaboration with the original author. An exploratory factor analysis (EFA), exploratory structural equation modeling (ESEM), rating scale model (Rasch) analysis, differential item functioning (DIF) analysis, and convergent validity evaluation were performed on a sample of 510 South Korean individuals. The results revealed a two-factor structure of reactive and proactive aggression after removing one item showing dual loading. Rating scale analysis based on the Rasch model indicated the appropriateness of the 3-point Likert scale, with all items meeting fit criteria. Although the separation index and separation reliability of proactive aggression were marginally lower, the overall discrimination between participants and items was satisfactory. Examination of the participant-item distribution indicated a suitable alignment between reactive aggression and participant ability levels, whereas proactive aggression exhibited slightly elevated item difficulty. Furthermore, three items were found to function differently based on gender. The convergent validity evaluation found a moderate but statistically significant positive correlation between the Barratt Impulsiveness Scale-11-R (Korean version) and the RPQ. Overall, this study employed rigorous statistical methods to validate the suitability of the RPQ for use in Korea, taking cultural nuances into account, and introduced the concepts of reactive and proactive aggression to the Korean general population.

Comparison of Artificial Neural Network and Empirical Models to Determine Daily Reference Evapotranspiration (기준 일증발산량 산정을 위한 인공신경망 모델과 경험모델의 적용 및 비교)

  • Choi, Yonghun;Kim, Minyoung;O'Shaughnessy, Susan;Jeon, Jonggil;Kim, Youngjin;Song, Weon Jung
    • Journal of The Korean Society of Agricultural Engineers / v.60 no.6 / pp.43-54 / 2018
  • The accurate estimation of reference crop evapotranspiration ($ET_o$) is essential in irrigation water management to assess the time-dependent status of crop water use and irrigation scheduling. The importance of $ET_o$ has resulted in many direct and indirect methods to approximate its value, including pan evaporation, meteorological-based estimations, lysimetry, soil moisture depletion, and soil water balance equations. Artificial neural networks (ANNs) have been intensively implemented for process-based hydrologic modeling due to their superior performance in nonlinear modeling, pattern recognition, and classification. This study adapted two well-known ANN algorithms, the backpropagation neural network (BPNN) and the generalized regression neural network (GRNN), to evaluate their capability to accurately predict $ET_o$ using daily meteorological data. All data were obtained from two automated weather stations, Chupungryeong and Jangsu, located in Yeongdong-gun (2002-2017) and Jangsu-gun (1988-2017), respectively. Daily $ET_o$ was calculated using the Penman-Monteith equation as the benchmark method. These calculated values of $ET_o$ and the corresponding meteorological data were separated into training, validation, and test datasets. The performance of each ANN algorithm was evaluated against $ET_o$ calculated from the benchmark method and a multiple linear regression (MLR) model. The overall results showed that, statistically, the BPNN algorithm performed best, followed by the MLR and GRNN. These results could help provide valuable information to farmers, water managers, and policy makers for effective agricultural water governance.

Mechanism-based View of Innovative Capability Building in POSCO (메커니즘 관점에서 본 조직변신과 포스코의 혁신패턴 연구)

  • Kim, So-Hyung
    • Journal of Distribution Science / v.11 no.6 / pp.59-65 / 2013
  • Purpose - Studies of mechanism as a competitive strategy, a relatively new field in strategic management research, have recently drawn the attention of business management scholars. The literature has so far proposed the subject-based view, the environment-based view, and the resource-based view in its analyses of firm management. Hence, firm management can reasonably be thought of as the combination of and interaction among three key elements: subject, environment, and resources. This is the mechanism-based view (MBV). The overall dynamic process that integrates these three elements and creates functional harmony is identified as the mechanism, the principle of firm management. Much of the extant literature on MBV has focused on case studies, a qualitative approach prone to the subjectivity of the researcher, although the intuition from such studies may lead to meaningful insights into a firm-specific mechanism. This study also focuses on case analysis, but it attempts a quantitative approach in order to reach a scientific and systematic understanding of the MBV. Research design, data, and methodology - I applied both a qualitative and a quantitative approach to a single model, given the complexity of the innovation processes. I conducted in-depth interviews with POSCO employees: 20 from general management, two from human resources, eight from information technology, five from finance and accounting, and five from production and logistics management. Once the innovative events were selected, the interview results were double-checked by the interviewees themselves to ensure the accuracy of the answers recorded. Based on the interviews, I then conducted statistical validation using the survey results as well.
Results - This study analyzes the building process of innovation and the effect of the mechanism pattern on innovation by examining the case of POSCO, which has survived over the past 21 years. I apply a new analytical tool to study mechanism innovation types, perform a new classification, and describe the interrelationships among the mechanism factors. This process allows me to see how the "Subject" factor interacts with the other factors. I found that, in the adoption stage of the innovation process, Subject had a mediating effect, but that the mediating effect of resource and performance was smaller than the effect of Subject on performance alone. During the implementation stage, the mediating effect of Subject increased. Conclusion - I have confirmed that the subject utilizes resources reasonably and efficiently. I have also advanced mechanism studies: whereas the field's research methods have been largely confined to single case studies, I have used both qualitative and quantitative methods to examine the relationships among mechanisms.