• Title/Abstract/Keyword: Out-of-Sample Prediction

Search Result 91, Processing Time 0.02 seconds

Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.454-462 / 2014
  • Retaining customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churning customers from their large-scale, high-dimensional data. This study focuses on handling such large data sets by reducing their dimensionality. We compare six dimension reduction methods: principal component analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder. Our experiments apply each method to the training data, build a classification model on the mapped data, and then measure performance by hit rate. PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of churn prediction data, which are highly correlated and overlap across classes. We also propose a simple out-of-sample extension method for the nonlinear dimension reduction methods LLE and LTSA that exploits these characteristics of the data.

Application of Near Infrared Spectroscopy for Nondestructive Evaluation of Nitrogen Content in Ginseng

  • Lin, Gou-lin;Sohn, Mi-Ryeong;Kim, Eun-Ok;Kwon, Young-Kil;Cho, Rae-Kwang
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1528-1528 / 2001
  • Ginseng cultivated in different countries or under different growing conditions generally differs in components such as saponin and protein, which relate to its efficacy and action. Protein content is estimated from the nitrogen content of ginseng radix. Nitrogen content can be determined by chemical analysis such as the Kjeldahl or extraction methods, but these methods require long analysis times, cause environmental pollution, and damage the sample. In this work we investigated the possibility of nondestructive determination of nitrogen content in ginseng radix using near-infrared (NIR) spectroscopy. Ginseng radix, the root of Panax ginseng C. A. Meyer, was studied. A total of 120 samples were used, consisting of six sample sets of 20 samples each: 4-, 5-, and 6-year-old Korean ginseng and 7-, 8-, and 9-year-old Chinese ginseng. Nitrogen content was measured by electronic analysis. NIR reflectance spectra were collected over the 1100 to 2500 nm spectral region at 2 nm intervals with an InfraAlyzer 500C (Bran+Luebbe, Germany) equipped with a halogen lamp and a PbS detector. Calibration models were developed by multiple linear regression (MLR) and partial least squares (PLS) analysis using IDAS and SESAME software. According to the electronic analysis, the mean nitrogen content of Korean ginseng differed from that of Chinese ginseng. Nitrogen content generally tended to decrease once the cultivation period exceeded 6 years. The MLR calibration model with 8 wavelengths built with IDAS software accurately predicted nitrogen content, with a correlation coefficient (R) of 0.985 and a standard error of prediction (SEP) of 0.855%. With SESAME software, the MLR calibration with 9 wavelengths was selected as the best calibration, with R and SEP of 0.972 and 0.596%, respectively. The PLSR calibration model gave an R of 0.969 and an RMSEP of 0.630. This study shows that NIR spectroscopy could be applied to determine the nitrogen content of ginseng radix with high accuracy.
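
The calibration statistics quoted in this abstract (R, SEP, RMSEP) can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on common chemometrics usage: RMSEP is the root mean squared prediction error, and SEP is the bias-corrected standard deviation of the prediction errors. The data values are made up for illustration only.

```python
from math import sqrt

def pearson_r(reference, predicted):
    """Pearson correlation between reference values and model predictions."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(reference, predicted))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / sqrt(var_ref * var_pred)

def rmsep(reference, predicted):
    """Root mean squared error of prediction (includes any systematic bias)."""
    n = len(reference)
    return sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)

def sep(reference, predicted):
    """Bias-corrected standard error of prediction (assumed definition)."""
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = sum(errors) / len(errors)
    return sqrt(sum((e - bias) ** 2 for e in errors) / (len(errors) - 1))

# Hypothetical nitrogen contents (%): every prediction is 0.1 too high.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 2.1, 3.1, 4.1]
r_value = pearson_r(reference, predicted)
rmsep_value = rmsep(reference, predicted)
sep_value = sep(reference, predicted)
```

In this toy case the constant offset shows up in RMSEP but not in SEP, which is one reason the two statistics can differ for the same model.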

Supply models for stability of supply-demand in the Korean pork market

  • Chunghyeon, Kim;Hyungwoo, Lee;Tongjoo, Suh
    • Korean Journal of Agricultural Science / v.49 no.3 / pp.679-690 / 2022
  • As the supply and demand of pork have become a significant concern in Korea, controlling them has become a critical challenge for the industry. Compared with demand for pork, which is relatively stable, supply is not easy to keep stable. As preparing measures for supply-demand crisis response and supply control in the pig industry has emerged as an important task, it has become necessary to establish a stable supply model and create an appropriate manual. In this study, a pork supply prediction model is constructed using reported data from the pig traceability system. Based on the derived results, a method for determining the supply-demand crisis stage using a statistical approach is proposed. The analysis shows that working days, African swine fever, heat waves, and Covid-19 affect the number of pigs graded in the market. A test of model performance showed that both the in-sample and out-of-sample error rates were between 0.3% and 7.6%, indicating a high level of predictive power. Applying the forecast, the distribution of the confidence interval of the predicted value was established and the supply crisis stage was identified, allowing supply-demand conditions to be evaluated.
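
The distinction between in-sample and out-of-sample error rates can be sketched as follows. The abstract does not define its error rate or describe its model, so this example assumes a mean absolute percentage error and substitutes a deliberately trivial constant-level forecaster (`fit_mean_model`, a hypothetical helper). The series values are invented.

```python
def error_rate(actual, predicted):
    """Mean absolute percentage error, in percent (assumed error definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def fit_mean_model(train):
    """Stand-in model: always forecasts the mean of the fitting window."""
    level = sum(train) / len(train)
    return lambda horizon: [level] * horizon

# Hypothetical weekly counts of graded pigs (arbitrary units).
series = [100.0, 110.0, 90.0, 100.0, 105.0, 95.0]
train, holdout = series[:4], series[4:]      # fit on the first four points only

model = fit_mean_model(train)
in_sample = error_rate(train, model(len(train)))         # error on data used for fitting
out_sample = error_rate(holdout, model(len(holdout)))    # error on data held back
```

The key point is that `holdout` never enters `fit_mean_model`, so `out_sample` measures how the model generalizes rather than how well it fits.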

Analysis on Food Waste Compost by Near Infrared Reflectance Spectroscopy (NIRS)

  • Lee Hyo-Won;Kil Dong-Yong
    • Korean Journal of Organic Agriculture / v.13 no.3 / pp.281-289 / 2005
  • To find an alternative way of analyzing food waste compost, near infrared reflectance spectroscopy (NIRS) was used for compost assessment because the technique is known to be a nondestructive, cost-effective, and rapid method. One hundred thirty-six compost samples were collected from the Incheon food waste compost factory at Namdong Industrial Complex. The samples were analyzed for nitrogen, organic matter (OM), ash, P, and K using the Kjeldahl method, the ignition method, and acid extraction with a spectrophotometer, respectively. The samples were scanned using a FOSS NIRSystem Model 6500 scanning monochromator over wavelengths of 400 to 2,400 nm at 2 nm intervals. Modified partial least squares (MPLS) was applied to develop the most reliable calibration model between NIR spectra and sample components such as nitrogen, ash, OM, P, and K. The regression was validated using a validation set (n=30). The multiple correlation coefficient ($R^2$) and standard error of prediction (SEP) were 0.87 and 0.06 for nitrogen, 0.72 and 1.07 for ash, 0.68 and 1.05 for organic matter, 0.89 and 0.31 for the OM/N ratio, 0.77 and 0.06 for P, and 0.64 and 0.07 for K. The results of this experiment indicate that NIRS is a reliable analytical method for assessing some components of food waste compost, and suggest that the feasibility of NIRS can be justified when varied samples are collected throughout the year.

Development of Prediction Model for Flexibly-reconfigurable Roll Forming based on Experimental Study

  • Park, J.W.;Kil, M.G.;Yoon, J.S.;Kang, B.S.;Lee, K.
    • Transactions of Materials Processing / v.26 no.6 / pp.341-347 / 2017
  • Flexibly-reconfigurable roll forming (FRRF) is a novel sheet metal forming technology for producing multi-curvature surfaces by controlling the strain distribution along the longitudinal direction. Reconfigurable rollers can be arranged to act as a kind of punch and die set, and the desired curved surface is formed by using these rollers. In the FRRF process, a three-dimensional surface is formed from a two-dimensional curve, so the forming result is difficult to predict. In this study, regression analysis is proposed to construct a predictive model for the longitudinal curvature in the FRRF process. The input parameters affecting the longitudinal curvature were taken to be the maximum compression value, the curvature radius in the transverse direction, and the initial blank width. A three-factor, three-level full factorial experimental design was used, and 27 experiments were performed on the FRRF apparatus to obtain sample data for the regression model. The model used for the regression analysis was a quadratic nonlinear regression model. The coefficient of determination and the root mean square error were calculated to confirm the conformity of the model, and the regression predictive model was verified through a goodness-of-fit test.
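
The experimental layout described here can be sketched in a few lines: a three-factor, three-level full factorial design gives 3^3 = 27 runs, and a quadratic model in three factors has at most 10 terms. The coded levels (-1, 0, 1) and the inclusion of interaction terms are assumptions, since the abstract does not state either; the fit statistics are the standard coefficient of determination and root mean square error.

```python
from itertools import product
from math import sqrt

CODED_LEVELS = (-1, 0, 1)   # assumed coding for low / middle / high

def full_factorial(levels=CODED_LEVELS, n_factors=3):
    """Every combination of levels across the factors (3**3 = 27 runs here)."""
    return list(product(levels, repeat=n_factors))

def quadratic_terms(x1, x2, x3):
    """Regressors of a full quadratic model: intercept, linear, squared, interaction."""
    return [1, x1, x2, x3, x1 * x1, x2 * x2, x3 * x3, x1 * x2, x1 * x3, x2 * x3]

def r_squared(observed, predicted):
    """Coefficient of determination."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

design = full_factorial()                                  # 27 runs
model_matrix = [quadratic_terms(*run) for run in design]   # 27 rows, 10 columns
```

With 27 runs and 10 coefficients, the design leaves 17 residual degrees of freedom for judging goodness of fit.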

Probability Sampling to Select Polling Places in Exit Poll

  • Kim, Young-Won;Uhm, Yoon-Hee
    • Survey Research / v.6 no.2 / pp.1-32 / 2005
  • The accuracy of an exit poll depends mainly on the method used to sample voting places. We propose a probability sampling method for selecting voting places as an alternative to bellwether polling place sampling. In an empirical study based on the 2004 general election data, the efficiency of the suggested systematic sampling from ordered voting places was evaluated in terms of mean prediction error, and the proposed sampling method outperformed bellwether polling place sampling. We also calculated the variance of the estimator under the proposed sampling and considered the sample size needed to guarantee the target precision using the design effect of the proposed sample design.
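
Systematic sampling from an ordered list, as mentioned in this abstract, can be sketched as below. The abstract does not say which variable the voting places are ordered by, so the prior vote share used here is an assumption, and the frame of 20 places is invented. `systematic_sample` is a hypothetical helper, not the authors' procedure.

```python
import random

def systematic_sample(ordered_units, n, rng):
    """Draw n units from an ordered list using a random start and a fixed interval."""
    total = len(ordered_units)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of units")
    interval = total / n                 # sampling interval (may be fractional)
    start = rng.random() * interval      # random start in [0, interval)
    # The clamp guards against a floating-point edge case at the end of the list.
    return [ordered_units[min(int(start + i * interval), total - 1)] for i in range(n)]

# Hypothetical frame: 20 voting places with made-up prior vote shares.
frame = [(f"place_{i:02d}", 0.30 + 0.02 * ((i * 7) % 20)) for i in range(20)]
ordered = sorted(frame, key=lambda unit: unit[1])    # order by the auxiliary variable
sample = systematic_sample(ordered, 5, random.Random(2004))
```

Because the list is ordered before sampling, each selected place comes from a different stretch of the ordering, which spreads the sample across the range of the auxiliary variable.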

Evaluation of Corporate Distress Prediction Power using the Discriminant Analysis: The Case of First-Class Hotels in Seoul

  • Kim, Si-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.10 / pp.520-526 / 2016
  • This study aims to develop a distress prediction model, in order to evaluate the distress prediction power for first-class hotels and to calculate the average financial ratio in the Seoul area by using the financial ratios of hotels in 2015. The sample data was collected from 19 first-class hotels in Seoul and the financial ratios extracted from 14 of these 19 hotels. The results show firstly that the seven financial ratios, viz. the current ratio, total borrowings and bonds payable to total assets, interest coverage ratio to operating income, operating income to sales, net income to stockholders' equity, ratio of cash flows from operating activities to sales and total assets turnover, enable the top-level corporations to be discriminated from the failed corporations and, secondly, by using these seven financial ratios, a discriminant function which classifies the corporations into top-level and failed ones is estimated by linear multiple discriminant analysis. The accuracy of prediction of this discriminant capability turned out to be 87.9%. The accuracy of the estimates obtained by discriminant analysis indicates that the distress prediction model's distress prediction power is 78.95%. According to the analysis results, hotel management groups which administrate low level corporations need to focus on the classification of these seven financial ratios. Furthermore, hotel corporations have very different financial structures and failure prediction indicators from other industries. In accordance with this finding, for the development of credit evaluation systems for such hotel corporations, there is a need for systems to be developed that reflect hotel corporations' financial features.

A Study on Choice Behavior of Theme Park Visitors - Application of Nested Logit Model -

  • 홍성권
    • Journal of the Korean Institute of Landscape Architecture / v.24 no.4 / pp.96-111 / 1997
  • This study was carried out to identify users' choice behavior of theme parks. Everland, Lotte World, Seoul Land, Dreamland and Children's Grand Park were selected as study areas. The multinomial logit model (MNL), the nested logit model (NMNL) and a joint logit model were tested using a choice-based sample collected in the study areas. The Hausman-McFadden test showed that the MNL is not appropriate because the IIA assumption is violated. To avoid the problematic IIA assumption, the NMNL was tested. It splits similar alternatives into groups and nests separate decisions in hierarchical order. Cluster analysis and discriminant analysis were conducted to find applicable nest structures. The inclusive value coefficient was 0.7788, which means that the sufficient condition of this model is met and that users' choice behavior can be better understood by the NMNL than by the MNL. The $\rho^2$ value and prediction accuracy of this model were 0.402 and 46.33%, respectively. Several suggestions were made to make the NMNL more reliable for future research on the choice behavior of theme park users.
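
How a nested logit model turns utilities into choice probabilities can be sketched with the textbook two-level form. Only the inclusive value coefficient 0.7788 comes from the abstract; the nest structure, the alternative names, the utilities, and the use of a single coefficient for all nests are hypothetical choices made for illustration.

```python
from math import exp, log

def nested_logit_probabilities(nests, utilities, lam):
    """Two-level nested logit with one inclusive value coefficient `lam` for all nests."""
    # Inclusive value of each nest: log-sum of the scaled utilities inside it.
    inclusive = {
        nest: log(sum(exp(utilities[alt] / lam) for alt in alts))
        for nest, alts in nests.items()
    }
    nest_denominator = sum(exp(lam * value) for value in inclusive.values())
    probabilities = {}
    for nest, alts in nests.items():
        p_nest = exp(lam * inclusive[nest]) / nest_denominator     # choose the nest
        within = sum(exp(utilities[alt] / lam) for alt in alts)
        for alt in alts:
            p_within = exp(utilities[alt] / lam) / within           # choose inside the nest
            probabilities[alt] = p_nest * p_within
    return probabilities

# Hypothetical nests and utilities for five alternatives.
nests = {"nest_1": ["park_a", "park_b", "park_c"], "nest_2": ["park_d", "park_e"]}
utilities = {"park_a": 1.2, "park_b": 0.8, "park_c": 0.5, "park_d": 0.3, "park_e": 0.1}

probs = nested_logit_probabilities(nests, utilities, 0.7788)
mnl_probs = nested_logit_probabilities(nests, utilities, 1.0)   # collapses to plain MNL
```

A coefficient between 0 and 1 is the usual consistency condition for this model, and setting it to 1 recovers the multinomial logit, which makes the IIA restriction easy to see as a special case.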

Prediction of Jominy Curve using Artificial Neural Network

  • Lee, Woonjae;Lee, Seok-Jae
    • Journal of the Korean Society for Heat Treatment / v.31 no.1 / pp.1-5 / 2018
  • This work demonstrates the application of an artificial neural network model for predicting the Jominy hardness curve by considering 13 alloying elements in low alloy steels. End-quench Jominy tests were carried out according to the ASTM A255 standard method for 1197 samples. The hardness of each Jominy sample was measured at different points from the quenched end. The developed artificial neural network model predicted the Jominy curve with high accuracy ($R^2=0.9969$ for training and $R^2=0.9956$ for verification). In addition, the model was used to investigate the average sensitivity of the hardness change to the input variables.

Neural network heterogeneous autoregressive models for realized volatility

  • Kim, Jaiyool;Baek, Changryong
    • Communications for Statistical Applications and Methods / v.25 no.6 / pp.659-671 / 2018
  • In this study, we consider an extension of the heterogeneous autoregressive (HAR) model for realized volatility that incorporates a neural network (NN) structure. Since HAR is a linear model, we expect that adding a neural network term will capture the subtle nonlinearity of realized volatility. Three neural network-based HAR models, namely HAR-NN, HAR($\infty$)-NN, and HAR-AR(22)-NN, are considered, with performance measured by out-of-sample forecasting errors. The results show that HAR-NN provides a slightly wider prediction interval than the traditional HAR model and shows more peaks and valleys at turning points. This implies that the HAR-NN model can capture sharper changes due to higher volatility than the traditional HAR model. The HAR-NN model is therefore recommended for prediction intervals that account for higher volatility in the stock market. An empirical analysis of the multinational realized volatility of stock indexes shows that the HAR-NN model, which adds daily, weekly, and monthly volatility averages to the neural network model, exhibits the best performance.
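
The daily, weekly, and monthly volatility averages mentioned in this abstract are the regressors of a HAR model, and building them can be sketched as below. The window lengths of 5 and 22 trading days are the conventional HAR choices and are an assumption here, as is the one-step-ahead target; `har_features` is a hypothetical helper and the input series is synthetic.

```python
def har_features(rv, weekly=5, monthly=22):
    """Build (daily, weekly average, monthly average, next-day target) rows from a series."""
    rows = []
    # Start once a full monthly window exists; stop one step early to keep a target.
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week_avg = sum(rv[t - weekly + 1 : t + 1]) / weekly
        month_avg = sum(rv[t - monthly + 1 : t + 1]) / monthly
        rows.append((daily, week_avg, month_avg, rv[t + 1]))
    return rows

# Synthetic realized-volatility series: 1.0, 2.0, ..., 24.0.
rv_series = [float(day) for day in range(1, 25)]
rows = har_features(rv_series)
first_row = rows[0]
```

A linear HAR model regresses the target on the first three columns; a neural network variant would feed the same three columns into a nonlinear function instead.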