• Title/Summary/Keyword: data sampling

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
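The outlier-elimination stage mentioned in this abstract can be illustrated with a minimal sketch of k-reverse-nearest-neighbor filtering. It is not the authors' implementation: the function names, the `min_count` threshold, and the toy data are assumptions, and the OCSVM selection stage is not reproduced. A point is treated as an outlier when too few other points list it among their own k nearest neighbors.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def reverse_neighbor_counts(points, k):
    """For each point, count how many other points include it in their k-NN set."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def drop_krnn_outliers(points, k=3, min_count=1):
    """Keep points whose reverse-neighbour count reaches min_count.

    min_count is a hypothetical threshold chosen for illustration.
    """
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Toy majority-class sample: a tight cluster plus one isolated point.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept = drop_krnn_outliers(majority, k=2)
```

In this toy data no cluster point has the isolated point among its two nearest neighbors, so the isolated point is dropped and the cluster is kept.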

Gibbs Sampling for Double Seasonal Autoregressive Models

  • Amin, Ayman A.;Ismail, Mohamed A.
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.557-573 / 2015
  • In this paper we develop a Bayesian inference for a multiplicative double seasonal autoregressive (DSAR) model by implementing a fast, easy and accurate Gibbs sampling algorithm. We apply the Gibbs sampling to approximate empirically the marginal posterior distributions after showing that the conditional posterior distribution of the model parameters and the variance are multivariate normal and inverse gamma, respectively. The proposed Bayesian methodology is illustrated using simulated examples and real-world time series data.
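The alternation between a normal conditional for the coefficients and an inverse gamma conditional for the variance can be sketched on a much simpler model. The code below assumes a plain AR(1) process, a flat prior on the coefficient, and a prior proportional to 1/sigma^2 on the variance; these choices, and all names, are illustrative assumptions and not the DSAR model or priors of the paper.

```python
import random


def simulate_ar1(phi, sigma, n, rng):
    """Generate n observations of y_t = phi * y_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def gibbs_ar1(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for (phi, sigma^2) in an AR(1) model (conditional likelihood)."""
    rng = random.Random(seed)
    x, z = y[:-1], y[1:]                 # lagged values and responses
    sxx = sum(v * v for v in x)
    phi_hat = sum(a * b for a, b in zip(x, z)) / sxx
    n = len(z)
    sigma2 = 1.0                         # arbitrary starting value
    draws = []
    for it in range(n_iter):
        # phi | sigma^2, y  ~  Normal(phi_hat, sigma^2 / sxx)
        phi = rng.gauss(phi_hat, (sigma2 / sxx) ** 0.5)
        # sigma^2 | phi, y  ~  InverseGamma(n / 2, ssr / 2)
        ssr = sum((b - phi * a) ** 2 for a, b in zip(x, z))
        sigma2 = (ssr / 2.0) / rng.gammavariate(n / 2.0, 1.0)
        if it >= burn_in:
            draws.append((phi, sigma2))
    return draws


data = simulate_ar1(phi=0.6, sigma=1.0, n=500, rng=random.Random(1))
draws = gibbs_ar1(data)
phi_mean = sum(p for p, _ in draws) / len(draws)
```

The retained draws approximate the marginal posteriors empirically; `phi_mean` is the posterior mean estimate of the coefficient for the simulated series.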

Variable sampling interval control charts for variance-covariance matrix

  • Chang, Duk-Joon;Shin, Jae-Kyoung
    • Journal of the Korean Data and Information Science Society / v.20 no.4 / pp.741-747 / 2009
  • Properties of multivariate Shewhart and EWMA (exponentially weighted moving average) control charts for monitoring the variance-covariance matrix of quality variables are investigated. The performance of the proposed charts is evaluated for matched fixed sampling interval (FSI) and variable sampling interval (VSI) charts in terms of average time to signal (ATS) and average number of samples to signal (ANSS). The average number of switches (ANSW) of the proposed VSI charts is also investigated.

Multivariate EWMA Control Charts for the Variance-Covariance Matrix with Variable Sampling Intervals

  • Cho, Gyo-Young
    • Journal of the Korean Data and Information Science Society / v.4 / pp.31-44 / 1993
  • Multivariate exponentially weighted moving average (EWMA) control charts for monitoring the variance-covariance matrix are investigated. A variable sampling interval (VSI) feature is considered in these charts. Multivariate EWMA control charts for monitoring the variance-covariance matrix are compared on the basis of their average time to signal (ATS) performances. The numerical results show that multivariate VSI EWMA control charts are more efficient than the corresponding multivariate fixed sampling interval (FSI) EWMA control charts.
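The VSI idea can be sketched with a deliberately simplified chart: a univariate EWMA for the process mean rather than the multivariate variance-covariance charts studied here. The function name, the warning and control widths, and the two interval lengths are all assumptions for illustration. The next sample is taken after a short interval when the statistic is in the warning region and after a long interval otherwise.

```python
def vsi_ewma(observations, target=0.0, sigma=1.0, lam=0.2,
             limit_width=3.0, warning_width=1.0,
             short_interval=0.1, long_interval=1.9):
    """Run a two-interval VSI EWMA chart over a sequence of observations.

    Returns the number of samples, elapsed time, and number of interval
    switches up to the first signal (or the end of the data).
    """
    # Asymptotic standard deviation of the EWMA statistic.
    scale = sigma * (lam / (2.0 - lam)) ** 0.5
    control_limit = limit_width * scale
    warning_limit = warning_width * scale

    z = target
    elapsed = 0.0
    interval = short_interval   # assumption: the first sample uses the short interval
    switches = 0
    samples = 0
    for x in observations:
        samples += 1
        elapsed += interval
        z = lam * x + (1.0 - lam) * z
        deviation = abs(z - target)
        if deviation > control_limit:
            return {"signal": True, "samples": samples,
                    "time": elapsed, "switches": switches}
        # Warning region -> sample again soon; central region -> wait longer.
        next_interval = short_interval if deviation > warning_limit else long_interval
        if next_interval != interval:
            switches += 1
        interval = next_interval
    return {"signal": False, "samples": samples,
            "time": elapsed, "switches": switches}


# Five on-target observations followed by a sustained shift of 3 sigma.
result = vsi_ewma([0.0] * 5 + [3.0] * 20)
```

Averaging `time`, `samples`, and `switches` over many simulated runs would give estimates of ATS, ANSS, and ANSW respectively; in this single deterministic run the chart signals at the seventh sample.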

A New Estimator of Population Mean Based on Centered Balanced Systematic Sampling

  • Kim, Hyuk-Joo
    • Journal of the Korean Data and Information Science Society / v.11 no.1 / pp.91-101 / 2000
  • We propose a new method for estimating the mean of a population which has a linear trend. The suggested estimator is based on the centered balanced systematic sampling method and the concept of interpolation and extrapolation. The efficiency of the proposed method is compared with that of existing methods.

Properties of variable sampling interval control charts

  • Chang, Duk-Joon;Heo, Sun-Yeong
    • Journal of the Korean Data and Information Science Society / v.21 no.4 / pp.819-829 / 2010
  • Properties of multivariate variable sampling interval (VSI) Shewhart and CUSUM charts for monitoring the mean vector of related quality variables are investigated. Markov chain approaches and simulations are applied to evaluate the average time to signal (ATS) and average number of switches (ANSW) of the proposed charts. The performance of the proposed charts is investigated both when the process is in control and when it is out of control.

Conditional Sampling Measurement to Identify Flame Structures in Turbulent Combustion (난류 화염 구조 규명을 위한 조건 평균 측정법)

  • Huh, Kang Y.
    • Journal of the Korean Society of Visualization / v.2 no.1 / pp.8-11 / 2004
  • Conditional sampling measurement is required for conditional averages as well as unconditional Favre averages to resolve different flame structures of turbulent combustion. A Favre average can be obtained as an integral of the conditional average and the Favre PDF in terms of the mixture fraction, which is a preferred choice of sampling variable in diffusion-controlled turbulent combustion. MILD combustion data are presented as an example of a conditionally averaged data set and are compared with CMC calculation results.
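The relation between the Favre average and the conditional average described in this abstract is commonly written as below. The notation is an assumption rather than the paper's own: $\eta$ is the sample-space variable for the mixture fraction, $\langle \phi \mid \eta \rangle$ is the (density-weighted) conditional average of a scalar $\phi$, $P(\eta)$ is the mixture fraction PDF, and $\tilde{P}(\eta)$ is the Favre PDF.

```latex
\tilde{\phi} = \int_0^1 \langle \phi \mid \eta \rangle \, \tilde{P}(\eta) \, d\eta,
\qquad
\tilde{P}(\eta) = \frac{\langle \rho \mid \eta \rangle \, P(\eta)}{\bar{\rho}}
```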

Output regulation of linear sampled-data systems (선형 샘플치 시스템의 출력 조절)

  • 정선태
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.8 / pp.65-73 / 1997
  • The effects of time-sampling on the linear output regulation problem are investigated. It is found that the solvability of the linear output regulation problem is generally not robust with respect to time-sampling, although solvability for single-input single-output linear systems and the solvability of the linear robust output regulation problem are preserved under time-sampling. The results imply that one needs to seek a better approximate sampled-data output regulator.

A Study on Estimating Population Mean by Use of Interpolation and Extrapolation with Balanced Systematic Sampling

  • Kim, Hyuk-Joo
    • Journal of the Korean Data and Information Science Society / v.10 no.1 / pp.91-102 / 1999
  • A new method is developed for estimating the mean of a population which has a linear trend. The suggested estimator is based on the balanced systematic sampling method and the concept of interpolation and extrapolation. The efficiency of the proposed method is compared with that of conventional methods.
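As background for the sampling scheme named in this abstract, the sketch below selects a balanced systematic sample as the scheme is commonly described for an even sample size: units are taken in pairs placed symmetrically within consecutive blocks of 2k units. The function name and the toy population are assumptions, and the paper's interpolation and extrapolation estimator is not reproduced.

```python
def balanced_systematic_sample(N, n, r):
    """Return 1-based unit labels of a balanced systematic sample.

    Assumes N = n * k with n even, and a random start r with 1 <= r <= k.
    """
    if n % 2 != 0 or N % n != 0:
        raise ValueError("this sketch needs n even and N divisible by n")
    k = N // n
    if not 1 <= r <= k:
        raise ValueError("random start r must lie in 1..k")
    labels = []
    for j in range(n // 2):
        labels.append(r + 2 * j * k)             # r-th unit from the block start
        labels.append(2 * (j + 1) * k - r + 1)   # r-th unit from the block end
    return labels


# Toy population with an exact linear trend: y_i = 3 + 2 * i for i = 1..20.
N, n = 20, 4
population = {i: 3 + 2 * i for i in range(1, N + 1)}
population_mean = sum(population.values()) / N

sample_means = []
for r in range(1, N // n + 1):
    labels = balanced_systematic_sample(N, n, r)
    sample_means.append(sum(population[i] for i in labels) / n)
```

For this exactly linear toy population every possible start `r` yields a sample mean equal to the population mean, because the paired labels always average to (N + 1) / 2.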

The Effect of Sampling Rate on Statistical Properties of Extreme Wave (파랑자료의 sampling rate가 극한파의 통계에 미치는 영향)

  • Kim, Do Young
    • Journal of the Korean Society for Marine Environment & Energy / v.16 no.1 / pp.36-41 / 2013
  • In this paper, time series of wave data are simulated using a wave spectrum with random phases for the wave signal components. The simulated wave signals are used to study the effect of the sampling rate on ocean wave characteristics. The effect of the sampling rate on wave data that include extreme waves, such as freak waves, is examined through various wave characteristics, including the abnormality index (AI), the kurtosis of the wave profile, and the maximum wave height. The various wave height measures decrease as the sampling rate decreases. The zeroth moment of the wave spectrum is not much affected by the sampling rate, but the second moment is greatly affected by it. The error due to the sampling rate decreases as the wave period increases. The error in the significant wave height based on the wave spectrum, $H_s$, is smaller than that in the time-domain estimate, $H_{1/3}$. The AI and the kurtosis of the wave profile do not deviate much from the exact data as long as the sampling rate is greater than 1 Hz. Ocean wave measurement with a sampling frequency higher than 1 Hz results in an error of less than 5% in estimating the height of extreme waves.
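The random-phase simulation and the effect of a coarser sampling rate on the observed maximum can be sketched as follows. The spectral shape, frequency range, record length, and the two sampling rates are arbitrary assumptions for illustration and are not the paper's settings. Because the coarse sampling instants are a subset of the fine ones here, the coarse record can only miss crests, never exceed the fine maximum.

```python
import math
import random


def toy_spectrum(freq_hz):
    """Hypothetical bell-shaped spectral density peaked at 0.1 Hz (not a standard spectrum)."""
    return math.exp(-((freq_hz - 0.1) / 0.03) ** 2)


def make_components(n_components=50, f_min=0.05, f_max=0.3, seed=0):
    """Build (amplitude, frequency, phase) triples with uniformly random phases."""
    rng = random.Random(seed)
    df = (f_max - f_min) / (n_components - 1)
    components = []
    for i in range(n_components):
        freq = f_min + i * df
        amplitude = math.sqrt(2.0 * toy_spectrum(freq) * df)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        components.append((amplitude, freq, phase))
    return components


def elevation(components, t):
    """Surface elevation at time t as a sum of cosine components."""
    return sum(a * math.cos(2.0 * math.pi * f * t + p) for a, f, p in components)


def max_elevation(components, duration_s, rate_hz):
    """Largest sampled elevation when recording at rate_hz for duration_s seconds."""
    n_samples = int(duration_s * rate_hz)
    return max(elevation(components, i / rate_hz) for i in range(n_samples))


components = make_components()
fine_max = max_elevation(components, duration_s=600, rate_hz=4.0)
coarse_max = max_elevation(components, duration_s=600, rate_hz=0.5)
relative_loss = (fine_max - coarse_max) / fine_max
```

`relative_loss` gives the fraction of the maximum crest elevation missed by the coarser record in this one realization; repeating over many seeds would be needed before drawing any statistical conclusion.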