• Title/Summary/Keyword: resampling techniques

Search Result 27

A comparative study of the Gini coefficient estimators based on the regression approach

  • Mirzaei, Shahryar;Borzadaran, Gholam Reza Mohtashami;Amini, Mohammad;Jabbari, Hadi
    • Communications for Statistical Applications and Methods / v.24 no.4 / pp.339-351 / 2017
  • Resampling approaches were the first techniques employed to compute a variance for the Gini coefficient; however, many authors have shown that the Gini coefficient and its corresponding variance can be obtained from a regression model. Despite the simplicity of using the regression approach to compute a standard error for the Gini coefficient, the use of the proposed regression model has been challenging in economics. Therefore, in this paper, we present a comparative study of the regression approach and resampling techniques. The regression method is shown to overestimate the standard error of the Gini index. The simulations show that the Gini estimator based on the modified regression model is also consistent and asymptotically normal, with less divergence from the normal distribution than the resampling techniques.
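The resampling side of this comparison can be illustrated with a plain nonparametric bootstrap of the Gini coefficient. This is a minimal sketch, not the paper's regression-based or modified estimator: it uses the simple sample formula without the n/(n-1) small-sample correction, and the function names, income values, and replicate count are hypothetical.

```python
import random
import statistics

def gini(values):
    """Sample Gini coefficient from the sorted-rank formula
    G = sum_i (2i - n - 1) x_(i) / (n * sum_i x_(i)),  i = 1..n."""
    x = sorted(values)
    n, total = len(x), sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive total")
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * total)

def bootstrap_gini_se(values, n_boot=1000, seed=0):
    """Nonparametric bootstrap SE: resample with replacement, recompute G,
    and return the standard deviation of the replicates."""
    rng = random.Random(seed)
    n = len(values)
    replicates = [gini(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

incomes = [12, 15, 20, 22, 30, 45, 60, 90, 150, 300]   # hypothetical sample
print(f"G = {gini(incomes):.3f}, bootstrap SE = {bootstrap_gini_se(incomes):.3f}")
```

A regression-based standard error for the same sample could then be compared against this bootstrap value, which is the kind of comparison the abstract describes.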

REGENERATIVE BOOTSTRAP FOR SIMULATION OUTPUT ANALYSIS

  • Kim, Yun-Bae
    • Proceedings of the Korea Society for Simulation Conference / 2001.05a / pp.169-169 / 2001
  • With the aid of fast computing power, resampling techniques are being introduced for simulation output analysis (SOA). Autocorrelation in the output of discrete-event simulation prohibits the direct application of standard resampling schemes; variants such as the Threshold bootstrap, Binary bootstrap, and Stationary bootstrap extend the bootstrap to time-series data such as simulation output. We present a new method for inference from a regenerative process, the regenerative bootstrap, which equals or exceeds the performance of the classical regenerative method and approximate regeneration techniques. The regenerative bootstrap saves computation time and overcomes the problem of scarce regeneration cycles. Computational results are provided using an M/M/1 model.
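The core idea of resampling whole regeneration cycles can be sketched on the M/M/1 example. This is a minimal illustration under stated assumptions, not necessarily the paper's algorithm: the performance measure is taken to be the mean waiting time in queue, a regeneration point is a customer who arrives to an empty system, and the interval is a simple 95% percentile interval. All function names and parameter values are hypothetical.

```python
import random

def mm1_waiting_times(lam, mu, n, seed=0):
    """Waiting times in queue for n customers of an M/M/1 queue, via the
    Lindley recursion W[k+1] = max(0, W[k] + S[k] - A[k+1]), starting empty."""
    rng = random.Random(seed)
    waits = [0.0]
    for _ in range(n - 1):
        service = rng.expovariate(mu)        # S[k]: service time of customer k
        interarrival = rng.expovariate(lam)  # A[k+1]: gap until customer k+1
        waits.append(max(0.0, waits[-1] + service - interarrival))
    return waits

def regenerative_cycles(waits):
    """Start a new cycle at every customer who finds the system empty (W == 0);
    return one (sum of waits, number of customers) pair per cycle."""
    cycles = []
    for w in waits:
        if w == 0.0:
            cycles.append([0.0, 0])
        cycles[-1][0] += w
        cycles[-1][1] += 1
    return [tuple(c) for c in cycles]

def regenerative_bootstrap(cycles, n_boot=1000, seed=1):
    """Resample whole cycles with replacement and return a 95% percentile
    interval for the ratio estimator sum(Y) / sum(tau)."""
    rng = random.Random(seed)
    m = len(cycles)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(cycles, k=m)
        estimates.append(sum(y for y, _ in sample) / sum(t for _, t in sample))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

waits = mm1_waiting_times(lam=0.5, mu=1.0, n=10_000)
cycles = regenerative_cycles(waits)[:-1]    # drop the last, possibly incomplete cycle
point = sum(y for y, _ in cycles) / sum(t for _, t in cycles)
low, high = regenerative_bootstrap(cycles, n_boot=500)
print(f"mean wait {point:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

For λ = 0.5 and μ = 1, the steady-state mean wait in queue is λ/(μ(μ - λ)) = 1, which gives a sanity check for the estimate. Because cycles are independent and identically distributed, resampling them preserves the autocorrelation inside each cycle.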

Improving the Performance of Threshold Bootstrap for Simulation Output Analysis (시뮬레이션 출력분석을 위한 임계값 부트스트랩의 성능개선)

  • Kim, Yun-Bae
    • Journal of Korean Institute of Industrial Engineers / v.23 no.4 / pp.755-767 / 1997
  • Analyzing autocorrelated data sets is still an open problem. Developing an easy and efficient method for severely positively correlated data sets, which are common in simulation output, is vital for the simulation community. The bootstrap is an easy and powerful tool for constructing nonparametric inferential procedures in modern statistical data analysis. The conventional bootstrap algorithm requires the iid assumption for the original data set, and the proper choice of resampling units for generating replicates depends largely on the structure of the original data set, whether iid or autocorrelated. In this paper, a new bootstrap resampling scheme, the Threshold Bootstrap (TB), is proposed to analyze autocorrelated data sets. A thorough literature review of bootstrap methods for autocorrelated data sets is also provided. The theoretical foundations of the Threshold Bootstrap are studied and compared with other leading bootstrap sampling techniques for autocorrelated data sets. The performance of TB is reported using an M/M/1 queueing model, and a comparison with other resampling techniques on ARMA data sets is also reported.
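A minimal sketch of the threshold-bootstrap idea, assuming the common formulation: the threshold is the sample mean, a cycle runs from one upward crossing of the threshold to the next (a run above followed by a run below), and replicates are built by concatenating randomly chosen cycles until the original length is reached. The paper's theoretical refinements are not reproduced; the function names and the AR(1) series (standing in for the ARMA data) are hypothetical.

```python
import random
import statistics

def threshold_cycles(x, threshold=None):
    """Split a series into cycles that start at each upward crossing of the
    threshold (previous value <= threshold < current value)."""
    if threshold is None:
        threshold = statistics.fmean(x)   # assumption: threshold = sample mean
    cycles, current = [], [x[0]]
    for prev, cur in zip(x, x[1:]):
        if prev <= threshold < cur:        # upward crossing closes the cycle
            cycles.append(current)
            current = []
        current.append(cur)
    cycles.append(current)
    return cycles

def threshold_bootstrap_means(x, n_boot=1000, seed=0):
    """Replicate means from series rebuilt by concatenating random cycles."""
    rng = random.Random(seed)
    cycles = threshold_cycles(x)
    n = len(x)
    means = []
    for _ in range(n_boot):
        rebuilt = []
        while len(rebuilt) < n:
            rebuilt.extend(rng.choice(cycles))
        means.append(statistics.fmean(rebuilt[:n]))  # truncate to original length
    return means

# Usage: a positively autocorrelated AR(1) series, x_t = 0.9 x_{t-1} + e_t
rng = random.Random(42)
series = [0.0]
for _ in range(1999):
    series.append(0.9 * series[-1] + rng.gauss(0.0, 1.0))

naive_se = statistics.stdev(series) / len(series) ** 0.5
tb_se = statistics.stdev(threshold_bootstrap_means(series, n_boot=300))
print(f"iid SE {naive_se:.3f} vs threshold-bootstrap SE {tb_se:.3f}")
```

Because resampled cycles keep neighbouring observations together, the spread of the replicate means reflects the positive autocorrelation that an iid bootstrap would ignore.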

Analysis of Recurrent Gap Time Data with a Binary Time-Varying Covariate

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods / v.21 no.5 / pp.387-393 / 2014
  • Recurrent gap times are analyzed with diverse methods under several assumptions, such as a marginal model or a frailty model. Several resampling techniques have recently been suggested to estimate the covariate effect; however, these approaches can be applied only with a time-fixed covariate. According to simulation results, these methods produce biased estimates for a time-varying covariate, which is often observed in longitudinal studies. In this paper, we extend a resampling method by incorporating new weights and a new sampling scheme. Simulation studies are performed to compare the suggested method with previous resampling methods. The proposed method is applied to estimate the effect of an educational program on traffic conviction data, where program participation occurs in the middle of the study.

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • There are many studies related to imbalanced data, in which the class distribution is highly skewed. To address this problem, previous studies use resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling such as SMOTE, or hybrid sampling. Ensemble methods have also alleviated the problem of class-imbalanced data. In this paper, we compare around a dozen algorithms that combine ensemble methods and resampling techniques on simulated data sets generated by the Backbone model, in which the imbalance rate can be controlled. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. Based on the results, we strongly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique: algorithms that combine bagging or random forest ensembles with random under-sampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform strongly in most situations.
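SMOTE creates new minority-class samples by interpolation rather than duplication. A minimal NumPy sketch of that core step, assuming Euclidean distance and k = 5 neighbours; the abstract does not say which SMOTE implementation or settings were used, and the function name and data here are hypothetical.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)                                    # needs at least 2 samples
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)                     # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                      # a point is not its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]        # k nearest per sample
    base = rng.integers(0, n, size=n_synthetic)          # random minority samples
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))                   # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

rng = np.random.default_rng(42)
minority = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))   # hypothetical minority class
synthetic = smote(minority, n_synthetic=80, k=5)
print(synthetic.shape)   # (80, 2)
```

In a bagging or boosting pipeline, the synthetic rows would be added to the minority class of the training data only, before or inside each ensemble round.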

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.89-95 / 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially as the number of patient files grows. In this paper, breast cancer data are collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data are imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques are applied to this imbalanced data, namely resampling, attribute selection, and handling of missing values, and then different classifier models are built. In the first experiment, five classifiers (including ANN, REP TREE, SVM, and J48) are used; in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random Subspace) are used. Finally, an ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy of 95.2797% among all the algorithms, followed by Bagging with J48 (90.559%) and Random Subspace with J48 (84.2657%). In summary, the imbalanced breast cancer dataset was classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.
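The abstract does not say which resampling scheme was applied. As one simple option for a two-class problem like recurrence vs. no recurrence, random oversampling duplicates randomly chosen records of the smaller class until both classes are the same size. The sketch below is a hypothetical illustration, not the paper's preprocessing pipeline; the function name and toy labels are assumptions.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.choices(idx, k=target - count):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

X = [[i] for i in range(10)]                       # hypothetical feature rows
y = ["no-recurrence"] * 8 + ["recurrence"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Unlike SMOTE, this only repeats existing records, so it should be applied to the training portion alone; otherwise duplicates can land in both training and test sets and inflate accuracy.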

Speed Enhancement Technique for Ray Casting using 2D Resampling (2차원 리샘플링에 기반한 광선추적법의 속도 향상 기법)

  • Lee, Rae-Kyoung;Ihm, In-Sung
    • Journal of KIISE:Computer Systems and Theory / v.27 no.8 / pp.691-700 / 2000
  • Standard volume ray tracing, optimized with an octree, must repeatedly traverse the hierarchical structure for each ray, which often leads to redundant computation. It also employs expensive 3D interpolation to produce high-quality images. In this paper, we present a new ray-casting method that efficiently computes shaded colors and opacities at resampling points by traversing the octree only once. This method traverses the volume data in object order, finds resampling points on slices incrementally, and performs resampling based on 2D interpolation. Although early ray termination, one of the most effective optimization techniques, is not easily combined with object-order methods, we solve this problem using a dynamic data structure in image space. Because the new method is easy to implement and needs little additional memory, we expect it to be a very effective volume rendering method that fills the performance gap between ray casting and shear-warping.
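When a resampling point lies on a volume slice, its value can be interpolated from the four surrounding voxels of that slice instead of the eight that trilinear interpolation needs. A minimal sketch of that 2D step, assuming a slice stored as a list of rows; it does not cover the paper's octree traversal, shading, or early ray termination, and the function name is hypothetical.

```python
import math

def bilinear(slice2d, x, y):
    """Bilinearly interpolate a 2D slice (list of rows, slice2d[row][col])
    at the fractional position (x = column, y = row)."""
    i0, j0 = math.floor(x), math.floor(y)
    i1 = min(i0 + 1, len(slice2d[0]) - 1)   # clamp at the slice border
    j1 = min(j0 + 1, len(slice2d) - 1)
    fx, fy = x - i0, y - j0
    upper = (1 - fx) * slice2d[j0][i0] + fx * slice2d[j0][i1]
    lower = (1 - fx) * slice2d[j1][i0] + fx * slice2d[j1][i1]
    return (1 - fy) * upper + fy * lower

# A 4x4 slice holding f(col, row) = col + 2*row, which bilinear reproduces exactly
slice2d = [[col + 2 * row for col in range(4)] for row in range(4)]
print(bilinear(slice2d, 1.5, 2.25))   # 6.0
```

Each sample costs three linear interpolations here versus seven for trilinear interpolation, which is the kind of saving that motivates resampling on slices.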

ALGORITHM OF REVISED-OTFTOOL

  • Chung Eun-Jung;Kim Hyor-Young;Rhee Myung-Hyun
    • Journal of Astronomy and Space Sciences / v.23 no.3 / pp.269-288 / 2006
  • We revised OTFTOOL, which was developed at the Five College Radio Astronomy Observatory (FCRAO) for On-The-Fly (OTF) observations. Besides improving the data resampling function of the conventional OTFTOOL, we added a new SELF referencing mode and a data pre-reduction function. Since OTF observation data have a large redundancy, we can choose and use only good-quality samples, excluding bad ones. Bad samples are identified based on the floating level, rms level, antenna trajectory, elevation, $T_{sys}$, and number of samples, and spikes are also removed. The referencing method can be chosen between the CLASSICAL mode, in which the references are taken from the OFF observations, and the ELLIPSOIDAL mode, in which the references are taken from the inner source-free region (named SELF referencing). The baseline is subtracted using source-free channel windows and a baseline order chosen by the user. After these procedures, the raw OTF data become a FITS datacube. The revised-OTFTOOL maximizes the advantages of OTF observation by sorting out the bad samples at the earliest stage, and the new self-referencing method, the ELLIPSOIDAL mode, is very powerful for data reduction. Moreover, since the datacube can be inspected at once without moving it into other data reduction programs, it is very useful and convenient for checking whether the data resampling works well. We expect that the revised-OTFTOOL can be applied at OTF observation facilities such as SRAO, NRAO, and FCRAO.
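The baseline step described above, a polynomial of user-chosen order fitted over source-free channel windows and subtracted from the spectrum, can be sketched as follows. This is an illustration in Python with NumPy under stated assumptions, not OTFTOOL's code: `subtract_baseline`, the half-open window format, and the test spectrum are hypothetical.

```python
import numpy as np

def subtract_baseline(spectrum, windows, order=1):
    """Fit a polynomial of the given order to the channels inside the
    source-free windows and subtract it from every channel."""
    spectrum = np.asarray(spectrum, dtype=float)
    channels = np.arange(spectrum.size)
    mask = np.zeros(spectrum.size, dtype=bool)
    for start, stop in windows:            # half-open channel ranges [start, stop)
        mask[start:stop] = True
    coeffs = np.polyfit(channels[mask], spectrum[mask], order)
    return spectrum - np.polyval(coeffs, channels)

# Hypothetical spectrum: linear baseline plus a Gaussian line at channel 100
channels = np.arange(200)
spectrum = 2.0 + 0.01 * channels + 3.0 * np.exp(-0.5 * ((channels - 100) / 5.0) ** 2)
reduced = subtract_baseline(spectrum, windows=[(0, 60), (140, 200)], order=1)
print(round(float(reduced[100]), 3))   # 3.0
```

Restricting the fit to the windows keeps the emission line from pulling the baseline upward; a higher `order` would follow curved baselines at the cost of possibly absorbing broad line wings.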

Comparison of Loss Function for Multi-Class Classification of Collision Events in Imbalanced Black-Box Video Data (불균형 블랙박스 동영상 데이터에서 충돌 상황의 다중 분류를 위한 손실 함수 비교)

  • Euisang Lee;Seokmin Han
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.49-54 / 2024
  • Data imbalance is a common issue in classification problems, stemming from a significant disparity in the number of samples between classes within the dataset. Such imbalance typically leads to problems in classification models, including overfitting, underfitting, and misinterpretation of performance metrics. Methods to address this issue include resampling, augmentation, regularization techniques, and adjustment of loss functions. In this paper, we focus on loss function adjustment, comparing the performance of various loss function configurations (Cross Entropy, Balanced Cross Entropy, Focal Loss with two settings, $\alpha = 1$ and $\alpha$ = balanced, and Asymmetric Loss) on imbalanced multi-class black-box video data. The comparison is conducted using the I3D and R3D_18 models.
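Several of the compared losses can be written as special cases of one per-sample formula, FL = -α_t (1 - p_t)^γ log(p_t). A minimal sketch, assuming γ = 2 for the focal settings (a commonly used value; the abstract does not state γ) and "balanced" α defined as inverse class frequency (also an assumption). Asymmetric Loss is not sketched, and all names and numbers are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)                           # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(logits, target, alpha=None, gamma=2.0):
    """Multi-class focal loss for one sample:
    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t).
    alpha=None means alpha_t = 1; gamma=0 reduces it to (weighted) cross entropy."""
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def balanced_alpha(class_counts):
    """Inverse-frequency class weights, scaled so they sum to the number of classes."""
    inverse = [1.0 / c for c in class_counts]
    scale = len(class_counts) / sum(inverse)
    return [w * scale for w in inverse]

logits, target = [2.0, 0.5, -1.0], 0
weights = balanced_alpha([900, 90, 10])                       # hypothetical class counts
ce = focal_loss(logits, target, gamma=0.0)                    # Cross Entropy
bce = focal_loss(logits, target, alpha=weights, gamma=0.0)    # Balanced Cross Entropy
fl_1 = focal_loss(logits, target, gamma=2.0)                  # Focal Loss, alpha = 1
fl_bal = focal_loss(logits, target, alpha=weights, gamma=2.0) # Focal Loss, balanced alpha
print(ce, bce, fl_1, fl_bal)
```

The (1 - p_t)^γ factor shrinks the loss on samples the model already classifies confidently, while α_t up-weights rare classes; the two mechanisms can be used separately or together, as in the four configurations above.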