• Title/Abstract/Keywords: resampling techniques

27 results (processing time: 0.019 s)

A comparative study of the Gini coefficient estimators based on the regression approach

  • Mirzaei, Shahryar;Borzadaran, Gholam Reza Mohtashami;Amini, Mohammad;Jabbari, Hadi
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 4
    • /
    • pp.339-351
    • /
    • 2017
  • Resampling approaches were the first techniques employed to compute a variance for the Gini coefficient; however, many authors have shown that the Gini coefficient and its variance can also be obtained from a regression model. Despite the simplicity of the regression approach for computing a standard error for the Gini coefficient, the use of the proposed regression model has been challenging in economics. Therefore, in this paper, we focus on a comparative study of the regression approach and resampling techniques. The regression method is shown to overestimate the standard error of the Gini index. The simulations show that the Gini estimator based on the modified regression model is also consistent and asymptotically normal, with less divergence from the normal distribution than other resampling techniques.
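
As a rough illustration of the resampling side of this comparison, the sketch below computes a sample Gini coefficient and a bootstrap standard error for it. It is a minimal example, not the authors' procedure: the function names `gini` and `bootstrap_se`, the sorted-rank formula, and the defaults (`n_boot`, `seed`) are all assumptions made here for illustration.

```python
import random
import statistics


def gini(values):
    """Sample Gini coefficient via the sorted-rank formula (assumed estimator)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        raise ValueError("need a non-empty sample with a positive sum")
    # G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def bootstrap_se(values, n_boot=1000, seed=0):
    """Bootstrap standard error: std. dev. of the Gini over i.i.d. resamples."""
    rng = random.Random(seed)
    n = len(values)
    # Each replicate draws n observations with replacement from the sample.
    reps = [gini(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps)


if __name__ == "__main__":
    incomes = [12.0, 15.5, 18.0, 22.0, 30.0, 41.0, 58.0, 120.0]  # made-up data
    print("Gini:", gini(incomes))
    print("bootstrap SE:", bootstrap_se(incomes))
```

The plain i.i.d. bootstrap is used because it is the simplest resampling scheme; a regression-based standard error would need the specific model from the paper, which the abstract does not state.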

REGENERATIVE BOOTSTRAP FOR SIMULATION OUTPUT ANALYSIS

  • Kim, Yun-Bae
    • Korea Society for Simulation: Conference Proceedings
    • /
    • Proceedings of the Korea Society for Simulation 2001 Spring Conference
    • /
    • pp.169-169
    • /
    • 2001
  • With the aid of fast computing power, resampling techniques are being introduced for simulation output analysis (SOA). Autocorrelation among the outputs of a discrete-event simulation prohibits the direct application of resampling; schemes such as the threshold bootstrap, binary bootstrap, and stationary bootstrap extend its use to time-series data such as simulation output. We present a new method for inference from a regenerative process, the regenerative bootstrap, that equals or exceeds the performance of the classical regenerative method and approximate regeneration techniques. The regenerative bootstrap saves computation time and overcomes the problem of scarce regeneration cycles. Computational results are provided using an M/M/1 model.


Improving the Performance of Threshold Bootstrap for Simulation Output Analysis

  • Kim, Yun-Bae
    • Journal of the Korean Institute of Industrial Engineers
    • /
    • Vol. 23, No. 4
    • /
    • pp.755-767
    • /
    • 1997
  • Analyzing autocorrelated data sets is still an open problem. Developing an easy and efficient method for severely positively correlated data sets, which are common in simulation output, is vital for the simulation community. The bootstrap is an easy and powerful tool for constructing non-parametric inferential procedures in modern statistical data analysis. The conventional bootstrap algorithm requires the original data set to be iid. The proper choice of resampling units for generating replicates depends heavily on the structure of the original data set, that is, whether the data are iid or autocorrelated. In this paper, a new bootstrap resampling scheme is proposed for analyzing autocorrelated data sets: the Threshold Bootstrap (TB). A thorough literature survey of bootstrap methods for autocorrelated data sets is also provided. The theoretical foundations of the Threshold Bootstrap are studied and compared with those of other leading bootstrap sampling techniques for autocorrelated data sets. The performance of TB is reported using an M/M/1 queueing model, and a comparison with other resampling techniques on ARMA data sets is also reported.
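
To make the point about resampling units concrete, here is a small sketch of the moving-block bootstrap, one common scheme for autocorrelated data. It is not the Threshold Bootstrap, whose construction the abstract does not spell out. All names (`moving_block_resample`, `block_bootstrap_se_of_mean`, `ar1`) and defaults (`block_len`, `n_boot`) are hypothetical choices for this illustration.

```python
import random
import statistics


def moving_block_resample(series, block_len, rng):
    """Build one replicate by concatenating randomly chosen overlapping blocks."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    starts = range(n - block_len + 1)  # every admissible block start index
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])  # keep within-block dependence
    return out[:n]  # trim the last block so the replicate has length n


def block_bootstrap_se_of_mean(series, block_len=10, n_boot=500, seed=0):
    """Standard error of the sample mean estimated from block replicates."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(moving_block_resample(series, block_len, rng))
        for _ in range(n_boot)
    ]
    return statistics.stdev(means)


def ar1(n, phi=0.9, seed=1):
    """Hypothetical test series: AR(1) with standard normal innovations."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    data = ar1(500)
    print("block bootstrap SE of mean:", block_bootstrap_se_of_mean(data, 25))
```

Resampling whole blocks rather than single observations preserves short-range dependence inside each block, which is what the i.i.d. bootstrap discards.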


Analysis of Recurrent Gap Time Data with a Binary Time-Varying Covariate

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 21, No. 5
    • /
    • pp.387-393
    • /
    • 2014
  • Recurrent gap times are analyzed with diverse methods under assumptions such as a marginal model or a frailty model. Several resampling techniques have recently been suggested for estimating the covariate effect; however, these approaches can be applied only with a time-fixed covariate. According to simulation results, these methods yield biased estimates for a time-varying covariate, which is often observed in longitudinal studies. In this paper, we extend a resampling method by incorporating new weights and a new sampling scheme. Simulation studies are performed to compare the suggested method with previous resampling methods. The proposed method is applied to estimate the effect of an educational program on traffic conviction data, where program participation occurs in the middle of the study.

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 27, No. 3
    • /
    • pp.357-371
    • /
    • 2014
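  • The class imbalance problem of the target variable has recently received much attention in data-mining classification. To address it, previous studies applied data preprocessing to the original data. Preprocessing methods include undersampling, which reduces the majority class to match the proportion of the minority class; oversampling, which resamples the minority class with replacement to match the proportion of the majority class; and hybrid techniques, which oversample the minority class using methods such as K-nearest neighbors and then undersample the majority class. Ensemble methods are also known to improve classification performance on such imbalanced data. In this paper, we use several models that combine data preprocessing with ensemble methods and compare their classification performance on imbalanced data.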
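
The two basic preprocessing steps described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names, the choice to equalize all classes, and the `seed` parameter are assumptions, and the K-nearest-neighbor-based hybrid step is not included.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Collect rows per class label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Shrink every class to the minority size, sampling without replacement."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(g) for g in groups.values())
    out = [(r, lab) for lab, g in groups.items() for r in rng.sample(g, target)]
    rng.shuffle(out)
    return [r for r, _ in out], [lab for _, lab in out]


def random_oversample(rows, labels, seed=0):
    """Grow every class to the majority size, sampling with replacement."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for lab, g in groups.items():
        extra = rng.choices(g, k=target - len(g))  # duplicated minority rows
        out.extend((r, lab) for r in g + extra)
    rng.shuffle(out)
    return [r for r, _ in out], [lab for _, lab in out]


if __name__ == "__main__":
    X = [[i] for i in range(10)]   # made-up feature rows
    y = [0] * 8 + [1] * 2          # 8 majority rows, 2 minority rows
    print(Counter(random_undersample(X, y)[1]))
    print(Counter(random_oversample(X, y)[1]))
```

Undersampling discards majority rows and oversampling duplicates minority rows, so either step should be applied to the training split only, never to the evaluation data.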

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 1
    • /
    • pp.89-95
    • /
    • 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially with the increasing number of patient files. In this paper, breast cancer data are collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data are imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques are applied to classify this imbalanced data (resampling, attribute selection, and handling of missing values), and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy of 95.2797% among all the algorithms, followed by Bagging with J48 (90.559%) and Random subspace with J48 (84.2657%). The imbalanced breast cancer dataset was classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.

Speed Enhancement Technique for Ray Casting using 2D Resampling

  • Lee, Rae-Kyoung;Ihm, In-Sung
    • Journal of KIISE: Computer Systems and Theory
    • /
    • Vol. 27, No. 8
    • /
    • pp.691-700
    • /
    • 2000
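  • Ray casting that uses a hierarchical data structure such as an octree over the volume data involves redundant computation, because every ray traverses the hierarchy, and it is computationally expensive because of the 3D interpolation needed to obtain high-quality images. This paper proposes a volume rendering algorithm that avoids redundant visits to the hierarchy of the volume data: it visits the hierarchy only once while efficiently determining the resampling points along the rays and computing color and opacity. The method performs ray casting in object order, incrementally locating the resampling points around each voxel and resampling by 2D interpolation on each slice. In addition, optimizations such as early ray termination are difficult to implement in object-order rendering; this is solved effectively with a dynamic data structure in image space. Because the proposed method is easy to implement and requires very little additional memory for the speedup, it can serve as an effective method that bridges the performance gap between ray casting and the shear-warp method.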


ALGORITHM OF REVISED-OTFTOOL

  • Chung Eun-Jung;Kim Hyor-Young;Rhee Myung-Hyun
    • Journal of Astronomy and Space Sciences
    • /
    • Vol. 23, No. 3
    • /
    • pp.269-288
    • /
    • 2006
  • We revised OTFTOOL, which was developed at the Five College Radio Astronomy Observatory (FCRAO) for On-The-Fly (OTF) observation. Besides improving the data resampling function of the conventional OTFTOOL, we added a new SELF referencing mode and a data pre-reduction function. Since OTF observation data have large redundancy, we can select and use only good-quality samples, excluding bad ones. Bad samples are sorted out based on the floating level, rms level, antenna trajectory, elevation, $T_{sys}$, and number of samples; spikes are also removed. The referencing method can be chosen between CLASSICAL mode, in which the references are taken from the OFF observations, and ELLIPSOIDAL mode, in which the references are taken from the inner source-free region (this is called the SELF reference). The baseline is subtracted using the source-free channel windows and the baseline order chosen by the user. After these procedures, the raw OTF data become a FITS datacube. The revised-OTFTOOL maximizes the advantages of OTF observation by sorting out the bad samples at the earliest stage, and the new self-referencing method, the ELLIPSOIDAL mode, is very powerful for reducing the data. Moreover, since the datacube can be viewed at once without moving it into other data reduction programs, it is convenient to check whether the data resampling works well. We expect that the revised-OTFTOOL can be applied to OTF observation facilities such as SRAO, NRAO, and FCRAO.

Comparison of Loss Function for Multi-Class Classification of Collision Events in Imbalanced Black-Box Video Data

  • Lee, Euisang;Han, Seokmin
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • Vol. 24, No. 1
    • /
    • pp.49-54
    • /
    • 2024
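  • Data imbalance is a common problem in classification and arises from a marked difference in the number of samples between the classes in a dataset. It typically causes problems in classification models such as overfitting, underfitting, and misleading performance metrics. Remedies include resampling, augmentation, regularization, and loss-function adjustment. This paper addresses loss-function adjustment; in particular, it compares the performance of several loss-function configurations (Cross Entropy, Balanced Cross Entropy, two Focal Loss settings: 𝛼 = 1 and 𝛼 = Balanced, and Asymmetric Loss) on imbalanced multi-class black-box video data using the I3D and R3D_18 models.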
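
For orientation, the sketch below shows how cross entropy, balanced cross entropy, and focal loss relate for a single multi-class sample. It is a hedged illustration, not the paper's code: the helper names, the inverse-frequency definition of the balanced 𝛼, and the default `gamma=2.0` are assumptions, and Asymmetric Loss is not covered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def balanced_alpha(class_counts):
    """Assumed weighting: inverse class frequency, so rarer classes weigh more."""
    n, k = sum(class_counts), len(class_counts)
    return [n / (k * c) for c in class_counts]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma=0 with alpha=None gives plain cross entropy;
    gamma=0 with per-class alpha gives balanced cross entropy.
    """
    p_t = softmax(logits)[target]
    a_t = 1.0 if alpha is None else alpha[target]
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    logits = [2.0, 0.0, 0.0]            # made-up model output for 3 classes
    alpha = balanced_alpha([90, 8, 2])  # made-up class counts
    print("cross entropy:", focal_loss(logits, 0, gamma=0.0))
    print("balanced CE  :", focal_loss(logits, 0, gamma=0.0, alpha=alpha))
    print("focal (a=1)  :", focal_loss(logits, 0, gamma=2.0))
    print("focal (a=bal):", focal_loss(logits, 0, gamma=2.0, alpha=alpha))
```

The factor (1 - p_t)^gamma shrinks the loss on samples the model already classifies confidently, so training focuses on the hard, often minority-class, examples.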