• Title/Abstract/Keywords: bootstrap-based selection

Nonparametric Kernel Regression Function Estimation with Bootstrap Method

  • Kim, Dae-Hak
    • Journal of the Korean Statistical Society / Vol. 22, No. 2 / pp. 361-368 / 1993
  • Kernel-type estimators have become abundant in recent years. In this paper, we propose a bootstrap-based bandwidth selection method for fixed-design kernel regression. Mathematical properties of the proposed method are discussed, and its small-sample performance is compared with that of cross-validation via a simulation study.
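
The abstract above does not spell out the procedure, so the following is only a minimal sketch of one generic residual-bootstrap bandwidth selector for fixed-design kernel regression, not necessarily the author's method. The Nadaraya-Watson estimator, the Gaussian kernel, the pilot bandwidth, the candidate grid, the simulated data, and all function names are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)


def nw_fit(x, y, h):
    """Nadaraya-Watson estimate evaluated at the design points x."""
    fitted = []
    for x0 in x:
        weights = [gaussian_kernel((x0 - xi) / h) for xi in x]
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return fitted


def bootstrap_bandwidth(x, y, candidates, pilot, n_boot=50, seed=0):
    """Return the candidate bandwidth with the smallest bootstrap risk estimate."""
    rng = random.Random(seed)
    pilot_fit = nw_fit(x, y, pilot)  # plays the role of the "true" curve
    residuals = [yi - mi for yi, mi in zip(y, pilot_fit)]
    mean_res = sum(residuals) / len(residuals)
    centered = [r - mean_res for r in residuals]
    risk = {h: 0.0 for h in candidates}
    for _ in range(n_boot):
        # Fixed design: keep x, resample centered residuals around the pilot fit.
        y_star = [mi + rng.choice(centered) for mi in pilot_fit]
        for h in candidates:
            fit_star = nw_fit(x, y_star, h)
            risk[h] += sum((f - m) ** 2 for f, m in zip(fit_star, pilot_fit)) / len(x)
    for h in candidates:
        risk[h] /= n_boot
    best = min(candidates, key=lambda h: risk[h])
    return best, risk


# Simulated example (hypothetical data): noisy sine curve on an equispaced design.
data_rng = random.Random(42)
n = 50
x = [(i + 0.5) / n for i in range(n)]
y = [math.sin(2 * math.pi * xi) + data_rng.gauss(0, 0.3) for xi in x]
candidates = [0.02, 0.05, 0.1, 0.2, 0.4]
best, risk = bootstrap_bandwidth(x, y, candidates, pilot=0.1)
print(best, risk)
```

The selected bandwidth depends on the pilot bandwidth, so in practice the pilot choice deserves as much care as the candidate grid.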

Bandwidth Selection for Local Smoothing Jump Detector

  • Park, Dong-Ryeon
    • Communications for Statistical Applications and Methods / Vol. 16, No. 6 / pp. 1047-1054 / 2009
  • The local smoothing jump detection procedure is a popular method for detecting jump locations, and the performance of the jump detector depends heavily on the choice of bandwidth. However, little work has been done on this issue. In this paper, we propose a bootstrap bandwidth selection method that can be used with any kernel-based or local polynomial-based jump detector. The proposed method is fully data-adaptive, and its performance is evaluated through a simulation study and a real data example.

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 771-789 / 2019
  • Text detection has been a popular research topic in computer vision. Prevalent text detection algorithms find it difficult to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, text candidates in a novel superpixel form are proposed to improve the text recall rate through image segmentation. Second, we propose a text sample selection model (TSSM) that extracts text samples from the current image and eliminates database dependency. Specifically, to improve sample precision, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double-threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs trained on the samples selected by the TSSM. Experimental results on standard datasets demonstrate that our method is robust to complex backgrounds and multilingual text and performs stably across datasets.

Developing a Molecular Prognostic Predictor of a Cancer based on a Small Sample

  • Kim Inyoung;Lee Sunho;Rha Sun Young;Kim Byungsoo
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the 2004 Korean Statistical Society Conference / pp. 195-198 / 2004
  • One important problem in a cancer microarray study is to identify a set of genes from which a molecular prognostic indicator can be developed. A parallel problem is to validate the chosen set of genes. In this note we develop a K-fold cross-validation procedure that combines a 'pre-validation' technique with a bootstrap resampling procedure in Cox regression. The pre-validation technique computes the microarray predictor for a case without having seen the true class label of that case. It was suggested by Tibshirani and Efron (2002) to avoid possible over-fitting in a regression that employs a microarray-based predictor. The bootstrap resampling procedure for Cox regression was proposed by Sauerbrei and Schumacher (1992) as a means of overcoming the instability of stepwise selection. We apply this K-fold cross-validation to microarray data from 92 gastric cancers, for which the experiment was conducted at the Cancer Metastasis Research Center, Yonsei University. We also share some of our experience with 'false positive' results caused by information leaks.

Variable Bandwidth Selection for Kernel Regression

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / Vol. 5, No. 1 / pp. 11-20 / 1994
  • In recent years, nonparametric kernel estimators of the regression function have become abundant and are widely applicable in many areas of statistics. Most modern research concerns the selection of a fixed global bandwidth, which uses the same value for all x when estimating the regression function. In this paper, we propose a bootstrap-based method for selecting a locally varying bandwidth in kernel estimation for fixed-design regression. The finite-sample performance of the proposed bandwidth selection method is examined via a Monte Carlo simulation study.

Gene Selection Based on Support Vector Machine using Bootstrap

  • 송석헌;김경희;박창이;구자용
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 20, No. 3 / pp. 531-540 / 2007
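  • The SVM-RFE algorithm, recently used for gene selection, simply takes the absolute value of the weights as its selection criterion and therefore cannot account for the variability of gene values. In this study, we propose the B-RFE algorithm, whose criterion is a refined statistic obtained by dividing the absolute value of each weight by its standard error. A simulation comparison of the two methods shows that the proposed B-RFE algorithm yields a more meaningful ranking.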
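
The ranking statistic described above, the absolute weight divided by its standard error, can be illustrated with a small sketch. It assumes the standard error is estimated by the bootstrap, as the title suggests, and it shows a single ranking pass without the recursive elimination loop. The plain subgradient-descent linear SVM without an intercept, the simulated data, and all function names are hypothetical stand-ins rather than the authors' implementation.

```python
import math
import random


def fit_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # this point violates the margin
                for j in range(p):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def bootstrap_standardized_weights(X, y, n_boot=30, seed=0):
    """Return the fitted weights and |w_j| / (bootstrap standard error of w_j)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w_hat = fit_linear_svm(X, y)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        boot.append(fit_linear_svm([X[i] for i in idx], [y[i] for i in idx]))
    stats = []
    for j in range(p):
        col = [w[j] for w in boot]
        mean = sum(col) / n_boot
        se = math.sqrt(sum((c - mean) ** 2 for c in col) / (n_boot - 1))
        stats.append(abs(w_hat[j]) / se if se > 0 else float("inf"))
    return w_hat, stats


# Simulated example (hypothetical data): only feature 0 carries class signal.
data_rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(60)]
X = [[1.5 * yi + data_rng.gauss(0, 1)] + [data_rng.gauss(0, 1) for _ in range(3)]
     for yi in y]
w_hat, stats = bootstrap_standardized_weights(X, y)
ranking = sorted(range(len(stats)), key=lambda j: stats[j], reverse=True)
print(ranking, stats)
```

In a recursive scheme, the lowest-ranked feature would be dropped and the statistic recomputed on the remaining features.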

Optimal bandwidth in nonparametric classification between two univariate densities

  • ;강기훈
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the 2002 Korean Statistical Society Spring Conference / pp. 1-5 / 2002
  • We consider the problem of optimal bandwidth choice for nonparametric classification based on kernel density estimators, where the goal is to distinguish between two univariate distributions. When the densities intersect at a single point, the optimal bandwidth choice depends on the curvatures of the densities at that point. The problems of empirical bandwidth selection and of classifying data in the tails of a distribution are also addressed.

k-Nearest Neighbor-Based Approach for the Estimation of Mutual Information

  • 차운옥;허문열
    • Communications for Statistical Applications and Methods / Vol. 15, No. 6 / pp. 977-991 / 2008
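  • This paper studies a k-nearest-neighbor-based method that yields a mutual information (MI) estimator without estimating the joint probability distribution of continuous variables. To resolve the problem that arises in finding the k nearest neighbors when variables take tied values, we propose jittering and bootstrap methods. Monte Carlo simulations and experiments on real data show that the k-nearest-neighbor-based method with a small value such as k=1 yields an efficient MI estimator. The method can be applied to data with continuous explanatory variables and a categorical or continuous target variable. It can rank the explanatory variables by their influence on the target variable and extends to multiple dimensions, so it can also be applied to the feature subset selection problem of finding a set of important variables.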

A data-adaptive maximum penalized likelihood estimation for the generalized extreme value distribution

  • Lee, Youngsaeng;Shin, Yonggwan;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods / Vol. 24, No. 5 / pp. 493-505 / 2017
  • Maximum likelihood estimation (MLE) of the generalized extreme value distribution (GEVD) is known to sometimes overestimate a positive shape parameter when the sample size is small. Maximum penalized likelihood estimation (MPLE) with a Beta penalty function has been proposed by some researchers to overcome this problem, but determining the hyperparameters (HP) of the Beta penalty function remains an issue. This paper presents data-adaptive methods for selecting the HP of the Beta penalty function in the MPLE framework. The idea is to let the data tell us what HP to use. For given data, the optimal HP is obtained from the minimum distance between the MLE and the MPLE. A bootstrap-based method is also proposed. These methods are compared with existing approaches. Performance evaluation experiments for the GEVD by Monte Carlo simulation show that the proposed methods work well in terms of bias and mean squared error. The methods are applied to Blackstone River data and Korean heavy rainfall data and show better performance than the MLE, the L-moments estimator, and existing MPLEs.

A statistical consideration on the number of occurrences of Langerhans cells

  • 이기원
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 5, No. 2 / pp. 271-282 / 1992
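  • We present statistical methods that can be used to study the association between Langerhans cells and malignant change in cervical cancer. Among several candidate submodels based on the Poisson probability model, the one that best fits the observations is selected by an AIC-type model selection criterion. The sampling distribution of the model selection criterion is approximated using the bootstrap, and the sampling distribution of the estimators is obtained by normal approximation.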
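
As a rough illustration of approximating the sampling distribution of an AIC-type criterion by the bootstrap, the sketch below compares two Poisson submodels on made-up count data. The two submodels (one common rate versus one rate per group), the within-group nonparametric resampling scheme, the data, and the function names are all assumptions; the abstract does not specify the paper's submodels or bootstrap design.

```python
import math
import random


def poisson_loglik(counts, rate):
    """Poisson log-likelihood; a zero rate can only arise when every count is zero."""
    if rate == 0:
        return 0.0
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in counts)


def aic_common(groups):
    """AIC of the submodel with a single rate shared by all groups (1 parameter)."""
    pooled = [k for g in groups for k in g]
    return -2 * poisson_loglik(pooled, sum(pooled) / len(pooled)) + 2 * 1


def aic_separate(groups):
    """AIC of the submodel with one rate per group."""
    loglik = sum(poisson_loglik(g, sum(g) / len(g)) for g in groups)
    return -2 * loglik + 2 * len(groups)


def bootstrap_aic_difference(groups, n_boot=500, seed=0):
    """Sorted bootstrap replicates of AIC(common) - AIC(separate)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample counts with replacement within each group.
        resampled = [[rng.choice(g) for _ in g] for g in groups]
        diffs.append(aic_common(resampled) - aic_separate(resampled))
    return sorted(diffs)


# Hypothetical cell counts per field in two tissue groups (invented for illustration).
groups = [[0, 1, 2, 1, 0, 3, 1, 2, 0, 1], [4, 6, 3, 5, 7, 4, 5, 6, 3, 5]]
observed = aic_common(groups) - aic_separate(groups)
diffs = bootstrap_aic_difference(groups)
share_separate = sum(d > 0 for d in diffs) / len(diffs)
lower = diffs[int(0.025 * len(diffs))]
upper = diffs[int(0.975 * len(diffs)) - 1]
print(observed, share_separate, (lower, upper))
```

A positive difference favors the per-group submodel, and the share of positive replicates gives a rough sense of how stable that selection is under resampling.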