• Title/Summary/Keyword: bootstrap-based selection

Nonparametric Kernel Regression Function Estimation with Bootstrap Method

  • Kim, Dae-Hak
    • Journal of the Korean Statistical Society / v.22 no.2 / pp.361-368 / 1993
  • In recent years, kernel-type estimators have become abundant. In this paper, we propose a bandwidth selection method for fixed-design kernel regression based on a bootstrap procedure. Mathematical properties of the proposed bootstrap-based bandwidth selection method are discussed. The performance of the proposed method in the small-sample case is compared with that of the cross-validation method via a simulation study.
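The abstract does not spell out the procedure, so the following is only a minimal sketch of one common residual-bootstrap bandwidth selector for fixed-design kernel regression, not the author's method. The Nadaraya-Watson smoother, the Gaussian kernel, the pilot bandwidth, the candidate grid, and all function names are assumptions made for illustration.

```python
import math
import random

def nw_smooth(x, y, h):
    """Nadaraya-Watson fit at every design point, Gaussian kernel, bandwidth h."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted

def bootstrap_bandwidth(x, y, candidates, pilot, n_boot=50, seed=0):
    """Pick the candidate bandwidth with the smallest bootstrap squared error.

    Assumed scheme: residuals from a pilot fit are centered and resampled,
    bootstrap responses are rebuilt around the pilot fit, and each candidate
    is scored by how well it recovers the pilot fit from those responses.
    """
    rng = random.Random(seed)
    pilot_fit = nw_smooth(x, y, pilot)
    resid = [yi - fi for yi, fi in zip(y, pilot_fit)]
    mean_r = sum(resid) / len(resid)
    resid = [r - mean_r for r in resid]
    risk = {h: 0.0 for h in candidates}
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in pilot_fit]
        for h in candidates:
            fit_star = nw_smooth(x, y_star, h)
            risk[h] += sum((a - b) ** 2 for a, b in zip(fit_star, pilot_fit)) / len(x)
    risk = {h: r / n_boot for h, r in risk.items()}
    return min(risk, key=risk.get), risk

# Hypothetical small-sample example on an equally spaced (fixed) design.
n = 40
x = [(i + 0.5) / n for i in range(n)]
noise = random.Random(1)
y = [math.sin(2 * math.pi * xi) + noise.gauss(0, 0.3) for xi in x]
candidates = [0.02, 0.05, 0.1, 0.2]
h_best, risk = bootstrap_bandwidth(x, y, candidates, pilot=0.1)
```

The result depends on the pilot bandwidth, which is why such selectors are usually compared against cross-validation by simulation, as the abstract describes.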

Bandwidth Selection for Local Smoothing Jump Detector

  • Park, Dong-Ryeon
    • Communications for Statistical Applications and Methods / v.16 no.6 / pp.1047-1054 / 2009
  • The local smoothing jump detection procedure is a popular method for detecting jump locations, and the performance of the jump detector depends heavily on the choice of bandwidth. However, little work has been done on this issue. In this paper, we propose a bootstrap bandwidth selection method that can be used with any kernel-based or local polynomial-based jump detector. The proposed bandwidth selection method is fully data-adaptive, and its performance is evaluated through a simulation study and a real data example.
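To make the bandwidth dependence concrete, here is a minimal sketch of a simple local smoothing jump detector. It assumes a uniform one-sided window of `k` design points on each side as the bandwidth, and it does not reproduce the paper's bootstrap selection step. The function names and the simulated data are hypothetical.

```python
import random

def jump_statistic(y, k):
    """Right-minus-left difference of k-point one-sided means at each interior index."""
    stats = {}
    for i in range(k, len(y) - k + 1):
        left = sum(y[i - k:i]) / k
        right = sum(y[i:i + k]) / k
        stats[i] = right - left
    return stats

def detect_jump(y, k):
    """Return the index with the largest absolute one-sided difference and its value."""
    stats = jump_statistic(y, k)
    i_hat = max(stats, key=lambda i: abs(stats[i]))
    return i_hat, stats[i_hat]

# Hypothetical data: a unit jump at index 50 plus Gaussian noise.
n = 100
noise = random.Random(0)
y = [(1.0 if i >= 50 else 0.0) + noise.gauss(0, 0.1) for i in range(n)]
i_hat, size = detect_jump(y, k=10)
```

A small `k` makes the statistic noisy, while a large `k` blurs nearby features and shrinks the range of indices that can be examined. That trade-off is what a data-adaptive bandwidth selector has to resolve.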

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.771-789 / 2019
  • Text detection has been a popular research topic in computer vision. Prevalent text detection algorithms find it difficult to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, a text candidate in a novel superpixel form is proposed to improve the text recall rate through image segmentation. Second, we propose a unique text sample selection model (TSSM) to extract text samples from the current image and eliminate database dependency. Specifically, to improve the precision of the samples, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double-threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs based on the samples selected by the TSSM. Experimental results on standard datasets demonstrate that our text detection method is robust to complex backgrounds and multilingual text and performs stably across different standard datasets.

Developing a Molecular Prognostic Predictor of a Cancer based on a Small Sample

  • Kim Inyoung;Lee Sunho;Rha Sun Young;Kim Byungsoo
    • Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.195-198 / 2004
  • One important problem in a cancer microarray study is to identify a set of genes from which a molecular prognostic indicator can be developed. A parallel problem is to validate the chosen set of genes. In this note, we develop a K-fold cross-validation procedure that combines a 'pre-validation' technique and a bootstrap resampling procedure in Cox regression. The pre-validation technique computes the microarray predictor of a case without having seen the true class label of that case. It was suggested by Tibshirani and Efron (2002) to avoid possible over-fitting in a regression that employs a microarray-based predictor. The bootstrap resampling procedure for Cox regression was proposed by Sauerbrei and Schumacher (1992) as a means of overcoming the instability of a stepwise selection procedure. We apply this K-fold cross-validation to microarray data from 92 gastric cancers; the experiment was conducted at the Cancer Metastasis Research Center, Yonsei University. We also share some of our experience with 'false positive' results caused by information leaks.
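The pre-validation idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' procedure. A binary label replaces the survival outcome, a one-gene mean-gap rule stands in for the Cox-regression predictor, and the bootstrap step is omitted. All names and the simulated data are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the case indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_score_rule(rows, labels):
    """Stand-in predictor: pick the gene with the largest class-mean gap."""
    best = None
    for j in range(len(rows[0])):
        ones = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        zeros = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
        if best is None or abs(m1 - m0) > abs(best[1]):
            best = (j, m1 - m0, (m1 + m0) / 2)
    j, gap, mid = best
    sign = 1.0 if gap >= 0 else -1.0
    return lambda row: sign * (row[j] - mid)

def prevalidate(rows, labels, k=5, seed=0):
    """Score each case with a rule fitted without that case's fold."""
    scores = [None] * len(rows)
    for fold in k_fold_indices(len(rows), k, seed):
        held = set(fold)
        train = [i for i in range(len(rows)) if i not in held]
        rule = fit_score_rule([rows[i] for i in train], [labels[i] for i in train])
        for i in fold:
            scores[i] = rule(rows[i])  # label of case i was never used
    return scores

# Hypothetical data: 20 cases, 30 pure-noise "genes", balanced labels.
gen = random.Random(3)
rows = [[gen.gauss(0, 1) for _ in range(30)] for _ in range(20)]
labels = [1] * 10 + [0] * 10
scores = prevalidate(rows, labels)
```

Because gene selection is repeated inside every fold, the pre-validated score of a case carries no information from its own label. Selecting genes once on all cases and then cross-validating would leak that information and can produce the kind of 'false positive' result the abstract mentions.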

Variable Bandwidth Selection for Kernel Regression

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.5 no.1 / pp.11-20 / 1994
  • In recent years, nonparametric kernel estimators of the regression function have become abundant and widely applicable to many areas of statistics. Most modern research is concerned with selecting a fixed global bandwidth, which uses the same value for all x when estimating the regression function. In this paper, we propose a method for selecting a locally varying bandwidth based on the bootstrap in kernel estimation of fixed-design regression. The performance of the proposed bandwidth selection method in the finite-sample case is evaluated via a Monte Carlo simulation study.

Gene Selection Based on Support Vector Machine using Bootstrap (붓스트랩 방법을 활용한 SVM 기반 유전자 선택 기법)

  • Song, Seuck-Heun;Kim, Kyoung-Hee;Park, Chang-Yi;Koo, Ja-Yong
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.531-540 / 2007
  • Recursive feature elimination for support vector machines is known to be useful in selecting relevant genes. Since the criterion for choosing relevant genes is the absolute value of a coefficient, recursive feature elimination may suffer from a scaling problem. We propose a modified version of the recursive feature elimination algorithm using the bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for the statistical variability of the coefficient. Through numerical examples, we illustrate that our method is effective in gene selection.
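The criterion described above, a coefficient divided by its bootstrap standard error, can be illustrated with a small sketch. To stay self-contained it uses a class-mean difference as a stand-in for the SVM coefficient, which is a swapped-in simplification and not the authors' algorithm. The function names and the simulated data are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def coef(pos, neg, j):
    """Stand-in coefficient for feature j: difference of class means."""
    return mean([r[j] for r in pos]) - mean([r[j] for r in neg])

def bootstrap_ratio(pos, neg, j, n_boot=200, seed=0):
    """|coefficient| divided by its bootstrap standard error."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos) for _ in pos]  # resample within each class
        boot_neg = [rng.choice(neg) for _ in neg]
        reps.append(coef(boot_pos, boot_neg, j))
    center = mean(reps)
    se = (sum((r - center) ** 2 for r in reps) / (n_boot - 1)) ** 0.5
    return abs(coef(pos, neg, j)) / se

# Hypothetical data: feature 0 is informative on a small scale,
# feature 1 is pure noise on a much larger scale.
gen = random.Random(42)
pos = [[gen.gauss(2, 0.5), gen.gauss(0, 100)] for _ in range(20)]
neg = [[gen.gauss(0, 0.5), gen.gauss(0, 100)] for _ in range(20)]
ratios = [bootstrap_ratio(pos, neg, j) for j in range(2)]
ranked = sorted(range(2), key=lambda j: -ratios[j])
```

Ranking by the raw absolute coefficient can favor the large-scale noise feature simply because of its units. Dividing by the standard error removes that dependence on scale.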

Optimal bandwidth in nonparametric classification between two univariate densities

  • Hall, Peter;Kang, Kee-Hoon
    • Proceedings of the Korean Statistical Society Conference / 2002.05a / pp.1-5 / 2002
  • We consider the problem of optimal bandwidth choice for nonparametric classification, based on kernel density estimators, where the problem of interest is distinguishing between two univariate distributions. When the densities intersect at a single point, the optimal bandwidth choice depends on the curvatures of the densities at that point. The problems of empirical bandwidth selection and of classifying data in the tails of a distribution are also addressed.

k-Nearest Neighbor-Based Approach for the Estimation of Mutual Information (상호정보 추정을 위한 k-최근접이웃 기반방법)

  • Cha, Woon-Ock;Huh, Moon-Yul
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.977-991 / 2008
  • This study examines the k-nearest neighbor-based approach for estimating mutual information when the type of target variable is categorical and continuous. The results of Monte Carlo simulation and experiments with real-world data show that k=1 is preferable. In practical applications with real-world data, our study shows that jittering and bootstrapping are needed.

A data-adaptive maximum penalized likelihood estimation for the generalized extreme value distribution

  • Lee, Youngsaeng;Shin, Yonggwan;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods / v.24 no.5 / pp.493-505 / 2017
  • Maximum likelihood estimation (MLE) of the generalized extreme value distribution (GEVD) is known to sometimes over-estimate a positive shape parameter when the sample size is small. Maximum penalized likelihood estimation (MPLE) with a Beta penalty function has been proposed by some researchers to overcome this problem, but determining the hyperparameters (HP) of the Beta penalty function remains an issue. This paper presents data-adaptive methods for selecting the HP of the Beta penalty function in the MPLE framework. The idea is to let the data tell us what HP to use. For given data, the optimal HP is obtained from the minimum distance between the MLE and the MPLE. A bootstrap-based method is also proposed. These methods are compared with existing approaches. Performance evaluation experiments for the GEVD by Monte Carlo simulation show that the proposed methods work well in terms of bias and mean squared error. The methods are applied to Blackstone River data and Korean heavy rainfall data, where they perform better than the MLE, the method of L-moments estimator, and existing MPLEs.

A statistical consideration on the number of occurrences of Langerhans cells (란게르한스 세포의 출현횟수에 대한 통계적 고찰)

  • 이기원
    • The Korean Journal of Applied Statistics / v.5 no.2 / pp.271-282 / 1992
  • A statistical method is presented to investigate the relationship between the occurrence of Langerhans cells and neoplastic transformation of the uterine cervix. The best-fitting submodel, chosen by a selection criterion similar in type to AIC, is selected from among the possible submodels based on Poisson probability models. A bootstrap method is used to approximate the sampling distribution of the selection criterion, and the usual normal approximation is used to find the asymptotic distribution of the estimated rates.
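As a rough illustration of bootstrapping a selection criterion for Poisson submodels, the sketch below compares a common-rate model with a separate-rates model for two groups of counts. It uses plain AIC as a stand-in for the paper's AIC-type criterion, resamples within groups, and runs on made-up counts. All of these choices, and the function names, are assumptions.

```python
import math
import random

def poisson_loglik(counts, rate):
    """Poisson log-likelihood; the c*log(rate) term is skipped when c == 0."""
    return sum((c * math.log(rate) if c > 0 else 0.0) - rate - math.lgamma(c + 1)
               for c in counts)

def aic_gap(g1, g2):
    """AIC(common rate) minus AIC(separate rates); positive favors separate rates."""
    pooled = g1 + g2
    ll_common = poisson_loglik(pooled, sum(pooled) / len(pooled))
    ll_sep = (poisson_loglik(g1, sum(g1) / len(g1))
              + poisson_loglik(g2, sum(g2) / len(g2)))
    aic_common = -2 * ll_common + 2 * 1  # one rate parameter
    aic_sep = -2 * ll_sep + 2 * 2        # two rate parameters
    return aic_common - aic_sep

def bootstrap_gaps(g1, g2, n_boot=500, seed=0):
    """Approximate the sampling distribution of the AIC gap by resampling."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        b1 = [rng.choice(g1) for _ in g1]
        b2 = [rng.choice(g2) for _ in g2]
        gaps.append(aic_gap(b1, b2))
    return gaps

# Hypothetical cell counts for two groups of specimens.
g1 = [1, 2, 0, 3, 2, 1, 2, 4, 1, 2]
g2 = [9, 11, 8, 12, 10, 7, 13, 10, 9, 11]
observed = aic_gap(g1, g2)
gaps = bootstrap_gaps(g1, g2)
share_separate = sum(g > 0 for g in gaps) / len(gaps)
```

The spread of `gaps` shows how stable the model choice is under resampling, which a single AIC comparison on the observed data cannot show.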
