• Title/Summary/Keyword: small sample size problem


A Resampling Method for Small Sample Size Problems in Face Recognition using LDA (LDA를 이용한 얼굴인식에서의 Small Sample Size문제 해결을 위한 Resampling 방법)

  • Oh, Jae-Hyun;Kwak, Jo-Jun
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.78-88 / 2009
  • In many face recognition problems, the number of available images is small compared to the dimension of the input space, which is usually equal to the number of pixels. This is known as the 'small sample size' problem, and regularization methods are typically used to address it in feature extraction methods such as LDA. With regularization, the modified within-class scatter matrix becomes nonsingular and LDA can be performed in its original form. However, when a scaled identity matrix is added to the original within-class scatter matrix, the scale factor must be set heuristically, and the performance of the recognition system depends strongly on its value. The proposed resampling method instead generates a set of images similar to, but slightly different from, the original images. With the increased number of images, the small sample size problem is alleviated and the classification performance improves. Unlike regularization, the resampling method does not require a heuristically tuned parameter and yields better performance.
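
The sketch below illustrates the two ideas summarized in this abstract: conventional LDA with the within-class scatter regularized as S_w + alpha*I, and a simple perturbation-based stand-in for the resampling step (the paper's exact resampling scheme is not given in the abstract). All names and parameters (regularized_lda, resample_images, alpha, noise_std) are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(X, y, n_components, alpha=1e-3):
    # Regularized LDA: add alpha * I so the within-class scatter is nonsingular
    # even when n_samples < n_features (the small-sample-size regime).
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw_reg = Sw + alpha * np.eye(d)          # the heuristic scale factor
    # Generalized symmetric eigenproblem  Sb v = lambda Sw_reg v
    eigvals, eigvecs = eigh(Sb, Sw_reg)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]  # projection matrix (d, n_components)

def resample_images(X, n_copies=3, noise_std=0.01, seed=0):
    # One simple way to generate images "similar to but slightly different from"
    # the originals: small Gaussian perturbations (an assumption; the paper's
    # actual resampling scheme may differ).
    rng = np.random.default_rng(seed)
    return np.vstack([X] + [X + rng.normal(0, noise_std, X.shape)
                            for _ in range(n_copies)])
```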

On Optimizing Dissimilarity-Based Classifier Using Multi-level Fusion Strategies (다단계 퓨전기법을 이용한 비유사도 기반 식별기의 최적화)

  • Kim, Sang-Woon;Duin, Robert P. W.
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.15-24 / 2008
  • In high-dimensional classification tasks such as face recognition, the number of samples is often smaller than the dimensionality of the samples. In such cases, linear discriminant analysis-based methods for dimension reduction encounter what is known as the small sample size (SSS) problem. Recently, dissimilarity-based classification (DBC) has been investigated as a way to address the SSS problem. In DBC, an object is represented by its dissimilarities to representatives extracted from the training samples rather than by its feature vector. In this paper, we propose a new method of optimizing DBCs using multi-level fusion strategies (MFS), in which fusion strategies are employed both to represent features and to design classifiers. Our experimental results on benchmark face databases demonstrate that the proposed scheme achieves further improved classification accuracies.
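
A minimal sketch of the basic DBC representation described above, assuming a Euclidean dissimilarity and a 1-NN classifier in dissimilarity space; the representative-selection and multi-level fusion strategies of the paper are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def dissimilarity_representation(X, representatives):
    # Each object is described by its distances to a fixed set of representatives
    # instead of by its raw feature vector.
    diffs = X[:, None, :] - representatives[None, :, :]
    return np.linalg.norm(diffs, axis=2)      # shape (n_samples, n_representatives)

def fit_predict_1nn(D_train, y_train, D_test):
    # Nearest neighbour in the dissimilarity space (one simple classifier choice).
    dists = np.linalg.norm(D_test[:, None, :] - D_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]
```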

Elongated Radial Basis Function for Nonlinear Representation of Face Data

  • Kim, Sang-Ki;Yu, Sun-Jin;Lee, Sang-Youn
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.7C / pp.428-434 / 2011
  • Recently, subspace analysis has raised its performance to a higher level through the adoption of kernel-based nonlinearity. In particular, the radial basis function (RBF) kernel, owing to its nonparametric nature, has shown promising results in face recognition. However, due to the endemic small sample size problem of face data, conventional kernel-based feature extraction methods have difficulty representing the data. In this paper, we introduce a novel variant of the RBF kernel to alleviate this problem. By adopting the concept of the nearest feature line classifier, we show both the effectiveness and the generalizability of the proposed method, particularly with respect to the small sample size issue.
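
The nearest feature line (NFL) concept mentioned above scores a query by its distance to the line through each pair of same-class prototypes, which effectively enlarges a small training set. A minimal sketch of that distance in plain Euclidean space follows; the paper's elongated RBF kernel itself is not specified in the abstract and is not reproduced.

```python
import numpy as np

def feature_line_distance(x, x1, x2, eps=1e-12):
    # Distance from query x to the (infinite) line through prototypes x1 and x2.
    direction = x2 - x1
    t = np.dot(x - x1, direction) / (np.dot(direction, direction) + eps)
    projection = x1 + t * direction
    return np.linalg.norm(x - projection)
```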

Sample Size Determination for Comparing Tail Probabilities (극소 비율의 비교에 대한 표본수 결정)

  • Lee, Ji-An;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.183-194 / 2007
  • The problem of calculating sample sizes for comparing two independent binomial proportions is studied when one or both of the probabilities are smaller than 0.05. Whittemore (1981)'s corrected sample size formula for small response probabilities, derived from multiple logistic regression, yields much larger sample sizes than the asymptotic normal method, which is derived for comparing response probabilities in the normal range. Therefore, applied statisticians need to be careful when determining sample sizes with small response probabilities, to ensure the intended power at the planning stage of clinical trials. The results of this study show that using the textbook sample size formula can sometimes be risky.
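
For reference, the standard asymptotic normal ("textbook") sample size per group for detecting p1 versus p2 at two-sided level alpha and power 1-beta is sketched below; this is the formula the paper warns may be too small for very small probabilities. Whittemore's correction is not reproduced here, and the function name is an assumption.

```python
from scipy.stats import norm

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.8):
    # Asymptotic normal (unpooled-variance) sample size per group; may understate
    # the required size when p1 or p2 is below about 0.05, as the paper points out.
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p1 - p2) ** 2

# Example: p1 = 0.02 vs p2 = 0.01 gives roughly 2300 subjects per group.
```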

Smoothed Local PCA by BYY data smoothing learning

  • Liu, Zhiyong;Xu, Lei
    • Institute of Control, Robotics and Systems (ICROS) Conference Proceedings / 2001.10a / pp.109.3-109 / 2001
  • The so-called curse of dimensionality arises when a Gaussian mixture is used on high-dimensional, small-sample-size data, since the number of free elements that needs to be specified in each covariance matrix of the mixture grows quadratically with the dimension d. In this paper, by constraining the covariance matrix to its decomposed orthonormal form, we obtain a local PCA model that reduces the number of free elements to be specified. Moreover, to cope with the small sample size problem, we adopt BYY data smoothing learning, a regularization of maximum likelihood learning obtained from BYY harmony learning, to implement this local PCA model.
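
A rough sketch of the parameter-count argument: a full d x d covariance has d(d+1)/2 free elements, while a covariance constrained to k orthonormal principal directions plus isotropic noise (a PPCA-style form, used here only as a stand-in for the paper's orthonormal decomposition) needs far fewer. The counts and names below are illustrative assumptions.

```python
import numpy as np

def full_cov_params(d):
    # Free elements of an unconstrained symmetric d x d covariance.
    return d * (d + 1) // 2

def local_pca_cov_params(d, k):
    # Roughly: k direction vectors (~d*k numbers), k variances, one noise term.
    return d * k + k + 1

def local_pca_covariance(V, variances, noise_var):
    # Constrained covariance  V diag(variances) V^T + noise_var * I,
    # where V is (d, k) with orthonormal columns.
    d = V.shape[0]
    return V @ np.diag(variances) @ V.T + noise_var * np.eye(d)

# Example: d = 1024 pixels, k = 10 directions
# full_cov_params(1024)            -> 524800 free elements
# local_pca_cov_params(1024, 10)   -> 10251
```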


An Improved method of Two Stage Linear Discriminant Analysis

  • Chen, Yarui;Tao, Xin;Xiong, Congcong;Yang, Jucheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.3 / pp.1243-1263 / 2018
  • Two-stage linear discriminant analysis (TSLDA) is a feature extraction technique for solving the small sample size problem in image recognition. TSLDA retains all subspace information from the between-class and within-class scatter matrices. However, the feature information in the four subspaces may not be entirely beneficial for classification, and the regularization procedure for eliminating singular matrices in TSLDA has high time complexity. To address these drawbacks, this paper proposes an improved two-stage linear discriminant analysis (Improved TSLDA). The Improved TSLDA uses a selection and compression method to extract superior feature information from the four subspaces and form an optimal projection space, defining a single Fisher criterion to measure the importance of each individual feature vector. The Improved TSLDA also applies an approximation matrix method to eliminate the singular matrices and reduce the time complexity. Comparative experiments on five face databases and one handwritten digit database validate the effectiveness of the Improved TSLDA.
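
A minimal sketch of the "single Fisher criterion" idea mentioned above: scoring one candidate projection direction w by the ratio of between-class to within-class scatter along w, then ranking candidate directions by that score. The subspace construction, compression, and approximation-matrix steps of the Improved TSLDA are not reproduced, and the function names are assumptions.

```python
import numpy as np

def single_fisher_score(w, Sb, Sw, eps=1e-12):
    # Fisher ratio of a single direction w: larger means the class means are
    # well separated relative to the within-class spread along w.
    w = w / (np.linalg.norm(w) + eps)
    return float(w @ Sb @ w) / (float(w @ Sw @ w) + eps)

def rank_directions(W, Sb, Sw):
    # Rank candidate directions (columns of W) by their individual Fisher scores.
    scores = np.array([single_fisher_score(W[:, j], Sb, Sw)
                       for j in range(W.shape[1])])
    return np.argsort(scores)[::-1], scores
```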

A CONSISTENT AND BIAS CORRECTED EXTENSION OF AKAIKE'S INFORMATION CRITERION (AIC): AICbc(k)

  • Kwon, Soon H.;Ueno, M.;Sugeno, M.
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.2 no.1 / pp.41-60 / 1998
  • This paper derives a consistent and bias-corrected extension of Akaike's Information Criterion (AIC), $AIC_{bc}$, based on Kullback-Leibler information. The criterion has terms that penalize overparametrization more strongly than AIC for both small and large samples, which overcomes the overfitting problem of asymptotically efficient model selection criteria in those regimes. $AIC_{bc}$ also provides consistent model order selection. It is therefore widely applicable to data with small and/or large sample sizes, and to cases where the number of free parameters is a relatively large fraction of the sample size. Relationships with other model selection criteria, such as Hurvich's $AIC_c$ and Bozdogan's CAICF, are discussed. The empirical performance of $AIC_{bc}$ in choosing the order of a linear regression model is studied using a Monte Carlo experiment.
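
For orientation, the ordinary AIC and Hurvich and Tsai's small-sample correction $AIC_c$ referred to in the abstract are sketched below; the exact form of $AIC_{bc}$ is not given in the abstract and is not reproduced here.

```python
def aic(log_likelihood, k):
    # Akaike's criterion: -2 log L + 2 k, where k is the number of free parameters.
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    # Hurvich-Tsai bias-corrected AIC for sample size n (requires n > k + 1).
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)
```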


A Bayesian inference for fixed effect panel probit model

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.23 no.2 / pp.179-187 / 2016
  • The fixed effects panel probit model faces the "incidental parameters problem" because the number of parameters to be estimated grows with the sample size. Maximum likelihood estimation fails to give a consistent estimator of the slope parameter. Unlike the panel regression model, it is not feasible to find an orthogonal reparameterization of the fixed effects that yields a consistent estimator. In this note, a hierarchical Bayesian model is proposed. The model is essentially equivalent to the frequentist's random effects model, but the individual specific effects are estimable with the help of Gibbs sampling. The Bayesian estimator is shown to reduce the small sample bias. The maximum likelihood estimator in the random effects model is also efficient, which contradicts Green (2004)'s conclusion.
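
A compact sketch of the Gibbs machinery that makes probit coefficients (and, with dummy columns for individuals, the individual effects) estimable: Albert-Chib data augmentation, with a simple Gaussian prior. This is one common setup, not the paper's hierarchical model; priors and the panel bookkeeping are simplified and the names are assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def gibbs_probit(X, y, n_iter=2000, prior_prec=1e-2, seed=0):
    # Albert-Chib data augmentation for a probit model with design matrix X
    # (individual dummies can be appended as columns) and binary outcomes y.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    V = np.linalg.inv(X.T @ X + prior_prec * np.eye(p))  # posterior covariance
    L = np.linalg.cholesky(V)
    for it in range(n_iter):
        mu = X @ beta
        # Latent utilities: truncated above 0 when y = 1, below 0 when y = 0.
        lower = np.where(y == 1, -mu, -np.inf)
        upper = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lower, upper, size=n, random_state=rng)
        # Conditional draw of beta given the latent utilities (Gaussian).
        mean = V @ (X.T @ z)
        beta = mean + L @ rng.standard_normal(p)
        draws[it] = beta
    return draws
```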

Choosing between the Exact and the Approximate Confidence Intervals: For the Difference of Two Independent Binomial Proportions

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.16 no.2 / pp.363-372 / 2009
  • The difference of two independent binomial proportions is frequently of interest in biomedical research, and interval estimation is an important tool for this inferential problem. Many confidence intervals have been proposed; they can be classified into exact confidence intervals and approximate confidence intervals. One may prefer exact confidence intervals because they guarantee a minimum coverage probability greater than the nominal confidence level. However, some authors, for example Agresti and Coull (1998), claim that "approximation is better than exact." When the sample size is large, the approximate interval does seem preferable to the exact interval, but the choice is not clear when the sample size is small. In this note, an exact confidence interval and an approximate confidence interval, recommended by Santner et al. (2007) and Lee (2006b) respectively, are compared in terms of coverage probability and expected length.
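
For concreteness, one widely used approximate interval for p1 - p2 (the Wald interval) and its Agresti-Caffo adjustment (add one success and one failure to each group) are sketched below. These are illustrative standbys, not necessarily the specific intervals compared in the paper; the exact interval of Santner et al. (2007) requires specialised computation and is not reproduced.

```python
from scipy.stats import norm

def wald_interval(x1, n1, x2, n2, level=0.95):
    # Simple asymptotic (Wald) interval for the difference p1 - p2.
    z = norm.ppf(0.5 + level / 2)
    p1, p2 = x1 / n1, x2 / n2
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    d = p1 - p2
    return d - z * se, d + z * se

def agresti_caffo_interval(x1, n1, x2, n2, level=0.95):
    # Adjusted Wald interval: add one success and one failure to each group.
    return wald_interval(x1 + 1, n1 + 2, x2 + 1, n2 + 2, level)
```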

Two-sample Linear Rank Tests for Efficient Edge Detection in Noisy Images (잡음영상에서 효과적인 에지검출을 위한 이표본 선형 순위 검정법)

  • Lim Dong-Hoon
    • Journal of the Korea Society of Computer and Information / v.11 no.4 s.42 / pp.9-15 / 2006
  • In this paper we propose two-sample linear rank tests for the two-sample location problem, namely the Wilcoxon test, the Median test, and the Van der Waerden test, for detecting edges effectively in noisy images. These methods detect image intensity changes between two pixel neighborhoods using an edge-height model, which makes them perform well on noisy images. The neighborhood used here is small, and its shape is varied adaptively according to the edge orientation. We compare and analyze the performance of these statistical edge detectors on both natural and synthetic images, with and without noise.
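
A minimal sketch of the two-sample idea for a single pixel: split a small window into two halves along one candidate edge orientation and apply the Wilcoxon rank-sum test; a small p-value suggests an intensity step, i.e. an edge. The window shape, the set of orientations, and the edge-height model are simplified assumptions here, and only a vertical-edge split is shown.

```python
import numpy as np
from scipy.stats import ranksums

def edge_response(patch):
    # patch: a small 2D window centred on the pixel, e.g. 4 x 4.
    # Compare the left and right halves as two samples (vertical-edge hypothesis).
    half = patch.shape[1] // 2
    left = patch[:, :half].ravel()
    right = patch[:, half:].ravel()
    stat, p_value = ranksums(left, right)
    return p_value   # small p-value -> strong evidence of an intensity change

def edge_map(image, size=4, alpha=1e-3):
    # Slide the window over the image and mark pixels whose two neighbourhood
    # halves differ significantly in location.
    h, w = image.shape
    out = np.zeros((h, w), dtype=bool)
    for i in range(h - size):
        for j in range(w - size):
            p = edge_response(image[i:i + size, j:j + size])
            out[i + size // 2, j + size // 2] = p < alpha
    return out
```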
