• Title/Summary/Keyword: Sparse data


A Nonparametric Goodness-of-Fit Test for Sparse Multinomial Data

  • Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society / v.14 no.2 / pp.303-311 / 2003
  • We consider the problem of testing cell probabilities in sparse multinomial data. Aerts et al. (2000) presented $T_1=\sum\limits_{i=1}^k(\hat{p}_i-p_i)^2$ as a test statistic, where $\hat{p}_i$ is the local polynomial estimator, and derived its asymptotic distribution. When the cell probabilities differ considerably in size, however, letting the difference between the estimator and the hypothetical probability contribute equally at every cell, as their statistic does, is not an appropriate measure of overall goodness of fit. We instead consider a Pearson-type goodness-of-fit statistic, $T=\sum\limits_{i=1}^k(\hat{p}_i-p_i)^2/p_i$, and show that it follows an asymptotic normal distribution.
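The Pearson-type statistic above is straightforward to compute once cell-probability estimates are available. A minimal sketch, plugging in plain probability estimates rather than the paper's local polynomial smoother:

```python
import numpy as np

def pearson_type_statistic(p_hat, p0):
    """Pearson-type goodness-of-fit statistic T = sum_i (p_hat_i - p0_i)^2 / p0_i.

    p_hat : estimated cell probabilities (the paper uses a local polynomial estimator)
    p0    : hypothesized cell probabilities
    """
    p_hat = np.asarray(p_hat, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    return float(np.sum((p_hat - p0) ** 2 / p0))

# Toy example: 5 cells, hypothesized uniform probabilities
p0 = np.full(5, 0.2)
p_hat = np.array([0.18, 0.22, 0.19, 0.21, 0.20])
T = pearson_type_statistic(p_hat, p0)
```

Dividing each squared deviation by p_i is exactly what lets cells with small hypothesized probabilities contribute on an equal footing with large ones.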

Sparse-View CT Image Recovery Using Two-Step Iterative Shrinkage-Thresholding Algorithm

  • Chae, Byung Gyu;Lee, Sooyeul
    • ETRI Journal / v.37 no.6 / pp.1251-1258 / 2015
  • We investigate an image recovery method for sparse-view computed tomography (CT) using an iterative shrinkage algorithm based on a second-order approach. The two-step iterative shrinkage-thresholding (TwIST) algorithm, which includes a total variation regularization technique, is shown to be more robust than other first-order methods; it enables perfect restoration of an original image even when given only a few projection views of a parallel-beam geometry. We find that the incoherency of a projection system matrix in CT geometry sufficiently satisfies the exact reconstruction principle even when the matrix itself has a large condition number. Image reconstruction from fan-beam CT can be carried out well, but the retrieval performance is much lower than for a parallel-beam geometry. We attribute this to the matrix complexity of the projection geometry. We also evaluate the image retrieval performance of the TwIST algorithm using measured projection data.
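The two-step update at the heart of TwIST can be sketched compactly. This is a simplified illustration, not the paper's CT pipeline: soft-thresholding stands in for the total-variation denoiser, the measurement matrix is assumed to have spectral norm at most 1, and the relaxation parameters are chosen for the toy problem rather than tuned as in the paper:

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator, the proximal map of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def twist(A, y, lam=0.01, alpha=1.9, beta=1.0, n_iter=200):
    """Two-step IST: x_{t+1} = (1-a)x_{t-1} + (a-b)x_t + b*denoise(gradient step).
    Soft-thresholding stands in for the paper's total-variation denoiser."""
    x_prev = np.zeros(A.shape[1])
    x = soft_threshold(A.T @ y, lam)
    for _ in range(n_iter):
        grad_step = x + A.T @ (y - A @ x)          # assumes ||A||_2 <= 1
        x_next = (1 - alpha) * x_prev + (alpha - beta) * x \
                 + beta * soft_threshold(grad_step, lam)
        x_prev, x = x, x_next
    return x

# Toy demo: recover a sparse signal from orthonormal measurements
rng = np.random.default_rng(0)
A = np.linalg.qr(rng.standard_normal((20, 20)))[0]
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [1.0, -0.5, 2.0]
x_rec = twist(A, A @ x_true)
```

The two-step recursion uses both the current and previous iterates, which is what gives the method its second-order (heavy-ball-like) acceleration over plain IST.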

Pertussis Toxin Inhibits Colchicine-Induced DNA Synthesis in Human Fibroblast

  • Jang, Won-Hee;Rhee, In-Ja
    • Archives of Pharmacal Research / v.17 no.3 / pp.199-203 / 1994
  • Several lines of evidence indicate that microtubule depolymerization initiates DNA synthesis or enhances the effects of serum or purified growth factors in many types of fibroblasts. Yet little is known about the intracellular events responsible for the mitogenic effect of microtubule-disrupting agents. The effects of antitubulin agents on DNA synthesis in sparse and dense cultures, in the presence or absence of serum, and the possible involvement of G-proteins in their mitotic action were examined. In these studies, colchicine by itself appeared to be mitogenic only for confluent quiescent human lung fibroblasts. In sparse culture, however, colchicine inhibited serum-stimulated DNA synthesis. Colcemid, another antitubulin agent, showed similar growth-inhibiting and growth-stimulating effects in sparse and confluent cultures, while lumicolchicine, an inactive form of colchicine, did not. The mitogenic effect of the two antitubulin agents, colchicine and colcemid, was partially inhibited by pertussis toxin. These data suggest that microtubular integrity is associated with the expression of either negative or positive control on DNA synthesis, and that the mitogenic effect of antitubulin agents may be partially mediated by a pertussis toxin-sensitive G protein.

A Sparse Data Preprocessing Using Support Vector Regression (Support Vector Regression을 이용한 희소 데이터의 전처리)

  • Jun, Sung-Hae;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
  • In various fields such as web mining, bioinformatics, and statistical data analysis, missing values arise in many different forms and make training data sparse. Most commonly, missing values are replaced by predicted values based on the mean or mode. More advanced imputation methods are also available, such as the conditional mean, tree-based methods, and the Markov chain Monte Carlo algorithm. However, the predictive accuracy of general imputation models decreases as the ratio of missing values in the training data increases, and the number of usable imputation methods shrinks as the missing ratio grows. To address this problem, we propose a preprocessing method for missing values based on statistical learning theory, namely Vapnik's support vector regression. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI machine learning repository.
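The regression-based imputation idea can be sketched as follows: regress the incomplete column on the complete ones, then predict the missing entries. To keep the sketch dependency-free, kernel ridge regression (quadratic loss) stands in for Vapnik's SVR (epsilon-insensitive loss), and the remaining columns are assumed complete:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian RBF kernel matrix between two sets of rows."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def impute_column(X, col, lam=1e-2, gamma=1.0):
    """Fill NaNs in column `col` by kernel regression on the remaining columns.

    Kernel ridge regression is used here as a stand-in for SVR; the other
    columns of X are assumed to have no missing values."""
    X = X.copy()
    miss = np.isnan(X[:, col])
    feats = np.delete(X, col, axis=1)
    # Fit on complete rows, predict on incomplete ones
    K = rbf_kernel(feats[~miss], feats[~miss], gamma)
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), X[~miss, col])
    X[miss, col] = rbf_kernel(feats[miss], feats[~miss], gamma) @ alpha
    return X

# Toy demo: y = sin(x) with two entries knocked out
x = np.linspace(0.0, 3.0, 30)
data = np.column_stack([x, np.sin(x)])
data[[5, 20], 1] = np.nan
filled = impute_column(data, col=1)
```

Unlike mean or mode replacement, this uses the relationship between columns, which is what lets the method degrade more gracefully as the missing ratio grows.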

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.501-514 / 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.
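The word-selection step can be illustrated with a first sparse principal component: words whose loadings are driven to zero are dropped. This sketch uses soft-thresholded power iteration as a lightweight stand-in for the elastic-net SPCA formulation the paper builds on; the document-term matrix and threshold level are toy assumptions:

```python
import numpy as np

def sparse_pc(X, lam=0.15, n_iter=200):
    """First sparse principal component via soft-thresholded power iteration.
    Returns the loading vector; zero entries mark words to drop."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)                        # word covariance matrix
    v = np.ones(C.shape[0]) / np.sqrt(C.shape[0])  # uniform starting vector
    for _ in range(n_iter):
        v = C @ v
        # Threshold small loadings relative to the largest one, then renormalize
        v = np.sign(v) * np.maximum(np.abs(v) - lam * np.abs(v).max(), 0.0)
        norm = np.linalg.norm(v)
        if norm == 0.0:
            break
        v /= norm
    return v

# Toy demo: 6 "words", only words 0 and 1 share a strong common signal
rng = np.random.default_rng(1)
signal = 3.0 * rng.standard_normal(200)
X = rng.standard_normal((200, 6))
X[:, 0] += signal
X[:, 1] += signal
selected = np.flatnonzero(sparse_pc(X))
```

The nonzero-loading words form the reduced feature set; the rest are treated as redundant, mirroring the paper's use of sparsity to discard uninformative terms.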

On Adaptation to Sparse Design in Bivariate Local Linear Regression

  • Hall, Peter;Seifert, Burkhardt;Turlach, Berwin A.
    • Journal of the Korean Statistical Society / v.30 no.2 / pp.231-246 / 2001
  • Local linear smoothing enjoys several excellent theoretical and numerical properties, and in a range of applications it is the method most frequently chosen for fitting curves to noisy data. Nevertheless, it suffers numerical problems in places where the distribution of design points (often called predictors, or explanatory variables) is sparse. In the case of univariate design, several remedies have been proposed for overcoming this problem, one of which involves adding additional "pseudo" design points in places where the original design points are too widely separated. This approach is particularly well suited to treating sparse bivariate design problems, and in fact attractive, elegant geometric analogues of univariate imputation and interpolation rules are appropriate for that case. In the present paper we introduce and develop pseudo-data rules for bivariate design, and apply them to real data.
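The univariate version of the pseudo-data idea can be sketched directly: interpolate extra design points into wide gaps so the local linear system stays well-conditioned there. The gap rule and bandwidth below are toy choices, not the paper's (bivariate, geometric) construction:

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate at x0 with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]                     # intercept = fitted value at x0

def add_pseudo_points(x, y, gap):
    """Insert linearly interpolated pseudo design points into gaps wider
    than `gap` (the univariate rule the paper extends to bivariate design)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    new_x, new_y = list(xs), list(ys)
    for a, b, ya, yb in zip(xs[:-1], xs[1:], ys[:-1], ys[1:]):
        k = int((b - a) // gap)        # number of pseudo points for this gap
        for j in range(1, k + 1):
            t = j / (k + 1)
            new_x.append(a + t * (b - a))
            new_y.append(ya + t * (yb - ya))
    idx = np.argsort(new_x)
    return np.array(new_x)[idx], np.array(new_y)[idx]

# Toy demo: a wide gap between x = 0.2 and x = 2.0, exactly linear response
x = np.array([0.0, 0.1, 0.2, 2.0, 2.1])
y = x.copy()
px, py = add_pseudo_points(x, y, gap=0.5)
est = local_linear(1.1, px, py, h=0.3)
```

Without the pseudo points, the weighted design matrix at x0 = 1.1 is nearly singular because almost no mass falls inside the kernel window; with them, the estimate is stable (and exact here, since local linear smoothing reproduces linear functions).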

On Speaker Adaptations with Sparse Training Data for Improved Speaker Verification

  • Ahn, Sung-Joo;Kang, Sun-Mee;Ko, Han-Seok
    • Speech Sciences / v.7 no.1 / pp.31-37 / 2000
  • This paper concerns effective speaker adaptation methods for solving the over-training problem in speaker verification, which frequently occurs when modeling a speaker with sparse training data. While various speaker adaptation techniques have already been applied to speech recognition, they have not yet been formally considered in speaker verification. This paper proposes speaker adaptation methods that combine MAP and MLLR adaptation, both used successfully in speech recognition, and applies them to speaker verification. Experimental results show that a speaker verification system using weighted MAP and MLLR adaptation outperforms conventional speaker models without adaptation by up to a factor of five. These results show that speaker adaptation achieves significantly better performance even when only sparse training data are available for speaker verification.
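The MAP half of the combination can be sketched with the relevance-factor mean update common in GMM-based speaker modeling. This is only an illustration of the MAP step: the MLLR transform and the paper's specific weighting between the two adaptations are omitted, and the relevance factor tau is an assumed value:

```python
import numpy as np

def map_adapt_means(means, frames, responsibilities, tau=16.0):
    """MAP adaptation of Gaussian mixture means (relevance-factor form).

    means            : (M, D) prior (background-model) means
    frames           : (N, D) adaptation feature vectors
    responsibilities : (N, M) posterior component occupancies
    tau              : relevance factor; larger tau trusts the prior more
    """
    n_m = responsibilities.sum(axis=0)                    # soft counts per component
    ex = responsibilities.T @ frames / np.maximum(n_m, 1e-12)[:, None]
    w = n_m / (n_m + tau)                                 # data weight grows with counts
    return w[:, None] * ex + (1 - w)[:, None] * means

# Toy demo: one component, 16 identical adaptation frames
ubm_means = np.array([[0.0, 0.0]])
frames = np.ones((16, 2))
resp = np.ones((16, 1))
adapted = map_adapt_means(ubm_means, frames, resp, tau=16.0)
```

The interpolation weight w tends to 0 when a component sees almost no adaptation data, which is exactly what protects sparse-data speaker models from over-training.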

Sparse Kernel Regression using IRWLS Procedure

  • Park, Hye-Jung
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.735-744 / 2007
  • The support vector machine (SVM) is capable of providing a complete description of the linear and nonlinear relationships among random variables. In this paper we propose a sparse kernel regression (SKR) to overcome a weak point of the SVM, namely the steep growth in the number of support vectors as the number of training data increases. The iterative reweighted least squares (IRWLS) procedure is used to solve the optimization problem of SKR with a Laplacian prior. Furthermore, the generalized cross-validation (GCV) function is introduced to select the hyperparameters that affect the performance of SKR. Experimental results are then presented which illustrate the performance of the proposed procedure.
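The IRWLS idea for a Laplacian (L1) prior can be sketched as follows: at each step the L1 penalty is approximated by a quadratic reweighted by the current coefficient magnitudes, so each iteration is a ridge-type solve. This is a generic sketch of that scheme, not the paper's exact formulation, and the penalty level is fixed rather than selected by GCV as the paper does:

```python
import numpy as np

def sparse_kernel_regression(K, y, lam=0.1, n_iter=50, eps=1e-8):
    """IRWLS for kernel regression with a Laplacian (L1) prior on coefficients:
    minimize ||y - K b||^2 + lam * sum_i |b_i|.
    Each step solves a ridge system reweighted by 1/|b_i| from the last step."""
    beta = np.linalg.solve(K.T @ K + lam * np.eye(len(y)), K.T @ y)
    for _ in range(n_iter):
        W = np.diag(lam / (np.abs(beta) + eps))    # large weight -> coefficient pinned to 0
        beta = np.linalg.solve(K.T @ K + W, K.T @ y)
    beta[np.abs(beta) < 1e-6] = 0.0                # prune numerically zero coefficients
    return beta

# Toy demo: Gaussian kernel regression of sin(x)
x = np.linspace(0.0, np.pi, 30)
y = np.sin(x)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)
beta = sparse_kernel_regression(K, y, lam=0.1)
rmse = float(np.sqrt(np.mean((K @ beta - y) ** 2)))
```

Coefficients driven toward zero receive ever larger weights and collapse, which is how the Laplacian prior keeps the number of effective "support vectors" from growing with the training set.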

Geostatistical Integration of Different Sources of Elevation and its Effect on Landslide Hazard Mapping

  • Park, No-Wook;Kyriakidis, Phaedon C.
    • Korean Journal of Remote Sensing / v.24 no.5 / pp.453-462 / 2008
  • The objective of this paper is to compare the prediction performances of different landslide hazard maps based on topographic data stemming from different sources of elevation. The geostatistical framework of kriging, which can properly integrate spatial data with different accuracy, is applied for generating more reliable elevation estimates from both sparse elevation spot heights and exhaustive ASTER-based elevation values. A case study from Boeun, Korea illustrates that the integration of elevation and slope maps derived from different data yielded different prediction performances for landslide hazard mapping. The landslide hazard map constructed by using the elevation and the associated slope maps based on geostatistical integration of spot heights and ASTER-based elevation resulted in the best prediction performance. Landslide hazard mapping using elevation and slope maps derived from the interpolation of only sparse spot heights showed the worst prediction performance.
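The kriging estimation step for sparse point data such as elevation spot heights can be sketched in one dimension. This is only the single-source ordinary kriging core, not the paper's multi-source integration of spot heights with ASTER elevation (which requires a multivariate geostatistical formulation), and the exponential variogram is an assumed model:

```python
import numpy as np

def ordinary_kriging(x_obs, z_obs, x_new, variogram=lambda h: 1.0 - np.exp(-h)):
    """Ordinary kriging of sparse 1-D point data with an assumed variogram.

    Solves, for each prediction site, the system
        sum_j w_j * gamma(x_i, x_j) + mu = gamma(x_i, x0),   sum_j w_j = 1,
    and returns the weighted averages w @ z_obs."""
    n = len(x_obs)
    H = np.abs(x_obs[:, None] - x_obs[None, :])
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(H)           # variogram between observation pairs
    A[n, n] = 0.0                      # Lagrange-multiplier corner
    preds = []
    for x0 in np.atleast_1d(x_new):
        b = np.append(variogram(np.abs(x_obs - x0)), 1.0)
        w = np.linalg.solve(A, b)[:n]  # kriging weights (sum to 1)
        preds.append(w @ z_obs)
    return np.array(preds)

# Toy demo: four elevation spot heights, predict at two locations
x_obs = np.array([0.0, 1.0, 2.5, 4.0])
z_obs = np.array([10.0, 12.0, 9.0, 11.0])
z_hat = ordinary_kriging(x_obs, z_obs, [1.0, 2.0])
```

Because the variogram is zero at lag zero, the predictor honors the spot heights exactly at their locations; the weights, not a fixed stencil, carry the spatial-accuracy information that the paper exploits when combining data sources of different quality.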