• Title/Summary/Keyword: Sparse Data Set

Sparse Data Cleaning using Multiple Imputations

  • Jun, Sung-Hae;Lee, Seung-Joo;Oh, Kyung-Whan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.1
    • /
    • pp.119-124
    • /
    • 2004
  • Real-world data such as web log files tend to be incomplete, yet we must extract useful knowledge from them to support good decisions. Web log data contain much useful information, including hyperlink structure and the usage patterns of connected users. However, web data are too large to use directly for effective knowledge discovery and, to make matters worse, they are very sparse. We overcome this sparsity problem by using a Markov chain Monte Carlo (MCMC) method for multiple imputation. This missing-value imputation turns sparse web data into complete data. Our approach may be a useful tool for discovering knowledge from sparse data sets. The sparser the data, the better the MCMC imputation performs. We verified our work through experiments on data from the UCI Machine Learning Repository.
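
The abstract does not give the imputation model, so the following is only a minimal sketch of MCMC-style multiple imputation (data augmentation) under loud assumptions: a single numeric column, a normal model with a noninformative prior, and missing entries marked as `None`. The function name `mcmc_impute` and its parameters `m`, `burn_in`, and `thin` are hypothetical and not taken from the paper.

```python
import random
import statistics


def mcmc_impute(values, m=5, burn_in=50, thin=10, seed=0):
    """Return m completed copies of `values`, where None marks a missing entry.

    Toy data-augmentation sampler for one normally distributed column
    (an assumption made for illustration, not the paper's model).
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    n = len(values)
    # Start the chain from the observed-data estimates.
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    completed = []
    for step in range(burn_in + m * thin):
        # Imputation step: draw every missing value from N(mu, sigma^2).
        filled = [v if v is not None else rng.gauss(mu, sigma) for v in values]
        # Posterior step: draw (mu, sigma^2) given the completed data.
        xbar = statistics.mean(filled)
        ss = sum((x - xbar) ** 2 for x in filled)
        # sigma^2 = ss / chi-square(n - 1); chi-square(k) is Gamma(k/2, scale 2).
        sigma2 = ss / rng.gammavariate((n - 1) / 2, 2.0)
        mu = rng.gauss(xbar, (sigma2 / n) ** 0.5)
        sigma = sigma2 ** 0.5
        # Keep every `thin`-th completed data set after burn-in.
        if step >= burn_in and (step - burn_in) % thin == 0:
            completed.append(filled)
    return completed


data = [1.0, None, 3.0, 2.0, None, 4.0]
imputations = mcmc_impute(data, m=3)
```

Each of the `m` completed data sets would then be analyzed separately and the results pooled, which is what distinguishes multiple imputation from filling each gap once.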

Combing data representation by Sparse Autoencoder and the well-known load balancing algorithm, ProGReGA-KF (Sparse Autoencoder의 데이터 특징 추출과 ProGReGA-KF를 결합한 새로운 부하 분산 알고리즘)

  • Kim, Chayoung;Park, Jung-min;Kim, Hye-young
    • Journal of Korea Game Society
    • /
    • v.17 no.5
    • /
    • pp.103-112
    • /
    • 2017
  • In recent years, the expansion and advance of the Internet of Things (IoT) in distributed MMOG (massively multiplayer online game) architectures have resulted in massive growth of data in terms of server workloads. We propose an algorithm that combines a Sparse Autoencoder with ProGReGA, a load balancing algorithm for MMOGs. In the Sparse Autoencoder step, a data representation with enhanced features is extracted from the data set. In the load balancing step, the graceful degradation of ProGReGA can exploit the most relevant and least redundant features of this data representation. We find that the proposed algorithm is more stable.

ASSVD: Adaptive Sparse Singular Value Decomposition for High Dimensional Matrices

  • Ding, Xiucai;Chen, Xianyi;Zou, Mengling;Zhang, Guangxing
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.6
    • /
    • pp.2634-2648
    • /
    • 2020
  • In this paper, an adaptive sparse singular value decomposition (ASSVD) algorithm is proposed to estimate the signal matrix when only one data matrix is observed and there is high-dimensional white noise. We assume that the signal matrix is low-rank and has sparse singular vectors, i.e., it is simultaneously low-rank and sparse. It is a structured matrix, since the non-zero entries are confined to a few small blocks. The proposed algorithm estimates the singular values and singular vectors separately by exploiting the structure of the singular vectors, using recent developments in random matrix theory known as the anisotropic Marchenko-Pastur law. We then prove that when the signal is strong, in the sense that the signal-to-noise ratio is above some threshold, our estimator is consistent and outperforms many state-of-the-art algorithms. Moreover, our estimator is adaptive to the data set and does not require the variance of the noise to be known or estimated. Numerical simulations indicate that ASSVD still works well when the signal matrix is not very sparse.

Feature Extraction via Sparse Difference Embedding (SDE)

  • Wan, Minghua;Lai, Zhihui
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3594-3607
    • /
    • 2017
  • Traditional feature extraction methods such as principal component analysis (PCA) cannot capture the local structure of the samples, and locally linear embedding (LLE) cannot capture their global structure. Moreover, a common drawback of the existing PCA and LLE algorithms is that they cannot deal well with the sparsity problem of the samples. Therefore, by integrating the globality of PCA and the locality of LLE with a sparse constraint, we developed an improved, unsupervised difference algorithm called Sparse Difference Embedding (SDE) for dimensionality reduction of high-dimensional data in small sample size problems. Differing significantly from the existing PCA and LLE algorithms, SDE seeks a set of projections that can not only impact the locality of intraclass and maximize the globality of interclass, but can also simultaneously use Lasso regression to obtain a sparse transformation matrix. This characteristic makes SDE more intuitive and more powerful than PCA and LLE. Finally, the proposed algorithm was evaluated through experiments on the Yale and AR face image databases and the USPS handwritten digit database. The experimental results show that SDE outperforms PCA, LLE, and UDP owing to its sparse discriminating characteristics, which also indicates that SDE is an effective method for face recognition.

Image Denoising Using Nonlocal Similarity and 3D Filtering (비지역적 유사성 및 3차원 필터링 기반 영상 잡음제거)

  • Kim, Seehyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.10
    • /
    • pp.1886-1891
    • /
    • 2017
  • Denoising, one of the major research topics in image processing, deals with restoring noisy images. Natural images are well known for their nonlocal as well as local similarity: patterns of unique edges and textures, which are crucial for understanding the image, are repeated over nonlocal regions. In this paper, a denoising algorithm based on nonlocal similarity is proposed. First, for every block of the noisy image, nonlocal similar blocks are gathered to construct an overcomplete data set, which is inherently sparse in the transform domain due to the characteristics of natural images. Then, the sparse transform coefficients are filtered to suppress the non-sparse additive noise. Finally, the image is recovered by aggregating the overcomplete estimates of each pixel. Experiments with several images show that the proposed algorithm outperforms conventional methods in removing additive Gaussian noise while preserving image details.

Hierarchically penalized sparse principal component analysis (계층적 벌점함수를 이용한 주성분분석)

  • Kang, Jongkyeong;Park, Jaeshin;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.1
    • /
    • pp.135-145
    • /
    • 2017
  • Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Since each principal component is a linear combination of all variables and the loadings are typically non-zero, it is difficult to interpret the derived principal components. Sparse principal component analysis (SPCA) is a specialized technique using the elastic net penalty function to produce sparse loadings in principal component analysis. When data are structured by groups of variables, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method to improve variable selection performance when variables are grouped, which not only selects important groups but also removes unimportant variables within identified groups. To incorporate group information into model fitting, we consider a hierarchical lasso penalty instead of the elastic net penalty in SPCA. Real data analyses demonstrate the performance and usefulness of the proposed method.
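
For orientation, the elastic net criterion that SPCA is usually stated with (the formulation of Zou, Hastie, and Tibshirani) can be written as below. The symbols are assumptions introduced here, not notation from this abstract: $x_i$ are the $n$ observations, $A$ and $B = [\beta_1, \dots, \beta_k]$ are $p \times k$ matrices, and $\lambda$, $\lambda_{1,j}$ are the ridge and lasso tuning parameters.

$$
(\hat{A}, \hat{B}) = \underset{A,\,B}{\arg\min} \; \sum_{i=1}^{n} \left\lVert x_i - A B^{\top} x_i \right\rVert_2^2 + \lambda \sum_{j=1}^{k} \lVert \beta_j \rVert_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \lVert \beta_j \rVert_1 \quad \text{subject to } A^{\top} A = I_k
$$

The $\ell_1$ terms are what drive individual loadings to exactly zero. The method in this abstract replaces the elastic net penalty with a hierarchical lasso penalty, whose exact form is not given here.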

CONSTRUCTIONS OF REGULAR SPARSE ANTI-MAGIC SQUARES

  • Chen, Guangzhou;Li, Wen;Xin, Bangying;Zhong, Ming
    • Bulletin of the Korean Mathematical Society
    • /
    • v.59 no.3
    • /
    • pp.617-642
    • /
    • 2022
  • For positive integers n and d with d < n, an n × n array A based on 𝒳 = {0, 1, …, nd} is called a sparse anti-magic square of order n with density d, denoted by SAMS(n, d), if each non-zero element of 𝒳 occurs exactly once in A, and its row-sums, column-sums and two main diagonal-sums constitute a set of 2n + 2 consecutive integers. An SAMS(n, d) is called regular if there are exactly d non-zero elements in each row, each column and each main diagonal. In this paper, we investigate the existence of regular sparse anti-magic squares of order n ≡ 1, 5 (mod 6), and prove that there exists a regular SAMS(n, d) for any n ≥ 5, n ≡ 1, 5 (mod 6) and d with 2 ≤ d ≤ n - 1.
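
The two definitions above translate directly into a checker. The sketch below is illustrative only: the function names `line_sums`, `is_sams`, and `is_regular` are hypothetical, and the 3 × 3 example was constructed by hand for this illustration, not taken from the paper.

```python
def line_sums(a):
    """Row sums, column sums, and the two main diagonal sums of a square array."""
    n = len(a)
    rows = [sum(row) for row in a]
    cols = [sum(a[i][j] for i in range(n)) for j in range(n)]
    diag = sum(a[i][i] for i in range(n))
    anti = sum(a[i][n - 1 - i] for i in range(n))
    return rows + cols + [diag, anti]


def is_sams(a, d):
    """Check the SAMS(n, d) conditions for an n x n list of lists."""
    n = len(a)
    if any(len(row) != n for row in a):
        return False
    # Each non-zero element of {0, 1, ..., nd} must occur exactly once.
    nonzero = sorted(v for row in a for v in row if v != 0)
    if nonzero != list(range(1, n * d + 1)):
        return False
    # The 2n + 2 line sums must form a run of consecutive integers.
    sums = sorted(line_sums(a))
    return sums == list(range(sums[0], sums[0] + 2 * n + 2))


def is_regular(a, d):
    """A SAMS is regular if every row, column, and main diagonal has d non-zeros."""
    n = len(a)
    lines = [list(row) for row in a]
    lines += [[a[i][j] for i in range(n)] for j in range(n)]
    lines.append([a[i][i] for i in range(n)])
    lines.append([a[i][n - 1 - i] for i in range(n)])
    return is_sams(a, d) and all(sum(v != 0 for v in line) == d for line in lines)


example = [
    [2, 3, 0],
    [1, 5, 4],
    [6, 0, 0],
]
print(is_sams(example, 2))     # True: the line sums are 4, 5, ..., 11
print(is_regular(example, 2))  # False: the rows hold 2, 3, and 1 non-zeros
```

The example shows why regularity is a separate condition: the array satisfies the sum condition for SAMS(3, 2) while distributing its non-zero entries unevenly.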

Efficient Mining of Frequent Itemsets in a Sparse Data Set (희소 데이터 집합에서 효율적인 빈발 항목집합 탐사 기법)

  • Park In-Chang;Chang Joong-Hyuk;Lee Won-Suk
    • The KIPS Transactions:PartD
    • /
    • v.12D no.6 s.102
    • /
    • pp.817-828
    • /
    • 2005
  • The main research problems in mining frequent itemsets are reducing the memory usage and processing time of the mining process. Most previous algorithms for finding frequent itemsets are multi-scan algorithms based on the Apriori property, and their processing time increases greatly with the length of the maximal frequent itemset. To overcome this drawback, other approaches have been actively proposed to reduce the processing time; however, they are not efficient on a sparse data set. This paper proposes an efficient mining algorithm for finding frequent itemsets. It introduces a novel tree structure, called an $L_2$-tree, and an efficient algorithm for mining frequent itemsets with it, called the $L_2$-traverse algorithm. An $L_2$-tree is constructed from $L_2$, i.e., the set of frequent itemsets of size 2, and the $L_2$-traverse algorithm finds its mining result in a short time by traversing the $L_2$-tree once. To reduce the processing time further, this paper also proposes an optimized algorithm, $C_3$-traverse, which removes in advance each itemset in $L_2$ that cannot be part of a frequent itemset of size 3. Various experiments verified that the proposed algorithms are efficient on a sparse data set.
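
The abstract does not describe the layout of the $L_2$-tree, so the sketch below does not attempt to reproduce it. It only illustrates, under that stated limitation, the two ingredients the abstract names: computing $L_2$ from the transactions, and discarding pairs that cannot belong to any frequent itemset of size 3 (by the Apriori property, a triple is a candidate only if all three of its pairs are in $L_2$). The function names and the toy transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """L2: the item pairs that occur in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    return {pair for pair, count in counts.items() if count >= min_support}


def candidate_triples(l2):
    """Triples whose three 2-subsets are all frequent (Apriori property)."""
    items = sorted({item for pair in l2 for item in pair})
    return {
        triple
        for triple in combinations(items, 3)
        if all(pair in l2 for pair in combinations(triple, 2))
    }


def prune_pairs(l2):
    """Keep only the pairs that appear in at least one candidate triple."""
    kept = set()
    for triple in candidate_triples(l2):
        kept.update(combinations(triple, 2))
    return kept


transactions = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["c", "d"],
    ["d", "e"],
    ["d", "e"],
]
l2 = frequent_pairs(transactions, min_support=2)
pruned = prune_pairs(l2)
```

Here the pair `("d", "e")` is frequent but is dropped by `prune_pairs`, because no third item forms frequent pairs with both `d` and `e`. Note that pruned pairs are still frequent itemsets of size 2; they are only excluded from the search for larger itemsets.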

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.6
    • /
    • pp.501-514
    • /
    • 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.

A Study on the Validation Test for Open Set Face Recognition Method with a Dummy Class (더미 클래스를 가지는 열린 집합 얼굴 인식 방법의 유효성 검증에 대한 연구)

  • Ahn, Jung-Ho;Choi, KwonTaeg
    • Journal of Digital Contents Society
    • /
    • v.18 no.3
    • /
    • pp.525-534
    • /
    • 2017
  • Open set recognition should be used when the classes of the test data are not completely known in the training phase, so it requires two processes: classification and a validation test. Research of this kind is essential for commercializing face recognition modules, but few domestic research results on it have been published. In this paper, we propose an open set face recognition method that includes two sequential validation phases. In the first phase, we perform classification based on sparse representation with dummy classes. Here, when the test data is classified into a dummy class, we conclude that the data is invalid. If the data is classified into one of the regular training classes, we extract four features for the second validation test and apply them to the proposed decision function. In experiments, we propose a simulation method for open set recognition and show that the proposed validation test outperforms SCI, a well-known validation method.