• Title/Summary/Keyword: sparse data matrix

Search Result 70, Processing Time 0.031 seconds

A small review and further studies on the LASSO

  • Kwon, Sunghoon;Han, Sangmi;Lee, Sangin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.5
    • /
    • pp.1077-1088
    • /
    • 2013
  • High-dimensional data analysis arises from almost all scientific areas, evolving with development of computing skills, and has encouraged penalized estimations that play important roles in statistical learning. For the past years, various penalized estimations have been developed, and the least absolute shrinkage and selection operator (LASSO) proposed by Tibshirani (1996) has shown outstanding ability, earning the first place on the development of penalized estimation. In this paper, we first introduce a number of recent advances in high-dimensional data analysis using the LASSO. The topics include various statistical problems such as variable selection and grouped or structured variable selection under sparse high-dimensional linear regression models. Several unsupervised learning methods including inverse covariance matrix estimation are presented. In addition, we address further studies on new applications which may establish a guideline on how to use the LASSO for statistical challenges of high-dimensional data analysis.

Experimental Comparisons of Simplex Method Program's Speed with Various Memory Referencing Techniques and Data Structures (여러 가지 컴퓨터 메모리 참조 방법과 자료구조에 대한 단체법 프로그램 수행 속도의 비교)

  • Park, Chan-Kyoo;Lim, Sung-Mook;Kim, Woo-Jae;Park, Soon-Dal
    • IE interfaces
    • /
    • v.11 no.2
    • /
    • pp.149-157
    • /
    • 1998
  • In this paper, various techniques considering the characteristics of computer memory management are suggested, which can be used in the implementation of simplex method. First, reduction technique of indirect addressing, redundant references of memory, and scatter/gather technique are implemented, and the effectiveness of the techniques is shown. Loop-unrolling technique, which exploits the arithmetic operation mechanism of computer, is also implemented. Second, a subroutine frequently called is written in low-level language, and the effectiveness is proved by experimental results. Third, row-column linked list and Gustavson's data structure are compared as the data structure for the large sparse matrix in LU form. Last, buffering technique and memory-mapped file which can be used in reading large data file are implemented and the effectiveness is shown.

  • PDF

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.20 no.1
    • /
    • pp.1-10
    • /
    • 2019
  • Since big-data text mining extracts many features and data, clustering and classification can result in high computational complexity and low reliability of the analysis results. In particular, a term document matrix obtained through text mining represents term-document features, but produces a sparse matrix. We designed an advanced genetic algorithm (GA) to extract features in text mining for detection model. Term frequency inverse document frequency (TF-IDF) is used to reflect the document-term relationships in feature extraction. Through a repetitive process, a predetermined number of features are selected. And, we used the sparsity score to improve the performance of detection model. If a spam mail data set has the high sparsity, detection model have low performance and is difficult to search the optimization detection model. In addition, we find a low sparsity model that have also high TF-IDF score by using s(F) where the numerator in fitness function. We also verified its performance by applying the proposed algorithm to text classification. As a result, we have found that our algorithm shows higher performance (speed and accuracy) in attack mail classification.

An Efficient Ordering Method and Data Structure of the Interior Point Method (Putting Emphasis on the Minimum Deficiency Ordering (내부점기법에 있어서 효율적인 순서화와 자료구조(최소부족순서화를 중심으로))

  • 박순달;김병규;성명기
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.21 no.3
    • /
    • pp.63-74
    • /
    • 1996
  • Ordering plays an important role in solving an LP problem with sparse matrix by the interior point method. Since ordering is NP-complete, we try to find an efficient method. The objective of this paper is to present an efficient heuristic ordering method for implementation of the minimum deficiency method. Both the ordering method and the data structure play important roles in implementation. First we define a new heuristic pseudo-deficiency ordering method and a data structure for the method-quotient graph and cliqued storage. Next we show an experimental result in terms of time and nonzero numbers by NETLIB problems.

  • PDF

A Study on Bias Effect on Model Selection Criteria in Graphical Lasso

  • Choi, Young-Geun;Jeong, Seyoung;Yu, Donghyeon
    • Quantitative Bio-Science
    • /
    • v.37 no.2
    • /
    • pp.133-141
    • /
    • 2018
  • Graphical lasso is one of the most popular methods to estimate a sparse precision matrix, which is an inverse of a covariance matrix. The objective function of graphical lasso imposes an ${\ell}_1$-penalty on the (vectorized) precision matrix, where a tuning parameter controls the strength of the penalization. The selection of the tuning parameter is practically and theoretically important since the performance of the estimation depends on an appropriate choice of tuning parameter. While information criteria (e.g. AIC, BIC, or extended BIC) have been widely used, they require an asymptotically unbiased estimator to select optimal tuning parameter. Thus, the biasedness of the ${\ell}_1$-regularized estimate in the graphical lasso may lead to a suboptimal tuning. In this paper, we propose a two-staged bias-correction procedure for the graphical lasso, where the first stage runs the usual graphical lasso and the second stage reruns the procedure with an additional constraint that zero estimates at the first stage remain zero. Our simulation and real data example show that the proposed bias correction improved on both edge recovery and estimation error compared to the single-staged graphical lasso.

Coding-based Storage Design for Continuous Data Collection in Wireless Sensor Networks

  • Zhan, Cheng;Xiao, Fuyuan
    • Journal of Communications and Networks
    • /
    • v.18 no.3
    • /
    • pp.493-501
    • /
    • 2016
  • In-network storage is an effective technique for avoiding network congestion and reducing power consumption in continuous data collection in wireless sensor networks. In recent years, network coding based storage design has been proposed as a means to achieving ubiquitous access that permits any query to be satisfied by a few random (nearby) storage nodes. To maintain data consistency in continuous data collection applications, the readings of a sensor over time must be sent to the same set of storage nodes. In this paper, we present an efficient approach to updating data at storage nodes to maintain data consistency at the storage nodes without decoding out the old data and re-encoding with new data. We studied a transmission strategy that identifies a set of storage nodes for each source sensor that minimizes the transmission cost and achieves ubiquitous access by transmitting sparsely using the sparse matrix theory. We demonstrate that the problem of minimizing the cost of transmission with coding is NP-hard. We present an approximation algorithm based on regarding every storage node with memory size B as B tiny nodes that can store only one packet. We analyzed the approximation ratio of the proposed approximation solution, and compared the performance of the proposed coding approach with other coding schemes presented in the literature. The simulation results confirm that significant performance improvement can be achieved with the proposed transmission strategy.

Documents recommendation using large citation data (거대 인용 자료를 이용한 문서 추천 방법)

  • Chae, Minwoo;Kang, Minsoo;Kim, Yongdai
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.5
    • /
    • pp.999-1011
    • /
    • 2013
  • In this research, we propose a document recommendation method which can find documents that are relatively important to a specific document based on citation information. The key idea is parameter tuning in the Neumann kernal which is an intermediate between a measure of importance (HITS) and of relatedness (co-citation). Our method properly selects the tuning parameter ${\gamma}$ in the Neumann kernal minimizing the prediction error in future citation. We also discuss some comutational issues needed for analysing large citation data. Finally, results of analyzing patents data from the US Patent Office are given.

Study on Principal Sentiment Analysis of Social Data (소셜 데이터의 주된 감성분석에 대한 연구)

  • Jang, Phil-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.12
    • /
    • pp.49-56
    • /
    • 2014
  • In this paper, we propose a method for identifying hidden principal sentiments among large scale texts from documents, social data, internet and blogs by analyzing standard language, slangs, argots, abbreviations and emoticons in those words. The IRLBA(Implicitly Restarted Lanczos Bidiagonalization Algorithm) is used for principal component analysis with large scale sparse matrix. The proposed system consists of data acquisition, message analysis, sentiment evaluation, sentiment analysis and integration and result visualization modules. The suggested approaches would help to improve the accuracy and expand the application scope of sentiment analysis in social data.

Design of a Recommendation System for Improving Deep Neural Network Performance

  • Juhyoung Sung;Kiwon Kwon;Byoungchul Song
    • Journal of Internet Computing and Services
    • /
    • v.25 no.1
    • /
    • pp.49-56
    • /
    • 2024
  • There have been emerging many use-cases applying recommendation systems especially in online platform. Although the performance of recommendation systems is affected by a variety of factors, selecting appropriate features is difficult since most of recommendation systems have sparse data. Conventional matrix factorization (MF) method is a basic way to handle with problems in the recommendation systems. However, the MF based scheme cannot reflect non-linearity characteristics well. As deep learning technology has been attracted widely, a deep neural network (DNN) framework based collaborative filtering (CF) was introduced to complement the non-linearity issue. However, there is still a problem related to feature embedding for use as input to the DNN. In this paper, we propose an effective method using singular value decomposition (SVD) based feature embedding for improving the DNN performance of recommendation algorithms. We evaluate the performance of recommendation systems using MovieLens dataset and show the proposed scheme outperforms the existing methods. Moreover, we analyze the performance according to the number of latent features in the proposed algorithm. We expect that the proposed scheme can be applied to the generalized recommendation systems.

Eigen-analysis of SSR in Power Systems with Modular Network Model Equations (Modular 네트워크 모델 구성에 의한 전력계통 SSR 현상의 고유치해석)

  • Nam, Hae-Kon;Kim, Yong-Gu;Shim, Kwan-Shik
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.10
    • /
    • pp.1239-1246
    • /
    • 1999
  • This paper presents a new algorithm to construct the modular network model for SSR analysis by simply applying KCL to each node and KVL to all branches connected to the node sequentially. This method has advantages that the model can be derived directly from the system data for transient stability study and turbine/generator shaft model, the resulted model in the form of augmented state matrix is very sparse, and thus efficient SSR study of a large scale system becomes possible. The proposed algorithm is verified with the IEEE First and Second Benchmark models.

  • PDF