• Title/Summary/Keyword: data sparsity

An Exploratory Study for Decreasing Error of Prediction Value of Recommended System on User Based

  • Lee, Hee-Choon
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.77-86 / 2006
  • This study investigates how the prediction error of a user-based collaborative recommender system relates to associated variables. To reduce that error, it explores how raters' demographic variables, Pearson's correlation coefficient, and sparsity affect the prediction for each response pair. The results compare the existing prediction error with the error obtained under these conditions.

Performance Analysis of Similarity Reflecting Jaccard Index for Solving Data Sparsity in Collaborative Filtering (협력필터링의 데이터 희소성 해결을 위한 자카드 지수 반영의 유사도 성능 분석)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education / v.19 no.4 / pp.59-66 / 2016
  • Reflecting the number of co-rated items has been studied as a way to address the data sparsity problem in collaborative filtering systems. The well-known Jaccard index has improved performance when combined with earlier similarity measures. However, the degree of improvement it brings to existing similarity measures across different data environments has seldom been analyzed, which is the objective of this study. As a sole similarity measure, the Jaccard index yielded much higher prediction quality than traditional measures and very high recommendation quality on a sparse dataset. In general, earlier similarity measures combined with the Jaccard index performed better regardless of dataset characteristics. In particular, cosine similarity achieved the largest improvement on sparse datasets, while Mean Squared Difference similarity degraded prediction quality on denser sets. Therefore, the characteristics of the data environment and of the similarity measure should be considered before combining it with the Jaccard index.
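
As a rough illustration of the idea in this abstract, the sketch below combines the Jaccard index of two users' rated-item sets with cosine similarity over their co-rated items. The abstract does not say how the two are combined; multiplying them is an assumption made here, and the function names and toy ratings are hypothetical.

```python
from math import sqrt

def jaccard(ratings_u, ratings_v):
    """Jaccard index of the sets of items rated by two users."""
    items_u, items_v = set(ratings_u), set(ratings_v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def cosine(ratings_u, ratings_v):
    """Cosine similarity computed over co-rated items only."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def jaccard_cosine(ratings_u, ratings_v):
    """Assumed combination: scale cosine similarity by the Jaccard index."""
    return jaccard(ratings_u, ratings_v) * cosine(ratings_u, ratings_v)

# Hypothetical toy ratings: the users share one of four distinct items.
alice = {"item1": 5, "item2": 3, "item3": 4}
bob = {"item1": 4, "item4": 2}
print(jaccard_cosine(alice, bob))
```

With a single co-rated item, cosine similarity alone is 1.0 for this pair; the Jaccard factor lowers the score to reflect how little the two rating histories overlap.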

Sparse Document Data Clustering Using Factor Score and Self Organizing Maps (인자점수와 자기조직화지도를 이용한 희소한 문서데이터의 군집화)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.2 / pp.205-211 / 2012
  • Retrieved documents have to be transformed into a data structure suitable for the clustering algorithms of statistics and machine learning. A popular data structure for document clustering is the document-term matrix, which holds the frequency of each term in each document. This matrix suffers from a sparsity problem because most of its entries are 0, and this sparseness degrades clustering performance. This research therefore uses factor scores obtained by factor analysis to address the sparsity problem in document clustering. The document-term matrix is transformed into a document-factor score matrix, which is then used as the input data for document clustering. To compare clustering performance between the document-term matrix and the document-factor score matrix, this research applies both matrices to self-organizing map (SOM) clustering.
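
To make the sparsity problem concrete, here is a minimal sketch that builds a document-term matrix from a few short documents and measures the fraction of zero entries. It covers only the matrix and its sparsity, not the factor analysis or SOM steps; the tokenization (lowercase, whitespace split) and the sample documents are assumptions.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[d][t] is a term frequency."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def sparsity(matrix):
    """Fraction of matrix entries that are zero."""
    cells = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / cells if cells else 0.0

# Hypothetical toy corpus.
docs = [
    "sparse data clustering",
    "document clustering map",
    "factor score analysis",
]
vocab, dtm = document_term_matrix(docs)
print(len(vocab), sparsity(dtm))
```

Even in this three-document corpus more than half of the entries are zero, and the share grows quickly as the vocabulary expands.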

WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society / v.19 no.1 / pp.51-58 / 2018
  • As the number of users and the volume of data on SNS have increased explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic model technique, is used to identify the similarity of texts in unclassified, large-volume SNS text data and to extract trends from them. However, LDA has difficulty deducing high-level topics from short sentences because infrequent word occurrences make the data semantically sparse. The BTM study addressed this limitation of LDA by using combinations of two words. However, BTM also has a limitation: because it is influenced more by the high-frequency word in each combined pair, it cannot calculate weights that consider the relation of the words to each topic. In this paper, we propose a technique that improves the accuracy of existing BTM by reflecting the semantic relation between words.
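
The "combination of two words" that BTM relies on can be sketched as follows: every unordered pair of words within one short text is extracted and counted across the corpus. This shows only that extraction step, not topic inference or the word-vector weighting proposed in the paper; the whitespace tokenization and sample posts are assumptions.

```python
from collections import Counter
from itertools import combinations

def biterms(tokens):
    """All unordered word pairs that co-occur in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

def biterm_counts(texts):
    """Corpus-level counts of word pairs, pooled over all short texts."""
    counts = Counter()
    for text in texts:
        counts.update(biterms(text.split()))
    return counts

# Hypothetical short posts.
posts = ["new phone battery", "phone battery life"]
counts = biterm_counts(posts)
print(counts.most_common(1))
```

Pooling pairs over the whole corpus gives a short-text model more co-occurrence evidence than any single post provides on its own.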

Adaptive lasso in sparse vector autoregressive models (Adaptive lasso를 이용한 희박벡터자기회귀모형에서의 변수 선택)

  • Lee, Sl Gi;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.29 no.1 / pp.27-39 / 2016
  • This paper considers variable selection in the sparse vector autoregressive (sVAR) model, where sparsity comes from setting small coefficients to exact zeros. From an estimation perspective, Davis et al. (2015) showed that lasso-type regularization is successful because it provides simultaneous variable selection and parameter estimation even for time series data. However, their simulation study reports that the regular lasso overestimates the number of non-zero coefficients, so its finite-sample performance needs improvement. In this article, we show that the adaptive lasso significantly improves performance, recovering the sparsity pattern better than the regular lasso. The selection of tuning parameters for the adaptive lasso is also discussed on the basis of the simulation study.
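
For reference, the two penalties contrasted in this abstract can be written in their standard regression form. The notation is an assumption rather than the paper's own: $y$ and $X$ stand for the stacked VAR response and lagged regressors, $\beta$ for the vectorized coefficients, $\tilde{\beta}$ for an initial estimate (for example, least squares), and $\lambda, \gamma > 0$ for tuning parameters.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j} \lvert \beta_j \rvert ,
\qquad
\hat{\beta}^{\text{adaptive}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j} w_j \lvert \beta_j \rvert ,
\quad
w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}} .
```

Coefficients whose initial estimates are small receive large weights $w_j$ and are penalized more heavily, which is why the adaptive version tends to retain fewer spurious non-zero coefficients than the regular lasso.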

Compressive Sensing-Based L1-SVD DOA Estimation (압축센싱기법 기반 L1-SVD 도래각 추정)

  • Cho, Yunseong;Paik, Ji-Woong;Lee, Joon-Ho;Ko, Yo Han;Cho, Sung-Woo
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.27 no.4 / pp.388-394 / 2016
  • There have been many studies on direction-of-arrival (DOA) estimation algorithms using antenna arrays. Beamforming, Capon's method, maximum likelihood, and MUSIC are the main algorithms for DOA estimation. Recently, compressive sensing-based DOA estimation, which exploits the sparsity of the incident signals, has attracted much attention in the signal processing community. In this paper, the performance of the L1-SVD algorithm, which is based on fitting the data matrix, is compared with that of the MUSIC algorithm.

Using Genre Rating Information for Similarity Estimation in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.24 no.12 / pp.93-100 / 2019
  • Similarity computation is crucial to the performance of memory-based collaborative filtering systems. These systems use user ratings to recommend products to customers on online commercial sites. For better recommendations, the users most similar to the active user need to be selected as references. Numerous similarity measures have been developed in the literature, most of which suffer from data sparsity or cold-start problems. This paper aims to extract as much preference information as possible from user ratings in order to compute more reliable similarity than previous measures, even under sparse data conditions. We propose a new similarity measure that relies not only on user ratings but also on the movie genre information provided by the dataset. Experiments compare the performance of the proposed measure with that of previous relevant measures. The results show that the proposed measure yields better or comparable performance in terms of major performance metrics.

Collaborative Filtering for Credit Card Recommendation based on Multiple User Profiles (신용카드 추천을 위한 다중 프로파일 기반 협업필터링)

  • Lee, Won Cheol;Yoon, Hyoup Sang;Jeong, Seok Bong
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.154-163 / 2017
  • Collaborative filtering, one of the most widely used techniques for building recommender systems, is based on the idea that users with similar preferences can help one another find useful items. Credit card user behavior analytics show that most customers hold three or fewer credit cards without duplicates. This behavior is one of the most influential contributors to data sparsity. The 'cold-start' problem caused by data sparsity prevents a recommender system from providing proper recommendations in the personalized credit card recommendation scenario. We propose a personalized credit card recommender system that addresses the cold-start problem by using multiple user profiles. The proposed system consists of a training process and an application process using five user profiles. In the training process, the five user profiles are transformed into five user networks based on cosine similarity, and an integrated user network is derived as the weighted sum of the user networks. The application process selects the k-nearest neighbors (users) from the integrated user network derived in the training process and recommends the three credit cards most frequently used by those neighbors. To demonstrate the performance of the proposed system, we conducted experiments with real credit card user data and calculated F1 values. The F1 value of the proposed system was compared with that of existing recommendation techniques. The results show that the proposed system provides better recommendations than the existing techniques. This paper not only contributes to solving the cold-start problem that may occur in the personalized credit card recommendation scenario, but is also expected to help financial companies improve customer satisfaction and increase corporate profits by providing proper recommendations.
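
The training and application processes described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it uses two profiles per user instead of five, computes similarities on demand rather than storing networks, and all names, weights, and data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length numeric profile vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def integrated_similarity(profiles_u, profiles_v, weights):
    """Weighted sum of the per-profile cosine similarities."""
    return sum(w * cosine(pu, pv)
               for w, pu, pv in zip(weights, profiles_u, profiles_v))

def recommend(target, users, cards, weights, k=2, top_n=3):
    """Recommend the cards held most often by the k most similar users."""
    others = [u for u in users if u != target]
    others.sort(
        key=lambda u: integrated_similarity(users[target], users[u], weights),
        reverse=True,
    )
    tally = Counter(card for u in others[:k] for card in cards[u])
    return [card for card, _ in tally.most_common(top_n)]

# Hypothetical data: each user has two profile vectors (the paper uses five).
users = {
    "u1": [[1, 0, 1], [3, 1]],
    "u2": [[1, 0, 1], [3, 1]],
    "u3": [[0, 1, 0], [1, 3]],
    "u4": [[1, 0, 0], [2, 1]],
}
cards = {"u1": ["A"], "u2": ["A", "B"], "u3": ["C"], "u4": ["B", "D"]}
result = recommend("u1", users, cards, weights=[0.5, 0.5])
print(result)
```

Because the integrated similarity draws on several profiles, a user with very little card history can still be matched to neighbors through the other profiles, which is how the approach targets the cold-start case.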

Regularized Optimization of Collaborative Filtering for Recommander System based on Big Data (빅데이터 기반 추천시스템을 위한 협업필터링의 최적화 규제)

  • Park, In-Kyu;Choi, Gyoo-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.1 / pp.87-92 / 2021
  • Bias, variance, error, and learning are important performance factors in modeling a big data-based recommendation system. The recommendation model in such a system must reduce complexity while maintaining explanatory power. In addition, the sparsity of the dataset and the prediction accuracy of the system tend to be inversely related. A product recommendation model has therefore been proposed that learns the similarity between products by applying a factorization method to the sparse dataset. In this paper, the generalization ability of the model is improved by applying max-norm regularization when optimizing the loss function of this model. The solution applies a stochastic projected gradient descent method, which projects after each gradient step. Extensive experiments confirmed that the sparser the data became, the more effective the proposed regularization method was relative to the existing method.
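
One common way to enforce a max-norm constraint on a factorized model is to bound the L2 norm of every row of the two factor matrices and to rescale any row that exceeds the bound after each gradient step. The sketch below shows that projection for a single observed rating with a squared-error loss. Whether the paper uses exactly this formulation is not stated in the abstract, so the loss, the update, and all names are assumptions.

```python
from math import sqrt

def project_row(row, radius):
    """Project a factor row onto the L2 ball of the given radius."""
    norm = sqrt(sum(x * x for x in row))
    if norm <= radius:
        return list(row)
    scale = radius / norm
    return [x * scale for x in row]

def sgd_step(u, v, rating, lr, radius):
    """One projected SGD step for a single observed (user, item) rating."""
    err = sum(a * b for a, b in zip(u, v)) - rating
    # Gradient of 0.5 * err**2 with respect to u is err * v, and vice versa.
    new_u = [a - lr * err * b for a, b in zip(u, v)]
    new_v = [b - lr * err * a for a, b in zip(u, v)]
    return project_row(new_u, radius), project_row(new_v, radius)

# Hypothetical factor rows and rating.
u, v = sgd_step([1.0, 1.0], [1.0, 1.0], rating=5.0, lr=0.5, radius=1.0)
print(u, v)
```

The projection keeps every factor row inside a fixed ball regardless of how few ratings a user or item has, which limits overfitting on the sparsest rows.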

A comparative study between various LU update methods in the simplex method (단체법에서 여러가지 상하 분해요소 수정방법들의 비교)

  • 임성묵;김기태;박순달
    • Journal of the military operations research society of Korea / v.29 no.1 / pp.28-42 / 2003
  • The simplex method requires a basis update in each iteration, which is its most time-consuming step. Several methods have been developed for updating a basis represented in LU-factorized form, such as the Bartels-Golub, Forrest-Tomlin, Reid, and Saunders methods. In this research, we compare these update methods in terms of sparsity, data structure, and computing time. The analysis is based mainly on computational experience.