• Title/Summary/Keyword: Data Sparsity Problem

The Effect of Data Sparsity on Prediction Accuracy in Recommender System (추천시스템의 희소성이 예측 정확도에 미치는 영향에 관한 연구)

  • Kim, Sun-Ok;Lee, Seok-Jun
    • Journal of Internet Computing and Services / v.8 no.6 / pp.95-102 / 2007
  • Recommender systems based on collaborative filtering suffer from unreliable prediction accuracy because of data sparsity. When preference values are highly sparse, neighbor selection becomes difficult and prediction accuracy drops. This article studies how MAE changes with sparsity: groups are classified by sparsity, and the significance of the differences among the MAEs of the classified groups is analyzed. To improve prediction accuracy across groups affected by sparsity, we studied reducing sparsity by sorting items by sparsity and using the average preference of items that have many respondents in place of the preference evaluation value.
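
The relationship between sparsity and MAE described in this abstract can be illustrated with a minimal sketch. The toy rating matrix, the item-mean baseline predictor, and the function names below are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sparsity(ratings):
    """Fraction of missing (NaN) entries in a user-item rating matrix."""
    return float(np.isnan(ratings).mean())

def mae(actual, predicted):
    """Mean absolute error over the entries whose actual rating is known."""
    known = ~np.isnan(actual)
    return float(np.abs(actual[known] - predicted[known]).mean())

def item_mean_prediction(ratings):
    """Assumed baseline: predict every rating of an item by that item's mean."""
    item_means = np.nanmean(ratings, axis=0)
    return np.tile(item_means, (ratings.shape[0], 1))

# Toy data (hypothetical): rows are users, columns are items, NaN = not rated.
ratings = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [4.0, 2.0, np.nan, np.nan],
    [np.nan, np.nan, 4.0, 1.0],
])

overall_sparsity = sparsity(ratings)
# Per-item sparsity, which could be used to sort items into sparsity groups.
item_sparsity = np.isnan(ratings).mean(axis=0)
baseline_mae = mae(ratings, item_mean_prediction(ratings))
print(overall_sparsity, item_sparsity, baseline_mae)
```

In a real study the MAE would be computed on held-out ratings and compared across the sparsity groups; the sketch evaluates on the known ratings only to keep the example short.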

Method to Improve Data Sparsity Problem of Collaborative Filtering Using Latent Attribute Preference (잠재적 속성 선호도를 이용한 협업 필터링의 데이터 희소성 문제 개선 방법)

  • Kwon, Hyeong-Joon;Hong, Kwang-Seok
    • Journal of Internet Computing and Services / v.14 no.5 / pp.59-67 / 2013
  • In this paper, we propose LAR_CF, latent attribute rating-based collaborative filtering, which is robust to the data sparsity problem, one of the traditional causes of decreased rating prediction accuracy. Whereas existing collaborative filtering methods use the preference ratings given by users as the feature vector for calculating similarity between objects, the proposed method mitigates the data sparsity problem by using the unique attributes of the two target objects together with the existing explicit preferences. We use the MovieLens 100k dataset and its item attributes to evaluate LAR_CF. Through artificial data sparsity and full-rating experiments, we confirmed that LAR_CF can improve rating prediction accuracy under data sparsity conditions.

Development of a Personalized Similarity Measure using Genetic Algorithms for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.23 no.12 / pp.219-226 / 2018
  • Collaborative filtering has been the most popular approach to recommending items in online recommender systems. However, collaborative filtering is known to suffer from the data sparsity problem. As a simple way to overcome this problem, the literature has adopted the Jaccard index in combination with existing similarity measures. We analyze the performance of such combinations in various data environments. We also use a genetic algorithm to find optimal weights for the factors in the combination and thereby formulate a similarity measure. Furthermore, optimal weights are searched for each user independently, in order to reflect each user's different rating behavior. The performance of the resulting personalized similarity measure is examined using two datasets with different data characteristics. It shows overall superiority to previous measures in terms of recommendation and prediction quality, regardless of the characteristics of the data environment.
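
As a rough illustration of combining the Jaccard index with an existing similarity measure, the sketch below mixes Jaccard with cosine similarity over co-rated items. The linear form, the fixed weight `w`, the choice of cosine, and all names are assumptions for illustration; the paper searches for the weights per user with a genetic algorithm, which is omitted here.

```python
import math

def jaccard(items_u, items_v):
    """Jaccard index of two sets of rated item ids."""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def cosine_on_corated(ratings_u, ratings_v):
    """Cosine similarity computed only over items both users rated."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def combined_similarity(ratings_u, ratings_v, w=0.5):
    """Assumed linear mix: w * Jaccard + (1 - w) * cosine.

    In a personalized variant, w would be tuned separately for each user.
    """
    jac = jaccard(set(ratings_u), set(ratings_v))
    return w * jac + (1.0 - w) * cosine_on_corated(ratings_u, ratings_v)

# Hypothetical users: item id -> rating.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"b": 3, "c": 4, "d": 1}
sim = combined_similarity(alice, bob, w=0.5)
print(sim)
```

The Jaccard term penalizes pairs of users who share few rated items, which is why it is often paired with rating-based measures when data are sparse.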

Power Failure Sensitivity Analysis via Grouped L1/2 Sparsity Constrained Logistic Regression

  • Li, Baoshu;Zhou, Xin;Dong, Ping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.8 / pp.3086-3101 / 2021
  • To provide precise marketing and differentiated service for electric power service departments, it is important to identify customers who are highly sensitive to power failures. To this end, we propose a novel grouped 𝑙1/2 sparsity constrained logistic regression method for assessing sensitivity to electric power failure. Unlike the 𝑙1 norm and the k-support norm, the proposed method simultaneously imposes inter-class information and a tighter approximation to the nonconvex 𝑙0 sparsity in order to exploit multiple correlated attributes for prediction. First, the attributes or factors for predicting customer sensitivity to power failure are selected from customer sheets, such as customer information, electricity consumption information, electricity bills, 95598 work sheets, and power failure events. Second, all samples with these attributes are clustered into several categories, and samples in the same category are assumed to share similar properties. Then, an 𝑙1/2 norm constrained logistic regression model is built to predict each customer's sensitivity to power failure. Finally, the alternating direction method of multipliers (ADMM) is employed to solve the problem by splitting it into several sub-problems. Experimental results on an electric power dataset with about one million customer records from one province show that the proposed method achieves good prediction accuracy.

Multiview-based Spectral Weighted and Low-Rank for Row-sparsity Hyperspectral Unmixing

  • Zhang, Shuaiyang;Hua, Wenshen;Liu, Jie;Li, Gang;Wang, Qianghui
    • Current Optics and Photonics / v.5 no.4 / pp.431-443 / 2021
  • Sparse unmixing has been proven to be an effective method for hyperspectral unmixing. Hyperspectral images contain rich spectral and spatial information. The means to make full use of spectral information, spatial information, and enhanced sparsity constraints are the main research directions to improve the accuracy of sparse unmixing. However, many algorithms only focus on one or two of these factors, because it is difficult to construct an unmixing model that considers all three factors. To address this issue, a novel algorithm called multiview-based spectral weighted and low-rank row-sparsity unmixing is proposed. A multiview data set is generated through spectral partitioning, and then spectral weighting is imposed on it to exploit the abundant spectral information. The row-sparsity approach, which controls the sparsity by the l2,0 norm, outperforms the single-sparsity approach in many scenarios. Many algorithms use convex relaxation methods to solve the l2,0 norm to avoid the NP-hard problem, but this will reduce sparsity and unmixing accuracy. In this paper, a row-hard-threshold function is introduced to solve the l2,0 norm directly, which guarantees the sparsity of the results. The high spatial correlation of hyperspectral images is associated with low column rank; therefore, the low-rank constraint is adopted to utilize spatial information. Experiments with simulated and real data prove that the proposed algorithm can obtain better unmixing results.
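
To make the idea of row-level hard thresholding concrete, here is a minimal sketch of one common form: rows whose l2 norm does not exceed a threshold are set to zero, so the number of nonzero rows (the l2,0 norm) is reduced directly. The threshold parameter `tau` and the function names are hypothetical, and the exact row-hard-threshold function used in the paper may differ.

```python
import numpy as np

def row_hard_threshold(X, tau):
    """Zero every row of X whose l2 norm is at most tau (assumed form)."""
    row_norms = np.linalg.norm(X, axis=1)
    keep = row_norms > tau
    # Broadcasting the boolean mask keeps selected rows and zeroes the rest.
    return X * keep[:, None]

def l20_norm(X):
    """l2,0 'norm': the number of rows with nonzero l2 norm."""
    return int(np.count_nonzero(np.linalg.norm(X, axis=1)))

# Hypothetical abundance matrix: rows = library endmembers, columns = pixels.
X = np.array([
    [3.0, 4.0],
    [0.1, 0.2],
    [0.0, 1.0],
])
X_sparse = row_hard_threshold(X, tau=0.5)
print(X_sparse, l20_norm(X_sparse))
```

Unlike a convex relaxation such as row-wise soft thresholding, this operation leaves the surviving rows unshrunk, which is one reason hard thresholding is associated with sparser solutions.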

A Movie Recommendation System processing High-Dimensional Data with Fuzzy-AHP and Fuzzy Association Rules (퍼지 AHP와 퍼지 연관규칙을 이용하여 고차원 데이터를 처리하는 영화 추천 시스템)

  • Oh, Jae-Taek;Lee, Sang-Yong
    • Journal of Digital Convergence / v.17 no.2 / pp.347-353 / 2019
  • Recent recommendation systems are moving toward the use of high-dimensional data. However, high-dimensional data can increase algorithm complexity by expanding dimensions and can lower the accuracy of recommended items. In addition, it can cause data sparsity and make it difficult to provide users with suitable recommendations. This study proposes an algorithm that classifies users' subjective data by objective criteria using fuzzy-AHP and exploits rules with repetitive patterns through fuzzy association rules. To check how well the algorithm mitigates the problems of high-dimensional data, we performed 5-fold cross validation while varying the number of users. The results show that the system applying the algorithm achieved accuracy 12.5% higher than that of the fuzzy-AHP-applied system and mitigated the data sparsity problem.

Development of Web-based Intelligent Recommender Systems using Advanced Data Mining Techniques (개선된 데이터 마이닝 기술에 의한 웹 기반 지능형 추천시스템 구축)

  • Kim Kyoung-Jae;Ahn Hyunchul
    • Journal of Information Technology Applications and Management / v.12 no.3 / pp.41-56 / 2005
  • The product recommender system is one of the most popular techniques for customer relationship management. In addition, collaborative filtering (CF) is known to be one of the most successful recommendation techniques in product recommender systems. However, CF has limitations such as the sparsity and scalability problems. This study proposes a hybrid of cluster analysis and case-based reasoning (CBR) to address these problems. CBR may relieve the sparsity problem because it recommends products using customer profile and transaction data, but it may still give rise to the scalability problem. Thus, this study uses cluster analysis to reduce the search space prior to CBR, addressing the scalability problem. For cluster analysis, this study employs a hybrid of genetic and K-Means algorithms to avoid the convergence to local minima that can occur in typical cluster analyses. This study also develops a Web-based prototype system to test the superiority of the proposed model.

An Agent-based Approach for Distributed Collaborative Filtering (분산 협력 필터링에 대한 에이전트 기반 접근 방법)

  • Kim, Byeong-Man;Li, Qing;Howe Adele E.;Yeo, Dong-Gyu
    • Journal of KIISE:Software and Applications / v.33 no.11 / pp.953-964 / 2006
  • Because of its usefulness, collaborative filtering has been widely used in both research and commercial fields. However, some challenges remain before it can be more efficient, especially the scalability problem, the sparsity problem, and the cold start problem. In this paper, we address these problems and provide a novel distributed approach based on agent collaboration. We try to solve the scalability problem by making each agent save its user's ratings and broadcast them to the user's friends, so that only the friends' ratings and the user's own ratings are kept in an agent's local database. To reduce the degradation of recommendation quality caused by the lack of rating data, we introduce a method that uses friends' opinions in place of real rating data when the latter are not available. We also suggest a collaborative filtering algorithm based on user profiles to provide new users with a recommendation service. Experiments show that the suggested approach helps with the new user problem, is more scalable than traditional centralized CF systems, and alleviates the sparsity problem.

A Movie Recommendation Method Using Rating Difference Between Items (항목 간 선호도 차이를 이용한 영화 추천 방법)

  • Oh, Se-Chang;Choi, Min
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.11 / pp.2602-2608 / 2013
  • User-based and item-based methods have been developed as solutions to the movie recommendation problem. However, these methods face the sparsity problem and the problem of not reflecting the user's ratings, respectively. To solve these problems, there has been research on combining the two methods using the concept of similarity. In practice, this combined approach is not free from the sparsity problem, because it has many parameters to calculate. In this study, we propose a recommendation method that uses the rating difference between items to compensate for this problem. This method is relatively free from the sparsity problem, because it has fewer parameters to calculate. It can also obtain more accurate results by reflecting the user's ratings when calculating the parameters. In experiments with the proposed method, the initial error was large, but performance stabilized quickly afterward. In addition, it showed an average error 0.0538 lower than that of the existing similarity-based method.
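
Prediction from rating differences between items can be sketched with the well-known weighted Slope One scheme, used here only as a stand-in: the abstract does not give the paper's formulas, so its method may differ. The toy ratings and function names are hypothetical.

```python
from collections import defaultdict

def item_deviations(ratings):
    """Return (dev, counts), where dev[j][i] is the mean of (r_uj - r_ui)
    over the users who rated both items j and i, and counts[j][i] is the
    number of such users."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    sums[j][i] += r_j - r_i
                    counts[j][i] += 1
    dev = {j: {i: sums[j][i] / counts[j][i] for i in sums[j]} for j in sums}
    return dev, counts

def predict(user_ratings, target, dev, counts):
    """Weighted Slope One prediction of the user's rating for `target`."""
    numerator = denominator = 0.0
    for i, r_i in user_ratings.items():
        c = counts.get(target, {}).get(i, 0)
        if c:
            numerator += (dev[target][i] + r_i) * c
            denominator += c
    return numerator / denominator if denominator else None

# Hypothetical ratings: user -> {movie: rating}.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, counts = item_deviations(ratings)
lucy_on_a = predict(ratings["lucy"], "A", dev, counts)
print(lucy_on_a)
```

Only one average difference per item pair has to be estimated, which matches the abstract's point that a difference-based method needs fewer parameters than similarity-based combinations.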

Recommendations Based on Listwise Learning-to-Rank by Incorporating Social Information

  • Fang, Chen;Zhang, Hengwei;Zhang, Ming;Wang, Jindong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.109-134 / 2018
  • Collaborative filtering (CF) is widely used in the recommendation field and can be divided into rating-based CF and learning-to-rank based CF. Although many methods have been proposed based on these two kinds of CF, there is still room for improvement. First, the data sparsity problem remains a big challenge for CF algorithms. Second, malicious ratings given by some illegitimate users may affect recommendation accuracy. Existing CF algorithms seldom take both of these observations into consideration. In this paper, we propose a recommendation method based on listwise learning-to-rank that incorporates users' social information. By taking both the ratings and the order of items into consideration, the Plackett-Luce model is used to find similar users more accurately. To alleviate the data sparsity problem, an improved matrix factorization model that integrates the influence of similar users is proposed to predict ratings. On the basis of the trust relationships between users, explored through their social information, a listwise learning-to-rank algorithm is proposed to learn an optimal ranking model that outputs a recommendation list more consistent with user preferences. Comprehensive experiments conducted on two public real-world datasets show that our approach not only achieves high recommendation accuracy in a relatively short runtime, but also reduces the impact of malicious ratings.
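
For readers unfamiliar with the Plackett-Luce model mentioned in this abstract, the sketch below computes the log-probability of a ranking under its standard definition: the item in each position is chosen with probability proportional to the exponential of its score among the items not yet placed. The item scores and names are hypothetical, and how the paper derives scores from ratings or uses the model to compare users is not shown.

```python
import math

def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (item ids, best first) under Plackett-Luce.

    `scores` maps item id -> real-valued score (assumed given).
    """
    ordered = [scores[item] for item in ranking]
    log_p = 0.0
    for pos in range(len(ordered)):
        tail = ordered[pos:]
        # Log-sum-exp over the items still unplaced, for numerical stability.
        m = max(tail)
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        log_p += ordered[pos] - log_z
    return log_p

# Hypothetical scores for three items.
scores = {"x": 2.0, "y": 1.0, "z": 0.0}
log_p_best = plackett_luce_log_prob(scores, ["x", "y", "z"])
log_p_worst = plackett_luce_log_prob(scores, ["z", "y", "x"])
print(math.exp(log_p_best), math.exp(log_p_worst))
```

Rankings that place higher-scored items first receive higher probability, and the probabilities of all permutations of the same item set sum to one.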