• Title/Summary/Keyword: Data sparsity

Data Sparsity and Performance in Collaborative Filtering-based Recommendation

  • Kim Jong-Woo;Lee Hong-Joo
    • Management Science and Financial Engineering / v.11 no.3 / pp.19-45 / 2005
  • Collaborative filtering is one of the most common methods that e-commerce sites and Internet information services use to personalize recommendations. Collaborative filtering has the advantage of being able to use even sparse evaluation data to predict preference scores for new products. To date, however, no in-depth investigation has been conducted on how sparsity in customers' evaluation data affects collaborative filtering-based recommendation performance. In this study, we analyzed the sparsity effect and used a hybrid method based on customers' evaluations and purchases collected from an online bookstore. Results indicated that recommendation performance decreased monotonically as sparsity increased, and that performance was more sensitive to sparsity in evaluation data than in purchase data. Results also indicated that the hybrid use of two different types of data (customers' evaluations and purchases) helped to improve recommendation performance when evaluation data were highly sparse.
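
The sparsity level discussed in this abstract can be made concrete with a small sketch. It assumes sparsity is measured as the share of empty cells in the user-item rating matrix, which is a common convention and not necessarily the paper's exact definition. The names `sparsity` and `sparsify` and the toy ratings are hypothetical.

```python
import random

def sparsity(ratings, n_users, n_items):
    """Return the fraction of user-item cells that hold no rating."""
    return 1.0 - len(ratings) / (n_users * n_items)

def sparsify(ratings, keep_fraction, seed=0):
    """Randomly keep only a fraction of the ratings to simulate sparser data."""
    rng = random.Random(seed)
    keys = sorted(ratings)
    kept = rng.sample(keys, round(len(keys) * keep_fraction))
    return {k: ratings[k] for k in kept}

# Toy data (hypothetical): 3 users, 4 books, 4 observed ratings.
ratings = {("u1", "b1"): 5, ("u1", "b3"): 3, ("u2", "b2"): 4, ("u3", "b4"): 2}
base = sparsity(ratings, n_users=3, n_items=4)
sparser = sparsity(sparsify(ratings, keep_fraction=0.5), n_users=3, n_items=4)
```

Dropping ratings at several `keep_fraction` values and re-evaluating a recommender at each level is one simple way to trace how performance changes with sparsity.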

Sparsity Effect on Collaborative Filtering-based Personalized Recommendation (협업 필터링 기반 개인화 추천에서의 평가자료의 희소 정도의 영향)

  • Kim, Jong-Woo;Bae, Se-Jin;Lee, Hong-Joo
    • Asia Pacific Journal of Information Systems / v.14 no.2 / pp.131-149 / 2004
  • Collaborative filtering is one of the popular techniques for personalized recommendation on e-commerce sites. An advantage of collaborative filtering is that it can work with sparse evaluation data to predict preference scores for new content or advertisements. There is, however, no in-depth study of how the sparsity of customers' evaluation data affects recommendation performance. In this study, we investigate the sparsity effect and the hybrid use of customers' evaluation data and purchase data through an experiment. The analysis shows that recommendation performance decreases monotonically as sparsity increases, and that the hybrid use of two different types of data (customers' evaluation data and purchase data) helps to improve recommendation performance when data are sparse.

The Effect of Data Sparsity on Prediction Accuracy in Recommender System (추천시스템의 희소성이 예측 정확도에 미치는 영향에 관한 연구)

  • Kim, Sun-Ok;Lee, Seok-Jun
    • Journal of Internet Computing and Services / v.8 no.6 / pp.95-102 / 2007
  • Recommender systems based on collaborative filtering suffer from unreliable prediction accuracy because of the sparsity problem. When the preference data are highly sparse, neighbor selection becomes difficult and prediction accuracy drops. In this article, we study how MAE (mean absolute error) changes with sparsity: groups are classified by sparsity, and the differences among the MAEs of the classified groups are tested for significance. To improve prediction accuracy across groups, we reduce sparsity by sorting items by sparsity and using the average preference of items that have many respondents in place of the preference value.
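
A minimal sketch of the two ingredients named in this abstract, MAE and sparsity reduction by averaging, might look as follows. It assumes that missing ratings of an item are filled with that item's mean only when the item has at least `min_raters` ratings. This reading of the abstract, the function names, and the toy matrix are all assumptions.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def fill_with_item_mean(matrix, min_raters):
    """Fill missing cells (None) of well-rated items with the item's mean rating.

    matrix is a list of user rows; an item counts as well rated when at
    least min_raters users have rated it (an assumed cut-off).
    """
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        if len(observed) >= min_raters:
            mean = sum(observed) / len(observed)
            for row in filled:
                if row[j] is None:
                    row[j] = mean
    return filled

# Toy matrix (hypothetical): 3 users x 2 items, None marks a missing rating.
matrix = [[5, None], [3, 4], [None, None]]
filled = fill_with_item_mean(matrix, min_raters=2)
error = mae([4.0, 3.0], [5, 3])
```

Only the first item is filled here, because the second has a single rater and its mean would be unreliable.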

Multiview-based Spectral Weighted and Low-Rank for Row-sparsity Hyperspectral Unmixing

  • Zhang, Shuaiyang;Hua, Wenshen;Liu, Jie;Li, Gang;Wang, Qianghui
    • Current Optics and Photonics / v.5 no.4 / pp.431-443 / 2021
  • Sparse unmixing has been proven to be an effective method for hyperspectral unmixing. Hyperspectral images contain rich spectral and spatial information. Making full use of spectral information, spatial information, and enhanced sparsity constraints is the main research direction for improving the accuracy of sparse unmixing. However, many algorithms focus on only one or two of these factors, because it is difficult to construct an unmixing model that considers all three. To address this issue, a novel algorithm called multiview-based spectral weighted and low-rank row-sparsity unmixing is proposed. A multiview data set is generated through spectral partitioning, and then spectral weighting is imposed on it to exploit the abundant spectral information. The row-sparsity approach, which controls the sparsity by the l2,0 norm, outperforms the single-sparsity approach in many scenarios. Many algorithms use convex relaxation to handle the l2,0 norm and avoid the NP-hard problem, but this reduces sparsity and unmixing accuracy. In this paper, a row-hard-threshold function is introduced to solve the l2,0 norm directly, which guarantees the sparsity of the results. The high spatial correlation of hyperspectral images is associated with low column rank; therefore, the low-rank constraint is adopted to utilize spatial information. Experiments with simulated and real data show that the proposed algorithm obtains better unmixing results.
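
To illustrate the idea of row sparsity, the sketch below counts non-zero rows (the l2,0 norm) and applies a generic row-wise hard threshold: a row is kept unchanged when its Euclidean norm exceeds a threshold and zeroed otherwise. The paper's exact row-hard-threshold function is not given in the abstract, so the names, the threshold `tau`, and the toy matrix are assumptions.

```python
import math

def row_l20_norm(X):
    """Count the rows of X that contain at least one non-zero entry."""
    return sum(1 for row in X if any(v != 0 for v in row))

def row_hard_threshold(X, tau):
    """Keep rows whose l2 norm exceeds tau; set every other row to zero."""
    result = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        result.append(list(row) if norm > tau else [0.0] * len(row))
    return result

# Toy abundance matrix (hypothetical): rows are endmembers, columns are pixels.
X = [[3.0, 4.0], [0.1, 0.2], [0.0, 1.0]]
X_sparse = row_hard_threshold(X, tau=1.5)
```

Unlike a convex relaxation, which shrinks every row, this operator leaves surviving rows untouched and sets the rest exactly to zero.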

Adaptive Adjustment of Compressed Measurements for Wideband Spectrum Sensing

  • Gao, Yulong;Zhang, Wei;Ma, Yongkui
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.1 / pp.58-78 / 2016
  • Compressed sensing (CS) offers potential benefits for wideband spectrum sensing in cognitive radio. The sparsity of the signal in the frequency domain corresponds to the number of occupied channels. This paper presents a scheme that adaptively adjusts the number of compressed measurements to reduce unnecessary computational complexity when prior information about the signal sparsity cannot be acquired. First, a sparsity estimation method is introduced, because the signal sparsity is not available in some cognitive radio environments, and the relationship between the amount of data used and the estimation accuracy is discussed. Then the SNR of the compressed signal is derived in closed form. Based on the SNR of the compressed signal and the estimated sparsity, an adaptive algorithm for adjusting the number of compressed measurements is proposed. Finally, simulation results agree with the theoretical analysis, which demonstrates the effectiveness of the proposed adaptive adjustment of compressed measurements.
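
The link between estimated sparsity and measurement count can be sketched with the familiar compressed-sensing rule of thumb M ≈ c·K·ln(N/K), where N is the number of channels and K the number of occupied ones. This is not the paper's adaptive algorithm, which also uses the SNR of the compressed signal. The function name and the constant `c` are assumptions.

```python
import math

def measurements_needed(n_channels, sparsity, c=2.0):
    """Rule-of-thumb measurement count M = ceil(c * K * ln(N / K)), capped at N.

    c is an assumed tuning constant; practical values depend on the
    sensing matrix, the recovery algorithm, and the noise level.
    """
    if sparsity <= 0:
        return 0
    if sparsity >= n_channels:
        return n_channels
    m = math.ceil(c * sparsity * math.log(n_channels / sparsity))
    return min(n_channels, m)

# Fewer occupied channels means fewer compressed measurements are required.
m_sparse = measurements_needed(1000, 10)
m_dense = measurements_needed(1000, 20)
```

Re-running the rule whenever the sparsity estimate changes gives a simple form of adaptive adjustment.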

Method to Improve Data Sparsity Problem of Collaborative Filtering Using Latent Attribute Preference (잠재적 속성 선호도를 이용한 협업 필터링의 데이터 희소성 문제 개선 방법)

  • Kwon, Hyeong-Joon;Hong, Kwang-Seok
    • Journal of Internet Computing and Services / v.14 no.5 / pp.59-67 / 2013
  • In this paper, we propose LAR_CF, latent attribute rating-based collaborative filtering, which is robust to the data sparsity problem, one of the traditional causes of decreased rating prediction accuracy. Whereas existing collaborative filtering methods use the preference ratings given by users as the feature vector for calculating similarity between objects, the proposed method mitigates the data sparsity problem by using the unique attributes of the two target objects together with the existing explicit preferences. We use the MovieLens 100k dataset and its item attributes to evaluate LAR_CF. Through artificial data sparsity and full-rating experiments, we confirmed that LAR_CF improves rating prediction accuracy under data sparsity conditions.

A Movie Recommendation System processing High-Dimensional Data with Fuzzy-AHP and Fuzzy Association Rules (퍼지 AHP와 퍼지 연관규칙을 이용하여 고차원 데이터를 처리하는 영화 추천 시스템)

  • Oh, Jae-Taek;Lee, Sang-Yong
    • Journal of Digital Convergence / v.17 no.2 / pp.347-353 / 2019
  • Recent recommendation systems are moving toward the use of high-dimensional data. However, high-dimensional data can increase algorithm complexity by expanding dimensions and lower the accuracy of recommended items. In addition, it can cause the data sparsity problem and make it difficult to provide users with suitable recommendations. This study proposes an algorithm that classifies users' subjective data against objective criteria with fuzzy-AHP and makes use of rules with repetitive patterns through fuzzy association rules. To check how far the algorithm mitigates the problems of high-dimensional data, we performed 5-fold cross validation while varying the number of users. The results show that the system applying the algorithm recorded accuracy 12.5% higher than that of the fuzzy-AHP-applied system and mitigated the data sparsity problem.

EMPIRICAL BAYES THRESHOLDING: ADAPTING TO SPARSITY WHEN IT ADVANTAGEOUS TO DO SO

  • Silverman Bernard W.
    • Journal of the Korean Statistical Society / v.36 no.1 / pp.1-29 / 2007
  • Suppose one is trying to estimate a high-dimensional vector of parameters from one observation per parameter. Often, it is possible to take advantage of sparsity in the parameters by thresholding the data in an appropriate way. A marginal maximum likelihood approach, within a suitable Bayesian structure, has excellent properties. For very sparse signals, the procedure chooses a large threshold and takes advantage of the sparsity, while for signals where there are many non-zero values, the method does not perform excessive smoothing. The scope of the method is reviewed and demonstrated, and various theoretical, practical and computational issues are discussed, in particular exploring the wide potential and applicability of the general approach, and the way it can be used within more complex thresholding problems such as curve estimation using wavelets.
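
For readers unfamiliar with thresholding, the sketch below shows plain hard and soft thresholding with the fixed universal threshold sigma·sqrt(2·ln n). This is not the empirical Bayes procedure of the abstract, which selects the threshold from the data by marginal maximum likelihood. The sketch only shows the operation that such a procedure tunes. The function names and the toy observations are assumptions.

```python
import math

def hard_threshold(x, t):
    """Zero every value whose magnitude is at most t; keep the rest unchanged."""
    return [v if abs(v) > t else 0.0 for v in x]

def soft_threshold(x, t):
    """Shrink every value toward zero by t, zeroing those within t of zero."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in x]

def universal_threshold(n, sigma=1.0):
    """Fixed threshold sigma * sqrt(2 ln n) for n observations with noise level sigma."""
    return sigma * math.sqrt(2.0 * math.log(n))

# Toy observations (hypothetical): two large signals among small noise values.
obs = [0.3, -0.5, 4.0, 0.1, -3.5, 0.2]
t = universal_threshold(len(obs))
kept = hard_threshold(obs, t)
shrunk = soft_threshold(obs, t)
```

A fixed threshold of this kind cannot adapt to the signal: it is too aggressive when many parameters are non-zero, which is the situation the data-driven choice in the abstract is designed to handle.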

Power Failure Sensitivity Analysis via Grouped L1/2 Sparsity Constrained Logistic Regression

  • Li, Baoshu;Zhou, Xin;Dong, Ping
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.8 / pp.3086-3101 / 2021
  • To provide precise marketing and differentiated service for the electric power service department, it is very important to identify customers who are highly sensitive to power failures. To solve this problem, we propose a novel grouped 𝑙1/2 sparsity constrained logistic regression method for assessing sensitivity to electric power failure. Unlike the 𝑙1 norm and the k-support norm, the proposed method simultaneously imposes inter-class information and a tighter approximation to the nonconvex 𝑙0 sparsity, exploiting multiple correlated attributes for prediction. First, the attributes or factors for predicting customer sensitivity to power failure are selected from customer sheets, such as customer information, electricity consumption information, electricity bills, 95598 work sheets, power failure events, etc. Second, all the samples with these attributes are clustered into several categories, and samples in the same category are assumed to share similar properties. Then, an 𝑙1/2 norm constrained logistic regression model is built to predict customers' sensitivity to power failure. Finally, the alternating direction method of multipliers (ADMM) algorithm is employed to solve the problem by splitting it into several sub-problems. Experimental results on an electric power dataset with about one million customer records from a province validate that the proposed method achieves good prediction accuracy.

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.20 no.1 / pp.1-10 / 2019
  • Because big-data text mining extracts many features and much data, clustering and classification can suffer from high computational complexity and low reliability of the analysis results. In particular, a term-document matrix obtained through text mining represents term-document features but is sparse. We designed an advanced genetic algorithm (GA) to extract features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect the document-term relationships in feature extraction. Through a repetitive process, a predetermined number of features are selected. We also used a sparsity score to improve the performance of the detection model. If a spam mail data set has high sparsity, the detection model performs poorly and the optimal detection model is difficult to find. In addition, by using s(F) as the numerator of the fitness function, we find a low-sparsity model that also has a high TF-IDF score. We also verified its performance by applying the proposed algorithm to text classification. As a result, we found that our algorithm shows higher performance (speed and accuracy) in attack mail classification.
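
The two quantities this abstract combines, TF-IDF weights and the sparsity of the term-document matrix, can be sketched as below. The sketch assumes the plain weighting tf × ln(N/df) and measures sparsity as the share of zero cells. The paper's sparsity score s(F) and its fitness function are not defined in the abstract, so the names and toy documents are assumptions.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Build a document-by-term TF-IDF matrix from non-empty tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # Relative term frequency times ln(N / df); terms found in every document get 0.
        rows.append([counts[t] / len(doc) * math.log(n_docs / doc_freq[t]) for t in vocab])
    return vocab, rows

def matrix_sparsity(rows):
    """Fraction of cells in the matrix that are exactly zero."""
    cells = sum(len(row) for row in rows)
    zeros = sum(1 for row in rows for value in row if value == 0)
    return zeros / cells

# Toy mail corpus (hypothetical), already tokenized.
docs = [["free", "prize", "click"], ["meeting", "agenda"], ["free", "meeting"]]
vocab, rows = tfidf_matrix(docs)
score = matrix_sparsity(rows)
```

A feature-selection search could evaluate `matrix_sparsity` on the columns of each candidate feature subset and prefer subsets that are dense while still carrying high TF-IDF weight.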