A Robust Bayesian Probabilistic Matrix Factorization Model for Collaborative Filtering Recommender Systems Based on User Anomaly Rating Behavior Detection

  • Yu, Hongtao (School of Information Science and Engineering, Yanshan University) ;
  • Sun, Lijun (School of Information Science and Engineering, Yanshan University) ;
  • Zhang, Fuzhi (School of Information Science and Engineering, Yanshan University)
  • Received : 2018.09.17
  • Accepted : 2019.03.06
  • Published : 2019.09.30

Abstract

Collaborative filtering recommender systems are vulnerable to shilling attacks in which malicious users may inject biased profiles to promote or demote a particular item. To tackle this problem, many robust collaborative recommendation methods have been presented. Unfortunately, most of them improve robustness at the expense of prediction accuracy. In this paper, we construct a robust Bayesian probabilistic matrix factorization model for collaborative filtering recommender systems by incorporating the detection of user anomaly rating behaviors. We first detect the anomaly rating behaviors of users with a modified K-means algorithm and a target item identification method to generate an indicator matrix of attack users. We then incorporate the indicator matrix of attack users to construct a robust Bayesian probabilistic matrix factorization model, based on which a robust collaborative recommendation algorithm is devised. The experimental results on the MovieLens and Netflix datasets show that our model can significantly improve the robustness and recommendation accuracy compared with three baseline methods.

Keywords

1. Introduction

Nowadays, recommender systems have been applied to solve the information overload problem in many areas such as online product recommendations in e-commerce websites [1], Web-page recommendations in intelligent Web systems [2], POI (Point of Interest) recommendations in location-based social networks [3], and cloud service recommendations in the cloud computing market [4]. Collaborative filtering (CF) [5] is a commonly used technique in recommender systems and has been widely deployed in e-commerce sites such as Amazon and eBay. CF methods are categorized as memory- and model-based methods [6]. Memory-based methods include user- and item-based approaches, which make recommendations based on the similarity between users or items. Model-based methods first train a model using the known ratings of users and then exploit the model to predict ratings for unrated items.

Due to the open nature of CF-based systems, however, malicious users may bias the output of such systems by injecting fake profiles. This behavior is known as shilling attacks or profile injection attacks [7], [8]. The fake profiles are called attack profiles or shilling profiles. Depending on the purpose of the attack, shilling attacks can be categorized into either push attacks (i.e., attacks that are designed to increase the probability of an item being recommended) or nuke attacks (i.e., attacks that are designed to decrease the probability of an item being recommended) [9]. Well-studied shilling attacks include the random attack, the average attack, and the AoP (Average over Popular items) attack [7], [8], [10]. These attacks present a challenge to the credibility of CF-based systems. Therefore, how to guarantee the credibility of CF-based systems has become a problem that cannot be ignored.

To reduce the impact of shilling attacks, many methods for detecting such attacks have been presented [11-16]. These detection methods mainly adopt binary classification to spot and filter shilling profiles, which makes it easy to filter out genuine profiles as well. An alternative way is to improve the robustness of recommendation algorithms. Robustness is the ability of a recommender system to make stable recommendations when its rating database is contaminated with noisy data or malicious ratings [17], and it has been investigated in the context of shilling attacks. Although a variety of robust CF algorithms have been presented, most of them improve robustness at the cost of accuracy. This is because the existing matrix factorization based robust recommendation methods need to discard outliers in parameter estimation, which may lead to the rejection of some genuine users' ratings and thus a loss of accuracy.

To address these problems, we present a robust Bayesian probabilistic matrix factorization (BPMF) model for CF systems based on user anomaly rating behavior detection. Specifically, we first use a clustering technique and a target item identification method to detect the anomaly rating users, and then we combine the detection results with the BPMF model to construct a robust CF model and devise a robust CF algorithm to make recommendations.

The contributions of this paper are summarized below:

1) We use a clustering algorithm and a target item identification method to detect user anomaly rating behaviors, based on which an indicator matrix of attack users is generated.

2) We present a robust BPMF model by incorporating the indicator matrix of attack users, based on which a robust CF algorithm is devised.

3) We carry out experiments on different datasets and compare our model with other approaches.

The rest of this paper is organized as follows. Section 2 briefly reviews research on robust recommendation algorithms. Section 3 describes the proposed model in detail. Experimental results are presented in Section 4. The conclusion and future work are given in Section 5.

 

2. Related Work

Research on robust recommendation methods has been conducted over the past decade and has achieved considerable results. Mehta et al. [18] proposed an M-estimator-based matrix factorization algorithm (MMF). MMF can find outliers by monitoring whether the residual lies within a certain range. Nevertheless, the reported results show that MMF is only effective against small-scale attacks. Cheng and Hurley [19] proposed a least trimmed squares estimator based matrix factorization algorithm (LTSMF), which performs better than MMF. However, the accuracy of LTSMF is limited because some genuine users' ratings with the largest residuals are also discarded. Mehta and Nejdl [20] proposed a robust recommendation algorithm, VarSelect SVD. VarSelect SVD has been proved to be robust against shilling attacks, but it requires prior knowledge of the attack size. In [21], a robust recommendation method was proposed. This method first uses a relevance vector machine classifier to measure suspicious users, then mines the implicit trust between users according to their ratings, and incorporates the measurement results to build a multidimensional trust model. By combining the trust model, a neighbor model, and matrix factorization, a robust recommendation algorithm is finally developed. In [22], a robust CF method was proposed. It uses the R1-norm to build a robust non-negative MF model, based on which a robust CF algorithm is developed to make recommendations.

Probabilistic matrix factorization (PMF) [23] is a special form of MF that is applicable to large and sparse datasets. Liu et al. [24] presented a new PMF model, which improves prediction accuracy by combining user relations with the rating matrix. Nevertheless, the prediction of this model is easily affected by malicious ratings. In [25], a BPMF-based recommendation model was proposed. This model introduces prior distributions over the hyper-parameters on the basis of PMF to avoid over-fitting, and it shows that BPMF outperforms PMF in accuracy. In [26], a robust recommendation model was proposed, which improves prediction accuracy and robustness by using long-tailed distributions or excluding attack users. Li et al. [27] presented a metadata-enhanced variational Bayesian MF model for robust recommendation. It fuses the BPMF model with metadata, which can weaken the effect of malicious users on the model's posterior and thus guarantees the robustness of the algorithm.

In this work, we aim to build a robust CF model with strong attack-resistant capability and high recommendation accuracy. Unlike the methods in [18], [19], [22] that use robust estimators or the R1-norm to limit the impact of the ratings with the largest residuals on the recommendation models, which tend to discard genuine users' ratings with the largest residuals, our model only filters ratings on the target item based on the detection results of anomaly rating users. Different from the approach in [20], our model does not require prior knowledge of the attack size. Unlike the approach in [21], our model uses an unsupervised clustering algorithm and a target item identification method to detect anomaly rating users, which does not require training a classification model.

 

3. The Proposed Model

In this section, we first propose an approach for detecting anomaly rating users based on a modified K-means algorithm and a target item identification method. Then we combine the detection results with the Bayesian probabilistic matrix factorization model to build a robust CF model and devise the corresponding robust CF algorithm, which is called RBPMF-CF.

 

3.1 Detecting Anomaly Rating Behavior of Users

In the context of shilling attacks, the attack users usually give the highest rating to the item that they want to promote. This means the attack users generally have greater residuals (i.e., the difference between a user's real and predicted ratings) than genuine users. To illustrate the rating characteristics of attack users, we randomly choose 144 genuine users from the MovieLens 100K dataset and inject attack profiles generated by average attack, random attack, and AoP attack, respectively. These attacks are all push attacks, and both the filler size and the attack size are set to 3%. Based on these profiles, the mean residuals of genuine and attack users are calculated, respectively. Fig. 1 depicts the mean residuals of 228 users, which include 144 genuine users, 28 AoP attack users, 28 average attack users, and 28 random attack users.
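For concreteness, the per-user mean residual shown in Fig. 1 can be computed directly from the observed and predicted ratings. Below is a minimal sketch (not the authors' code), assuming ratings are stored in a dense matrix with 0 denoting a missing rating and that the predicted matrix comes from any rating predictor; the names R and R_hat are ours.

```python
import numpy as np

def mean_residuals(R, R_hat):
    """Per-user mean absolute residual |r_ui - r_hat_ui| over rated items.

    R     : (m, n) observed rating matrix, 0 denotes a missing rating.
    R_hat : (m, n) predicted rating matrix from any rating predictor.
    """
    rated = R > 0                              # indicator of observed ratings
    resid = np.abs(R - R_hat) * rated          # residuals on rated entries only
    counts = np.maximum(rated.sum(axis=1), 1)  # avoid division by zero
    return resid.sum(axis=1) / counts          # one mean residual per user
```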

As shown in Fig. 1, the mean residuals of attackers are greater than those of genuine users, which means the ratings given by the attackers generally deviate more from the predicted ratings than those of most genuine users. It can also be seen from Fig. 1 that some mean residuals of AoP attackers are close to those of most genuine users. This is because AoP attack is an obfuscated form of average attack. Unlike average attack profiles, AoP attack profiles use a certain percentage of popular items as filler items, which makes them look like genuine ones.

Due to the high similarity between attackers, we utilize a modified K-means algorithm to cluster anomaly rating users and further spot the attackers by the target item identification method.

 

Fig. 1. The mean residuals of user ratings

 

3.1.1 Clustering Anomaly Rating Users

K-means is a conventional clustering algorithm that splits the samples of a dataset into K clusters [28], [29]. This algorithm usually uses the Euclidean distance as the metric to measure the similarity between two samples. Samples with high similarity are grouped into the same cluster, while samples in different clusters have low similarity. However, the K-means algorithm with Euclidean distance-based similarity measurement cannot separate the attack users well from the genuine ones due to the high similarity between them. Hence, we present a similarity measurement metric to modify the K-means algorithm in order to better group the attack users.

Definition 1 popularity degree of item (PDI). The popularity degree of item \(i\), PDI(i), is defined as follows:

 

\(P D I(i)=\sum_{u \in K_{u}} \Gamma\left(r_{u i}\right)\)       (1)

\(\Gamma\left(r_{u i}\right)=\left\{\begin{array}{l}{1, r_{u i} \neq \varnothing} \\{0, \quad r_{u i}=\varnothing}\end{array}\right.\)       (2)

 

where \(r_{ui}\) denotes the rating of user \(u\) for item \(i\), \(K_u\) denotes the set of users, \(r_{ui} \neq \varnothing\) indicates that user \(u\) has rated item \(i\), and \(r_{ui} = \varnothing\) indicates that user \(u\) has not rated item \(i\).

Fig. 2 illustrates the difference in PDI between genuine and attack users. As shown in Fig. 2, genuine users tend to rate popular items, whereas attack users rate all items with roughly equal probability.

 

Fig. 2. The difference in PDI between genuine and attack users. Note that the abscissa represents the sequence numbers of items after sorting by PDI in descending order

Definition 2 average rating popularity degree of user (APDU). The average rating popularity degree of user u, APDU(u), is defined as below:

 

\(\operatorname{APDU}(u)=\frac{\sum_{i \in I_{u}} P D I(i)}{\left|I_{u}\right|}\)       (3)

 

where \(I_u\) denotes the set of items rated by user \(u\), and \(\left| I_u \right|\) is the number of items in set \(I_u\).

Fig. 3 illustrates the average rating popularity degree of genuine and attack users. As shown in Fig. 3, the APDU of attackers is lower than that of the majority of genuine users. Therefore, it can be used as the basis for cluster center selection and similarity measurement.

 

Fig. 3. APDU of users

Definition 3 the distance between users (DIST). The distance between users u and v is defined as follows:

 

\(DIST(u,v)=\left| APDU(u)-APDU(v) \right|\)       (4)

 

where \(APDU(u)\) and \(APDU(v)\) are the average rating popularity degrees of users \(u\) and \(v\), respectively. The greater the \(DIST(u,v)\), the larger the difference between users \(u\) and \(v\).

\(DIST(u,v)\) has two properties: symmetry, i.e., \(DIST(u,v)=DIST(v,u)\), and non-negativity, i.e., \(DIST(u,v)\ge 0\). Therefore, it can be used for measuring the difference between users.
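As a reference implementation of Definitions 1-3, the following sketch computes PDI, APDU, and DIST from a rating matrix; it assumes the same dense 0-for-missing representation as above, and the function names are ours rather than the paper's.

```python
import numpy as np

def pdi(R):
    """PDI(i): number of users who rated item i (Eqs. (1)-(2))."""
    return (R > 0).sum(axis=0)

def apdu(R):
    """APDU(u): average PDI over the items rated by user u (Eq. (3))."""
    rated = R > 0
    item_pdi = rated.sum(axis=0)
    counts = np.maximum(rated.sum(axis=1), 1)   # |I_u|, guarded against 0
    return (rated * item_pdi).sum(axis=1) / counts

def dist(apdu_u, apdu_v):
    """DIST(u, v): absolute difference of two APDU values (Eq. (4))."""
    return abs(apdu_u - apdu_v)
```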

To better group the attack users, we utilize \(DIST(u,v)\) as the similarity metric of the K-means algorithm and adjust the selection strategy of its cluster centers. The main steps for clustering anomaly rating users are as follows:

Step 1. Initialize the cluster centers k1 and k2 by calculating \(\operatorname{mean}_{u \in K_{u}} A P D U(u)\) and \(\min _{u \in K_{u}} A P D U(u)\) over the user rating database.

Step 2. For any user \(u \in K_u\), compute the distance between \(u\) and the cluster centers by Eq. (4), and assign \(u\) to the nearest cluster.

Step 3. Update the cluster centers k1 and k2 by calculating the mean and minimum value of APDU in each cluster.

Step 4. Repeat steps 2 and 3 until k1 and k2 no longer change, and obtain the cluster of anomaly rating users.

According to the above steps, we design an algorithm to cluster anomaly rating users.

Algorithm 1 clustering anomaly rating users

Input: user rating matrix \(R\)

Output: suspicious users cluster \(C_s\)

1: \(S_{1} \leftarrow \varnothing, S_{2} \leftarrow \varnothing, \text { sum }_{1} \leftarrow 0, \text { sum }_{2} \leftarrow 0\) 

2: for each item \(i \in I \) do

3: \(P D I(i)=\sum_{u \in K_{u}} \Gamma\left(r_{u i}\right)\)

4: end for

5: for each user \(u \in K_u\) do

6: \(A P D U(u)=\sum_{i \in I_{u}} P D I(i) /\left|I_{u}\right|\)

7: end for

8:\(f_{1}=\operatorname{mean}(A P D U), f_{2}=\min (A P D U) / * f_{1}, f_{2}\) denote the center of clusters \(S_1, S_2\), respectively \(*/\)

9: Initialize the center of clusters \(S_1,S_2\) with \(f_1,f_2\)

10: repeat

11: \(k_{1} \leftarrow f_{1}, k_{2} \leftarrow f_{2}\)

12: for each user \(u \in K_u\) do

13: compute the distance between \(u\) and cluster center \(k_1\) by \(D I S T_{1}=\left|A P D U(u)-k_{1}\right|\)

14: compute the distance between \(u\) and cluster center \(k_2\) by \(D I S T_{2}=\left|A P D U(u)-k_{2}\right|\)

15: if \(DIST_1 < DIST_2\) then

16: \(S_{1} \leftarrow S_{1} \cup\{u\}\)

17: else

18: \(S_{2} \leftarrow S_{2} \cup\{u\}\)

19: end if

20: end for

21: for each user \(u \in S_1\) do

22: \(\operatorname{sum}_{1} \leftarrow \operatorname{sum}_{1}+D I S T_{1}, A P D U_{1}(u)=\sum_{i \in I_{u}} P D I(i) /\left| I_{u}\right|\)

23: end for

24: for each user \(u \in S_2\) do

25: \(\operatorname{sum}_{2} \leftarrow \operatorname{sum}_{2}+\operatorname{DIS} T_{2}, \operatorname{APD} U_{2}(u)=\sum_{i \in I_{u}} \operatorname{PDI}(i) /\left|I_{u}\right|\)

26: end for

27: if \(\frac{sum_{1}}{\left|S_{1}\right|} \geq \frac{sum_{2}}{\left|S_{2}\right|}\) then

28: \(f_1=mean(APDU_1), f_2=min(APDU_2)\)

29: else

30: \(f_1=min(APDU_1), f_2=mean(APDU_2)\)

31: end if

32: until (\(k_1=f_1 \) and \(k_2=f_2\))

33: if \(\frac{sum_{1}}{\left|S_{1}\right|} \geq \frac{sum_{2}}{\left|S_{2}\right|}\) then

34: \(C_{\mathrm{s}} \leftarrow S_{2}\)

35: else

36: \(C_{\mathrm{s}} \leftarrow S_{1}\)

37: end if

38: return \(C_s\)

In Algorithm 1, Lines 1-32 are the modified K-means algorithm with the new similarity metric and the new selection strategy of cluster centers. Lines 33-38 determine which cluster contains the anomaly rating users and return it.

 

The time complexity of Algorithm 1 is analyzed below. The time complexity for Line 1, Lines 2-4, Lines 5-7, Lines 8-9, Lines 10-32, and Lines 33-38 is \(O(1)\), \(O\left(|I| \times\left|\kappa_{u}\right|\right)\), \(O\left(\left|I_{\max }\right| \times\left|\kappa_{u}\right|\right)\), \(O\left(\left|\kappa_{u}\right|\right)\), \(repeat\_times \times\left[O(1)+O\left(\left|\kappa_{u}\right|\right)+O\left(\left|S_{1}\right| \times\left|I_{\max }\right|\right)+O\left(\left|S_{2}\right| \times\left|I_{\max }\right|\right)+O\left(\left|S_{1}\right|\right)+O\left(\left|S_{2}\right|\right)\right]\), and \(O(1)\), respectively. Since \(\left|S_{1}\right|<\left|\kappa_{u}\right|\), \(\left|S_{2}\right|<\left|\kappa_{u}\right|\), \(\left|I_{\max }\right|<|I|\), and \(repeat\_times\) is far less than \(\left|\kappa_{u}\right|\), the time complexity for Lines 10-32 is at most \(O\left(\left|\kappa_{u}\right| \times|I|\right)\). Thus, the time complexity of Algorithm 1 is \(O(1)+O\left(|I| \times\left|\kappa_{u}\right|\right)+O\left(\left|I_{\max }\right| \times\left|\kappa_{u}\right|\right)+O\left(\left|\kappa_{u}\right|\right)+O\left(\left|\kappa_{u}\right| \times|I|\right)+O(1) \approx O\left(\left|\kappa_{u}\right| \times|I|\right)\).

The space complexity analysis for Algorithm 1 is as follows. The space complexity for storing the input data (i.e., the user rating matrix \(R\)) and output data (i.e., the suspicious users cluster \(C_s\)) is at most \(O\left(\left|\kappa_{u}\right| \times|I|\right)+O\left(\left|\kappa_{u}\right|\right)\). The space complexity for storing the array variables PDI and APDU, the set variables \(I\) and \(\kappa_u\), the set variables \(S_1\) and \(S_2\), and other variables such as \(f_1\), \(f_2\), \(k_1\), and \(k_2\) is \(O(|I|)+O\left(\left|\kappa_{u}\right|\right)\), \(O(|I|)+O\left(\left|\kappa_{u}\right|\right)\), \(O\left(\left|\kappa_{u}\right|\right)\), and \(O(1)\), respectively. Thus, the space complexity of Algorithm 1 is \(O\left(\left|\kappa_{u}\right| \times|I|\right)+O\left(\left|\kappa_{u}\right|\right)+2 \times\left[O(|I|)+O\left(\left|\kappa_{u}\right|\right)\right]+O\left(\left|\kappa_{u}\right|\right)+O(1) \approx O\left(\left|\kappa_{u}\right| \times|I|\right)\).
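For readers who prefer runnable code over pseudocode, the sketch below restates Algorithm 1 with NumPy under the same 0-for-missing convention. It follows the center-update rule of Lines 27-31 (mean APDU for the looser cluster, minimum APDU for the tighter one) and the final selection rule of Lines 33-38, but it is an illustrative re-statement under our naming assumptions, not the authors' reference implementation.

```python
import numpy as np

def cluster_anomaly_users(R, max_iter=100):
    """Algorithm 1 (restated): return the indices of the suspicious cluster C_s."""
    rated = R > 0
    item_pdi = rated.sum(axis=0)
    counts = np.maximum(rated.sum(axis=1), 1)
    user_apdu = (rated * item_pdi).sum(axis=1) / counts

    # Initial centers: mean and minimum APDU (Lines 8-9).
    k1, k2 = user_apdu.mean(), user_apdu.min()
    avg1 = avg2 = 0.0
    in_s1 = np.ones(R.shape[0], dtype=bool)

    for _ in range(max_iter):
        d1 = np.abs(user_apdu - k1)            # distance to center k1 (Eq. (4))
        d2 = np.abs(user_apdu - k2)
        in_s1 = d1 < d2                        # assignment step (Lines 12-20)

        # Average within-cluster distance (Lines 21-26).
        avg1 = d1[in_s1].mean() if in_s1.any() else 0.0
        avg2 = d2[~in_s1].mean() if (~in_s1).any() else 0.0
        if not in_s1.any() or in_s1.all():
            break                              # degenerate split; stop here

        # Center update (Lines 27-31): the tighter cluster is treated as
        # anomalous and its center moves to the minimum APDU.
        if avg1 >= avg2:
            f1, f2 = user_apdu[in_s1].mean(), user_apdu[~in_s1].min()
        else:
            f1, f2 = user_apdu[in_s1].min(), user_apdu[~in_s1].mean()

        if f1 == k1 and f2 == k2:              # convergence test (Line 32)
            break
        k1, k2 = f1, f2

    # Return the tighter cluster as the suspicious one (Lines 33-38).
    return np.where(~in_s1)[0] if avg1 >= avg2 else np.where(in_s1)[0]
```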

 

3.1.2 Identifying The Attack Users

As the cluster of anomaly rating users obtained by Algorithm 1 may contain genuine users, we need to further identify the attack users in this cluster.

 

Firstly, we seek the attacked item from the user rating database by calculating the PDI of the items in the database. For push attacks, the attacked item is usually selected from the unpopular items. The set of unpopular items is defined as follows:

\(U I=\left\{i | i \in I, P D I(i)<\frac{\sum_{j \in I} P D I(j)}{|I|}\right\}\)       (5)

 

where \(I\) denotes the set of items.

Taking into account the ratings given by users in the cluster of anomaly rating users, we recalculate the PDI of each item in the set of unpopular items; the attacked item is the one with the largest PDI in this set.

Secondly, we further identify the attackers in the cluster of anomaly rating users according to the attacked item. If a user \(u\) in the cluster of anomaly rating users rates the attacked item, then \(u\) is regarded as an attack user and its flag is set to 1.

According to the above analysis, we present an algorithm to further identify the attackers.

 

Algorithm 2 identifying the attackers

(The pseudocode of Algorithm 2 is presented as an image in the original publication.)

In Algorithm 2, Lines 1-11 obtain the set of unpopular items by calculating PDI of items in the rating database according to Eq. (5). Lines 12-13 obtain the cluster of anomaly rating users by calling Algorithm 1 and get the attacked item by function getAttackedItem\((UI,C_s)\). Lines 14-21 obtain the indicator matrix of attackers.

The time complexity for Algorithm 2 is analyzed below. The time complexity for Lines 1-11, Lines 12-13, and Lines 14-21 is \(O\left(|I| \times\left|\kappa_{u}\right|\right), O\left(\left|\kappa_{u}\right| \times|I|\right)\), and \(O(|\kappa_u|)\), respectively.

Thus, the time complexity for Algorithm 2 is \(O(|\kappa_u|\times |I|)\) .

The space complexity analysis for Algorithm 2 is as follows. The space complexity for storing the input data (i.e., the user rating matrix \(R\)) and output data (i.e., the indicator matrix of attackers \(Z\)) is \(O\left(\left|\kappa_{u}\right| \times|I|\right)+O\left(\left|\kappa_{u}\right|\right)\). The space complexity for Lines 1-11, Lines 12-13, and Lines 14-21 is \(3 \times O(|I|)+O\left(\left|\kappa_{u}\right|\right)\), \(O\left(\left|\kappa_{u}\right| \times|I|\right)\), and \(O(|\kappa_u|)\), respectively. Therefore, the space complexity for Algorithm 2 is \(O(|\kappa_u|\times |I|)\).
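Because the pseudocode of Algorithm 2 survives only as an image in the source, the sketch below reconstructs its logic from the description in Section 3.1.2: build the unpopular-item set by Eq. (5), pick the attacked item as the unpopular item rated most often by the suspicious cluster, and flag the suspicious users who rated it. Function and variable names are ours, and recomputing PDI over the suspicious cluster is our reading of the text.

```python
import numpy as np

def identify_attackers(R, suspicious_users):
    """Algorithm 2 (reconstructed): return the indicator vector Z of attack users.

    R                : (m, n) rating matrix, 0 denotes a missing rating.
    suspicious_users : index array of the cluster C_s returned by Algorithm 1.
    """
    rated = R > 0
    item_pdi = rated.sum(axis=0)

    # Unpopular items: PDI below the average PDI over all items (Eq. (5)).
    unpopular = np.where(item_pdi < item_pdi.mean())[0]

    # Recompute PDI counting only the suspicious cluster's ratings; the
    # attacked item is the unpopular item they rated most often.
    pdi_suspicious = rated[suspicious_users][:, unpopular].sum(axis=0)
    attacked_item = unpopular[np.argmax(pdi_suspicious)]

    # Flag suspicious users who rated the attacked item (Z_i = 1).
    Z = np.zeros(R.shape[0], dtype=int)
    raters = suspicious_users[rated[suspicious_users, attacked_item]]
    Z[raters] = 1
    return Z, attacked_item
```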

 

3.2 Robust BPMF Model for Recommendation

In the MF model [30], the user rating matrix \(R \in \mathfrak{R}^{m \times n}\) is decomposed into two low-rank matrices \(U=\left(U_{1}, U_{2}, \ldots, U_{m}\right) \in \mathfrak{R}^{m \times d}\) and \(V=\left(V_{1}, V_{2}, \ldots, V_{n}\right) \in \mathfrak{R}^{n \times d}\), which are called the user and item feature matrices, where \(U_i\) and \(V_j\) are d-dimensional user and item feature vectors, \(m\) and \(n\) are the numbers of users and items, and \(d\) is the feature dimension. The approximation is given by:

 

\(R \approx U V^{\mathrm{T}}\)      (6)

 

PMF is a special form of matrix factorization that analyzes the low-dimensional factorization from a statistical viewpoint [23]; it assumes that the user ratings and the user and item feature vectors follow Gaussian distributions. Compared with traditional matrix factorization (such as SVD), PMF handles large and sparse data more easily, because it no longer searches for the optimal low rank but instead trains the user and item feature vectors from a probabilistic perspective.

BPMF is a full Bayesian treatment of the PMF model that integrates over all model parameters and hyper-parameters, which avoids manual parameter tuning [24], [25]. The conditional probability over the observed ratings is given by:

 

\(p\left(R | U, V, \sigma^{2}\right)=\prod_{i=1}^{m} \prod_{j=1}^{n}\left[N\left(R_{i j} | U_{i} V_{j}^{\mathrm{T}}, \sigma^{2}\right)\right]^{I_{ij}}\)       (7)

 

where \(N(x|\mu,\sigma^2)\) is the probability density function of the Gaussian distribution with mean \(\mu\) and variance \(\sigma^2\), \(R_{ij}\) is the rating of user \(i\) for item \(j\), and \(I_{ij}\) is an indicator function that is 1 if user \(i\) has rated item \(j\) and 0 otherwise. The prior distributions over \(U\) and \(V\) are given by:

 

\(p\left(U | \mu_{U}, \Lambda_{U}\right)=\prod_{i=1}^{m} N\left(U_{i} | \mu_{U}, \Lambda_{U}^{-1}\right)\)       (8)

\(p\left(V | \mu_{V}, \Lambda_{V}\right)=\prod_{j=1}^{n} N\left(V_{j} | \mu_{V}, \Lambda_{V}^{-1}\right)\)       (9)

 

Bayesian probabilistic matrix factorization assumes that the user hyper-parameters \(\Theta_U=\{\mu_U,\Lambda_U\}\) and the item hyper-parameters \(\Theta_V=\{\mu_V,\Lambda_V\}\) follow the Gaussian-Wishart distribution. The prior distributions of \(\Theta_U\) and \(\Theta_V\) are given by:

 

\(p\left(\Theta_{U} | \Theta_{0}\right)=p\left(\mu_{U} | \Lambda_{U}\right) p\left(\Lambda_{U}\right)=N\left(\mu_{U} | \mu_{0},\left(\beta_{0} \Lambda_{U}\right)^{-1}\right) \mathcal{W}\left(\Lambda_{U} | W_{0}, v_{0}\right)\)       (10)

\(p\left(\Theta_{V} | \Theta_{0}\right)=p\left(\mu_{V} | \Lambda_{V}\right) p\left(\Lambda_{V}\right)=N\left(\mu_{V} | \mu_{0},\left(\beta_{0} \Lambda_{V}\right)^{-1}\right) \mathcal{W}\left(\Lambda_{V} | W_{0}, v_{0}\right)\)       (11)

 

where \(\mathcal{W}(x|W_0,v_0)\) is the Wishart distribution with \(v_0\) degrees of freedom and a \(d \times d\) scale matrix \(W_0\), and \(\Theta_{0}=\left\{\mu_{0}, v_{0}, W_{0}\right\}\).

To develop attack-resistant CF recommendation algorithms, we construct a robust CF model by incorporating the detection results of user anomaly rating behaviors into the BPMF model. The reason for adopting the BPMF model is that it has better accuracy and robustness than other models. Particularly, we incorporate the indicator matrix of attackers \(Z\) obtained by Algorithm 2 into the prior distribution of the item feature matrix \(V\) to decrease the negative impact of attackers. At the same time, we retain the ratings of attackers on the unattacked items to alleviate data sparsity, which helps to improve the recommendation accuracy.

Depending on whether or not the item is an attacked target item, the following two cases should be taken into account.

For the attacked item, we suppose the prior distribution of user ratings as follows:

 

\(p\left(R | U, V, \sigma^{2}\right)=\prod_{i=1}^{m} \prod_{j=1}^{n}\left[N\left(R_{i j} | U_{i} V_{j}^{\mathrm{T}}, \sigma^{2}\right)\right]^{I_{ij}\left(1-Z_{i}\right)}\)       (12)

 

For the unattacked items, we use Eq. (7) as the prior distribution of user ratings. That is to say, we ignore the impact of attackers on the unattacked items.
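Concretely, Eqs. (7) and (12) amount to replacing the observation indicator \(I_{ij}\) by an effective indicator that zeroes out attackers' ratings only in the attacked item's column. A minimal sketch of this mask construction (our names, our reading of the model) is given below; the resulting mask is what the sampler uses when conditioning on the ratings.

```python
import numpy as np

def effective_indicator(R, Z, attacked_item):
    """Effective rating indicator: I_ij for unattacked items (Eq. (7)),
    I_ij * (1 - Z_i) for the attacked item (Eq. (12))."""
    I = (R > 0).astype(float)          # observation indicator I_ij
    I[:, attacked_item] *= (1 - Z)     # drop attackers' ratings on the target item
    return I
```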

Based on the constructed robust BPMF model, we design a robust CF algorithm, namely RBPMF-CF, which is described below:

Algorithm 3 RBPMF-CF

(The pseudocode of Algorithm 3 is presented as an image in the original publication.)

In Algorithm 3, Lines 1-6 initialize the feature matrices \(U\), \(V\) and sample the user features. Lines 7-14 sample the item features; if the sampled item is an attacked item, we exclude the ratings of attackers on that item. Line 15 returns the feature matrices \(U\), \(V\).

The time complexity for Algorithm 3 is analyzed below. The time complexity for Line 1, Lines 2-14, and Line 15 is \(O\left(d \times\left|\kappa_{u}\right|\right)+O(d \times|I|), \operatorname{loop} \times\left[O\left(\left|\kappa_{u}\right|\right)+O(|I|)\right]\) and \(O(1)\) , respectively. Since d and loop are far less than \(|\kappa_u|\) and \(|I|\) , the time complexity for Algorithm 3 is \(O(|\kappa_u|+|I|)\) .

The space complexity analysis for Algorithm 3 is as follows. The space complexity for storing the input data and output data is \(O\left(\left|\kappa_{u}\right| \times|I|\right)+O(1)+O\left(\left|\kappa_{u}\right|\right)\) and \(O\left(d \times\left|\kappa_{u}\right|\right)+O(d \times|I|)\), respectively. The space complexity for storing the set variables \(I\), \(\kappa_u\) and other variables such as \(k\), \(u\), and \(i\) is \(O(|I|)+O\left(\left|\kappa_{u}\right|\right)\) and \(O(1)\), respectively. Therefore, the space complexity for Algorithm 3 is \(O(|\kappa_u|\times|I|)\).
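To make the role of this mask inside the Gibbs sweep of Algorithm 3 concrete, the sketch below shows the standard BPMF conditional update for a single item feature vector \(V_j\) (a Gaussian with precision \(\Lambda_V + \alpha \sum_i I_{ij} U_i U_i^{\mathrm{T}}\), where \(\alpha = 1/\sigma^2\)), using the effective indicator from the earlier mask sketch so that flagged attackers contribute nothing to the attacked item's column. It is an illustrative fragment under our naming assumptions, not the authors' full RBPMF-CF implementation; the symmetric user-side update and the Gaussian-Wishart hyper-parameter resampling of Eqs. (10)-(11) are omitted.

```python
import numpy as np

def sample_item_feature(j, R, I_eff, U, mu_V, Lambda_V, alpha, rng):
    """Draw V_j from its conditional Gaussian posterior given U, with the
    effective indicator I_eff masking attackers on the attacked item."""
    users = np.where(I_eff[:, j] > 0)[0]   # users whose rating on item j is kept
    U_j = U[users]                         # (n_j, d) feature vectors of those users
    r_j = R[users, j]                      # their ratings on item j

    # Posterior precision and mean of V_j (standard BPMF conditionals).
    precision = Lambda_V + alpha * U_j.T @ U_j
    cov = np.linalg.inv(precision)
    mean = cov @ (Lambda_V @ mu_V + alpha * U_j.T @ r_j)
    return rng.multivariate_normal(mean, cov)

# Usage inside one Gibbs sweep (rng = np.random.default_rng()):
#   for j in range(R.shape[1]):
#       V[j] = sample_item_feature(j, R, I_eff, U, mu_V, Lambda_V, alpha, rng)
```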

 

4. Experimental Evaluation

 

4.1 Experimental Data and Settings

We use the following datasets to evaluate our RBPMF-CF algorithm.

(1) MovieLens 100K dataset (http://grouplens.org/datasets/movielens/100k/). It includes 100000 ratings from 943 users on 1682 movies. The ratings are integer values between 1 and 5, where 1 denotes the least liked and 5 the most liked. The sparsity level of this dataset is 93.7%. We randomly extract 80% of the ratings from this dataset as the training set and use the remaining 20% as the test set.

(2) Netflix dataset (constructed to support participants in the Netflix Prize, http://netflixprize.com). It includes 103297638 ratings on 17770 movies by 480189 users, and its sparsity level is 98.8%. All ratings are integer values between 1 and 5. We randomly extract 214690 ratings on 4000 movies by 2000 users as the sampled dataset, whose sparsity level is 97.3%. The training and test sets of the sampled dataset are partitioned in the same way as for the MovieLens 100K dataset.

To evaluate the robustness of RBPMF-CF, attack profiles generated by average attack, random attack, and AoP attack, respectively, are injected into the training sets. We set the filler size to 3% and 5%, and the attack size to 2%, 4%, 6%, 8%, and 10%, respectively. The target item is randomly chosen from the unpopular items, and all attack profiles are generated for push attacks.
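For reproducibility, a hedged sketch of how such push-attack profiles can be injected is given below. It follows the standard definitions of the random and average attacks (filler items rated around the overall mean or the per-item mean, and the target item rated with the maximum value); the function name, the standard deviation used for the average attack, and the rounding are our assumptions rather than the authors' generation script, and the AoP variant (popular-item fillers) is not shown.

```python
import numpy as np

def inject_push_attack(R, target_item, n_attackers, filler_size,
                       kind="average", r_max=5, seed=0):
    """Append shilling profiles to the rating matrix R (0 = missing).

    kind='random'  : filler ratings ~ N(overall mean, overall std)
    kind='average' : filler ratings ~ N(item mean, 1.0)
    The target item always receives the maximum rating r_max.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    n_filler = int(round(filler_size * n))
    overall_mean = R[R > 0].mean()
    overall_std = R[R > 0].std()

    profiles = np.zeros((n_attackers, n))
    candidates = np.array([i for i in range(n) if i != target_item])
    for a in range(n_attackers):
        fillers = rng.choice(candidates, size=n_filler, replace=False)
        for i in fillers:
            if kind == "average" and (R[:, i] > 0).any():
                mu, sd = R[R[:, i] > 0, i].mean(), 1.0
            else:
                mu, sd = overall_mean, overall_std
            profiles[a, i] = np.clip(np.rint(rng.normal(mu, sd)), 1, r_max)
        profiles[a, target_item] = r_max
    return np.vstack([R, profiles])
```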

In our experiments, we set the feature dimension \(d\) to 10 and the number of iterations \(loop\) to 50 for the RBPMF-CF algorithm.

 

4.2 Evaluation Metrics

We use root mean squared error (RMSE) and prediction shift (PS) to measure the algorithm's performance.

RMSE is a metric that measures the algorithm's prediction accuracy and is defined below [31]:

 

\(\mathrm{RMSE}=\sqrt{\frac{\sum_{(u, i) \in T}\left(r_{u i}-\hat{r}_{u i}\right)^{2}}{|T|}}\)       (13)

 

where \(T\) is the test set, \(r_{ui}\) and \(\hat{r}_{ui}\) are the real and predicted ratings of user \(u\) on item \(i\), respectively.

PS is a metric that measures the algorithm's robustness and is defined below [10]:

 

\(\mathrm{PS}=\frac{\sum_{u \in \kappa_{u}}\left|\hat{r}_{u i}^{\prime}-\hat{r}_{u i}\right|}{\left|\kappa_{u}\right|}\)        (14)

 

where \(|\kappa_u|\) is the number of users in the test set, \(\hat{r}_{ui}\) and \(\hat{r}'_{ui}\) are the predicted ratings of user u on item i before and after attacks, respectively.
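Both metrics translate directly into code; a minimal sketch with our own function names follows.

```python
import numpy as np

def rmse(r_true, r_pred):
    """Root mean squared error over the test set (Eq. (13))."""
    r_true = np.asarray(r_true, dtype=float)
    r_pred = np.asarray(r_pred, dtype=float)
    return np.sqrt(np.mean((r_true - r_pred) ** 2))

def prediction_shift(r_pred_before, r_pred_after):
    """Mean absolute change of the target item's predicted ratings
    before and after the attack (Eq. (14))."""
    before = np.asarray(r_pred_before, dtype=float)
    after = np.asarray(r_pred_after, dtype=float)
    return np.mean(np.abs(after - before))
```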

 

4.3 Experimental Results and Analysis

To illustrate the superiority of RBPMF-CF, we compare it with three baseline algorithms.

(1) MMF: A robust MF algorithm based on M-estimator [18]. In our experiments, we set the feature dimension, the number of iterations, and the learning rate to 10, 25, and 0.01, respectively for MMF algorithm.

(2) LTSMF: A robust MF algorithm based on least trimmed square estimator [19]. In our experiments, we set the feature dimension, the number of iterations, and the learning rate to 10, 25, and 0.01, respectively for LTSMF algorithm.

(3) VarSelect SVD: A robust recommendation algorithm based on principal component analysis and singular value decomposition [20]. In our experiments, we set the feature dimension, the number of iterations, and the learning rate to 10, 100, and 0.01, respectively for VarSelect SVD algorithm.

 

4.3.1 Comparison of Performance on The MovieLens Dataset

Table 1 shows the comparison of performance for MMF, LTSMF, VarSelect SVD, and RBPMF-CF on the MovieLens dataset under various attacks at different attack sizes across different filler sizes.

 

As shown in Table 1, under the three attacks the RMSE of MMF and LTSMF is above 0.96, and the RMSE values of both algorithms fluctuate between 0.96 and 0.97 as the attack and filler sizes increase. On the whole, the accuracy of the two algorithms is relatively close. The RMSE of VarSelect SVD is about 0.95. Nevertheless, the RMSE of VarSelect SVD under average and AoP attacks is almost always above 0.95. Moreover, the majority of the RMSE values of VarSelect SVD under AoP attack are greater than those under average attack. The reason is that AoP attack profiles have very high similarity with genuine profiles, so that parts of them are viewed as genuine profiles. By contrast, the RMSE of RBPMF-CF under the three attacks is reduced obviously and is below 0.92. This indicates that the combination of attack detection and the BPMF model can further improve the accuracy of the algorithm. Furthermore, the RMSE of RBPMF-CF under the three attacks changes little at various attack sizes and filler sizes, which means RBPMF-CF is stable. Thus, RBPMF-CF has better accuracy than MMF, LTSMF, and VarSelect SVD on the MovieLens dataset.

 

Table 1. Comparison of performance on the MovieLens dataset

 

It can be seen from Table 1 that, for the same filler size and attack size, the PS of RBPMF-CF under the three attacks is smaller than that of MMF, LTSMF, and VarSelect SVD, which means the robustness of RBPMF-CF is better than that of the other algorithms. It can also be seen from Table 1 that the PS of MMF, LTSMF, and VarSelect SVD under AoP attack is slightly greater than under average attack. The reason is that AoP attack profiles use a certain percentage of popular items as filler items, which makes them look like genuine ones. Due to this high similarity, some AoP attack profiles are regarded as genuine profiles by the three algorithms, thus leading to a large prediction shift. However, the PS values of RBPMF-CF under AoP attack are basically consistent with those under average attack. This indicates that RBPMF-CF is still robust against AoP attack. Therefore, RBPMF-CF is more robust than MMF, LTSMF, and VarSelect SVD on the MovieLens dataset.

 

4.3.2 Comparison of Performance on The Netflix Dataset

Table 2 shows the comparison of performance for MMF, LTSMF, VarSelect SVD, and RBPMF-CF on the Netflix dataset under various attacks at different attack sizes across different filler sizes.

 

Table 2. Comparison of performance on the Netflix dataset

As shown in Table 2, under random attack the RMSE of RBPMF-CF is below 0.91, which is better than that of MMF, LTSMF, and VarSelect SVD. Moreover, its RMSE changes relatively little, which means that increasing the attack and filler sizes has little impact on the accuracy of RBPMF-CF. Under average attack, the RMSE of MMF and LTSMF is close to 0.97 and 0.98, respectively, which is slightly greater than on the MovieLens dataset. This means the accuracy of the two algorithms is affected to some extent by the sparsity of the Netflix dataset. The RMSE of VarSelect SVD under average attack is about 0.95, which is better than that of MMF and LTSMF. The RMSE of RBPMF-CF under average attack is about 0.91, which is slightly smaller than on the MovieLens dataset. This indicates that the sparsity of the Netflix dataset has little impact on the accuracy of RBPMF-CF. For the case of AoP attack, the RMSE of MMF and LTSMF is above 0.97 and the RMSE of VarSelect SVD is close to 0.96, which is slightly greater than under average attack. This is because parts of the AoP attack profiles are viewed as genuine ones due to their high similarity, resulting in a decline in accuracy for the three algorithms. The RMSE of RBPMF-CF is about 0.91, which is better than that of the baselines. Thus, RBPMF-CF also has better accuracy than MMF, LTSMF, and VarSelect SVD on the Netflix dataset.

It can be seen from Table 2 that the PS values of RBPMF-CF under the three attacks are smaller than those of MMF, LTSMF, and VarSelect SVD, which indicates that RBPMF-CF is more robust against the three attacks than the baseline algorithms. For the case of AoP attack, the PS values of MMF, LTSMF, and VarSelect SVD are slightly greater than those under average attack. This is because parts of the AoP attack profiles are regarded as genuine profiles by the three algorithms. The PS values of RBPMF-CF under AoP attack are basically consistent with the results under average attack, which means RBPMF-CF is also robust against AoP attack. Therefore, the robustness of RBPMF-CF is also better than that of MMF, LTSMF, and VarSelect SVD under the three attacks on the Netflix dataset.

 

4.3.3 The Effectiveness of Our Anomaly Rating Behavior Detection Method

To show the effectiveness of our anomaly rating behavior detection method (i.e., the first step of the proposed model, which is called ARBD for short), we conduct further experiments on two datasets from three aspects. Firstly, we compare ARBD with two existing shilling attack detection methods in the literature of recommender systems using precision and recall metrics. Secondly, we substitute ARBD with an existing detection method and perform the proposed Bayesian probabilistic matrix factorization model after removing the detected anomaly ratings. Thirdly, we combine ARBD with the basic matrix factorization (MF) model, which is denoted as ARBD+MF for convenience, and compare it with the basic MF.

(1) Comparison of detection performance. To show the performance of ARBD in detecting anomaly rating users, the precision and recall metrics are used for evaluating its performance, which are defined below:

 

\(\text { Precision }=\frac{T P}{T P+F P}\)       (15)

\(\text { Recall }=\frac{T P}{T P+F N}\)       (16)

 

where TP and FN are the numbers of attack profiles correctly identified and misclassified, respectively, and FP is the number of genuine profiles misclassified.
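Given binary detection labels, the two metrics can be computed as in the following sketch (variable names are ours).

```python
import numpy as np

def precision_recall(predicted_attack, true_attack):
    """Precision and recall of attack-profile detection (Eqs. (15)-(16)).

    predicted_attack, true_attack : boolean arrays over all user profiles.
    """
    pred = np.asarray(predicted_attack, dtype=bool)
    true = np.asarray(true_attack, dtype=bool)
    tp = np.sum(pred & true)         # attack profiles correctly identified
    fp = np.sum(pred & ~true)        # genuine profiles misclassified
    fn = np.sum(~pred & true)        # attack profiles missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```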

To show the superiority of ARBD, we compare it with two baselines.

a) CBS (Catch the Black Sheep) [14]: An unsupervised approach for detecting shilling attacks, which requires labeled seed attack users. In the experiment, 20% of the attack users for each attack size are used as the seeds.

b) UD-HMM [16]: An unsupervised approach for detecting shilling attacks based on a hidden Markov model and hierarchical clustering. In the experiment, the parameters N and α are set to 5 and 0.7, respectively.

Tables 3 and 4 list the precision and recall of three methods with various attacks on the MovieLens and Netflix datasets, respectively.

 

Table 3. Precision and recall of three methods on the Movielens dataset

 

Table 4. Precision and recall of three methods on the Netflix dataset

As shown in Tables 3 and 4, CBS maintains high precision and recall in detecting attacks with large attack sizes, which indicates that CBS can correctly detect most attack profiles. However, it performs poorly when detecting attacks with small attack sizes. The reason is that the number of seed attack users for CBS is small at low attack sizes. As to UD-HMM, it can effectively detect random and average attacks with large attack sizes, but it performs poorly in detecting the two attacks with small attack sizes. When detecting AoP attack, the overall performance of UD-HMM is not very good because a number of genuine profiles are misidentified. All the precision and recall values of ARBD are 1, which indicates that ARBD can correctly detect attack profiles and none of them are misidentified. These results illustrate the effectiveness of ARBD in detecting attacks. Therefore, ARBD outperforms CBS and UD-HMM in detecting three attacks.

(2) Comparison of RMSE and PS by substituting ARBD with UD-HMM. To show the effectiveness of ARBD, we substitute ARBD with UD-HMM and combine UD-HMM with the proposed Bayesian probabilistic matrix factorization model (denoted as UD-HMM+BPMF for convenience) to make recommendations. Tables 5 and 6 list the RMSE and PS of UD-HMM+BPMF and RBPMF-CF on the MovieLens and Netflix datasets with various attacks, respectively.

As shown in Table 5, the RMSE of UD-HMM+BPMF under the three attacks is between 0.9134 and 0.9164, and the RMSE of RBPMF-CF under the three attacks is between 0.9134 and 0.9156. These results indicate that there is little difference between UD-HMM+BPMF and RBPMF-CF in prediction accuracy on the MovieLens dataset. The PS of UD-HMM+BPMF under the three attacks is between 0.3019 and 0.4746, and the PS of RBPMF-CF under the three attacks is between 0.3163 and 0.4216. These results illustrate that there is no big difference between UD-HMM+BPMF and RBPMF-CF in prediction shift on the MovieLens dataset.

 

Table 5. Comparison of RMSE and PS for UD-HMM+BPMF and RBPMF-CF on the MovieLens dataset

 

Table 6. Comparison of RMSE and PS for UD-HMM+BPMF and RBPMF-CF on the Netflix dataset

It can be seen from Table 6 that, under the three attacks, the RMSE of UD-HMM+BPMF is between 0.8854 and 0.8911, and the RMSE of RBPMF-CF is between 0.8876 and 0.8911. These results illustrate that the prediction accuracy of UD-HMM+BPMF on the Netflix dataset is almost the same as that of RBPMF-CF. The PS of UD-HMM+BPMF under random and average attacks is between 0.3130 and 0.5392, and the PS of RBPMF-CF under random and average attacks is between 0.3139 and 0.4572. These results illustrate that there is no obvious difference between UD-HMM+BPMF and RBPMF-CF in prediction shift under random and average attacks on the Netflix dataset. As to AoP attack, the PS of UD-HMM+BPMF is between 0.3553 and 1.0018, which is larger than under random and average attacks. This is because UD-HMM performs poorly in detecting AoP attack. The PS of RBPMF-CF under AoP attack is between 0.3139 and 0.4471, which differs little from that under random and average attacks. This again illustrates the superiority of ARBD in detecting anomaly rating users.

(3) Comparison of RMSE and PS for the basic MF and ARBD+MF. To further show the effectiveness of ARBD, we conduct experiments to compare RMSE and PS of the basic MF and ARBD+MF on the MovieLens and Netflix datasets with various attacks. Tables 7 and 8 list the RMSE and PS of the basic MF and ARBD+MF on the MovieLens and Netflix datasets, respectively.

As shown in Table 7, under the three attacks, the RMSE of the basic MF is between 0.9701 and 0.9760 and the RMSE of ARBD+MF is between 0.9725 and 0.9764. Clearly, there is almost no difference between the basic MF and ARBD+MF in prediction accuracy. As to the prediction shift metric, the PS of the basic MF under the three attacks is between 0.8009 and 1.4840, while the PS of ARBD+MF is between 0.0824 and 0.1973. Clearly, ARBD+MF is more robust against attacks than the basic MF on the MovieLens dataset.

As shown in Table 8, under the three attacks, the RMSE of the basic MF is between 0.9533 and 0.9618 and the RMSE of ARBD+MF is between 0.9621 and 0.9653, indicating little difference between them in prediction accuracy. Under the three attacks, the PS of the basic MF is between 1.1482 and 1.7988, while the PS of ARBD+MF is between 0.0715 and 0.1661. Clearly, the robustness of ARBD+MF on the Netflix dataset is much better than that of the basic MF.

Therefore, the combination of ARBD with the basic MF model (i.e., ARBD+MF) can significantly improve the robustness of the basic MF. This again illustrates the effectiveness of ARBD in detecting anomaly rating users.

 

Table 7. RMSE and PS of the basic MF and ARBD+MF on the MovieLens dataset

 

Table 8. RMSE and PS of the basic MF and ARBD+MF on the Netflix dataset

 

4.3.4 Comparison of Actual Runtime for Four Algorithms

The actual runtime of a model-based recommendation algorithm consists of the training time and the predictive time, which refer to the time required to train a predictive model on the training set and the time required to predict ratings for the target item on the test set, respectively. For the RBPMF-CF algorithm, the training time also includes the time for detecting anomaly rating users. To compare the runtime of the four algorithms, we carry out experiments on the two datasets and calculate their training time and predictive time, respectively. Tables 9 and 10 list the training and predictive times of the four algorithms, respectively.

 

Table 9. The training time for four algorithms (s)

 

Table 10. The predictive time for four algorithms (μs)

As listed in Table 9, the training time of VarSelect SVD on the MovieLens dataset is the largest, while the training times of MMF, LTSMF, and RBPMF-CF on the MovieLens 100K dataset show no obvious difference. As to the Netflix dataset, the training time of VarSelect SVD is still the largest, LTSMF comes second, MMF ranks third, and the training time of RBPMF-CF is the smallest. Therefore, RBPMF-CF has an obvious advantage in training time on the Netflix dataset.

In Table 10, the predictive time of the four algorithms on both datasets is in microseconds, which means all four algorithms can predict a rating for a target item very quickly. Therefore, there is no big difference between them in predictive time, although the predictive time of RBPMF-CF is the smallest and that of VarSelect SVD is the largest.

 

 

5. Conclusion

In this paper, we present a robust BPMF model for CF recommender systems based on detecting anomaly rating users. To reduce the impact of shilling attacks on recommendation results, we present a modified K-means algorithm to cluster anomaly rating users, based on which we further identify and mark the attack users. By combining the detection results with the BPMF model, we construct a robust CF model and design a robust recommendation algorithm.

Compared with the baseline algorithms, our algorithm is more accurate and robust. In our future work, we will incorporate the item attribute information into BPMF model to further improve the recommendation accuracy of our algorithm. In addition, we will explore more effective ways to detect the attackers to further improve the robustness of our algorithm.

References

  1. Z. Lin, "An empirical investigation of user and system recommendations in e-commerce," Decision Support Systems, vol. 68, pp. 111-124, December, 2014. https://doi.org/10.1016/j.dss.2014.10.003
  2. T.T.S. Nguyen, H. Lu and J. Lu, "Web-page recommendation based on web usage and domain knowledge," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 10, pp. 2574-2587, October, 2014. https://doi.org/10.1109/TKDE.2013.78
  3. H. Yin, X. Zhou, B. Cui, and et al., "Adapting to user interest drift for POI recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 10, pp. 2566-2581, October, 2016. https://doi.org/10.1109/TKDE.2016.2580511
  4. H. Mezni and T. Abdeljaoued, "A cloud services recommendation system based on Fuzzy Formal Concept Analysis," Data & Knowledge Engineering, vol. 116, pp. 100-123, July, 2018. https://doi.org/10.1016/j.datak.2018.05.008
  5. G. Guo, J. Zhang and N. Yorke-Smith, "A novel recommendation model regularized with user trust and item ratings," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 7, pp. 1607-1620, July, 2016. https://doi.org/10.1109/TKDE.2016.2528249
  6. Y. Shi, M. Larson and A. Hanjalic, "Collaborative filtering beyond the user-item matrix: a survey of the state of the art and future challenges," ACM Computing Surveys, vol. 47, no. 1, pp.1-45, July, 2014.
  7. I. Gunes, C. Kaleli, A. Bilge, and et al., "Shilling attacks against recommender systems: a comprehensive survey," Artificial Intelligence Review, vol. 42, no. 4, pp. 767-799, April, 2014. https://doi.org/10.1007/s10462-012-9364-9
  8. B. Mobasher, R. Burke, R. Bhaumik and C. Williams, "Toward trustworthy recommender systems: an analysis of attack models and algorithm robustness," ACM Transactions on Internet Technology, vol. 7, no. 4, pp. 1-23, October, 2007. https://doi.org/10.1145/1189740.1189741
  9. R. Burke, M.P. O'Mahony, and N.J. Hurley, "Robust collaborative recommendation," in Ricci, F., Rokach, L., Shapira, B., Kantor, P.B.(Ed): Recommender Systems Handbook, Springer, pp. 805-835, 2011.
  10. N. Hurley, Z. Cheng and M. Zhang, "Statistical attack detection," in Proc. of the 3rd ACM Conference on Recommender systems, pp. 149-156, October 23-25, 2009.
  11. C.A. Williams, B. Mobasher and R. Burke, "Defending recommender systems: detection of profile injection attacks," Service Oriented Computing and Applications, vol. 1, no. 3, pp. 157-170, November, 2007. https://doi.org/10.1007/s11761-007-0013-0
  12. B. Mehta and W. Nejdl, "Unsupervised strategies for shilling detection and robust collaborative filtering," User Modeling and User Adapted Interaction, vol. 19, no. 1-2, pp. 65-97, February, 2009. https://doi.org/10.1007/s11257-008-9050-4
  13. J. Lee and D. Zhu, "Shilling attack detection-A new approach for a trustworthy recommender system," Informs Journal on Computing, vol. 24, no. 1, pp. 117-131, January, 2012. https://doi.org/10.1287/ijoc.1100.0440
  14. Y. Zhang, Y. Tan, M. Zhang, et al., "Catch the black sheep: unified framework for shilling attack detection based on fraudulent action propagation," in Proc. of the 24th International Conference on Artificial Intelligence, pp. 2408-2414, July 25-31, 2015.
  15. Z. Yang, L. Xu, Z. Cai, et al., "Re-scale AdaBoost for attack detection in collaborative filtering recommender systems," Knowledge-Based Systems, vol. 100, pp. 74-88, May, 2016. https://doi.org/10.1016/j.knosys.2016.02.008
  16. F. Zhang, Z. Zhang, P. Zhang, and et al., "UD-HMM: An unsupervised method for shilling attack detection based on hidden Markov model and hierarchical clustering," Knowledge-Based Systems, vol. 148, pp. 146-166, May, 2018. https://doi.org/10.1016/j.knosys.2018.02.032
  17. M. O'Mahony, N. Hurley, N. Kushmerick, and et al., "Collaborative recommendation: a robustness analysis," ACM Transactions on Internet Technology, vol. 4, no. 4, pp. 344-377, November, 2004. https://doi.org/10.1145/1031114.1031116
  18. B. Mehta, T. Hofmann, and W. Nejdl, "Robust collaborative filtering," in Proc. of the 2007 ACM Conference on Recommender Systems, pp. 49-56, October 19-20, 2007.
  19. Z. Cheng and N. Hurley, "Robust collaborative recommendation by least trimmed squares matrix factorization," in Proc. of the 22nd International Conference on Tools with Artificial Intelligence, pp. 105-112, October 27-29, 2010.
  20. B. Mehta and W. Nejdl, "Attack resistant collaborative filtering," in Proc. of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 75-82, July 20-24, 2008.
  21. H. Yi and F. Zhang, "Robust recommendation method based on suspicious users measurement and multidimensional trust," Journal of Intelligent Information Systems, vol. 46, no. 2, pp. 349-367, April, 2016. https://doi.org/10.1007/s10844-015-0375-2
  22. F. Zhang, Y. Lu, J. Chen and et al., "Robust collaborative filtering based on non-negative matrix factorization and R1-norm," Knowledge-Based Systems, vol. 118, pp. 177-190, February, 2017. https://doi.org/10.1016/j.knosys.2016.11.021
  23. R. Salakhutdinov and A. Mnih, "Probabilistic matrix factorization," Advances in neural information processing systems, pp. 1257-1264, December 3-6, 2008.
  24. J. Liu, C. Wu and W. Liu, "Bayesian probabilistic matrix factorization with social relations and item contents for recommendation," Decision Support Systems, vol. 55, no. 3, pp. 838-850, June, 2013. https://doi.org/10.1016/j.dss.2013.04.002
  25. R. Salakhutdinov and A. Mnih, "Bayesian probabilistic matrix factorization using markov chain monte carlo," in Proc. of 25th International Conference on Machine Learning, pp. 880-887, July 5-9, 2008.
  26. B. Lakshminarayanan, G. Bouchard and C. Archambeau, "Robust Bayesian Matrix Factorisation," in Proc. of the 14th International Conference on Artificial Intelligence and Statistics, pp. 425-433, April 11-13, 2011.
  27. C. Li and Z. Luo, "A metadata-enhanced variational Bayesian matrix factorization model for robust collaborative recommendation," Acta Automatica Sinica, vol. 37, no. 9, pp. 1067-1076, September, 2011.
  28. H. Li, H. He and Y. Wen, "Dynamic particle swarm optimization and K-means clustering algorithm for image segmentation," Optik - International Journal for Light and Electron Optics, vol. 126, no. 24, pp. 4817-4822, December, 2015.
  29. ABS. Serapiao, GS. Correa, FB. Goncalves and VO. Carvalho, "Combining K-means and K-harmonic with fish school search algorithm for data clustering task on graphics processing units," Applied Soft Computing, vol. 41, pp. 290-304, April, 2016. https://doi.org/10.1016/j.asoc.2015.12.032
  30. Y. Koren, R. Bell and C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, pp. 30-37, August, 2009. https://doi.org/10.1109/MC.2009.263
  31. Y. Koren, "Factor in the Neighbors: Scalable and accurate collaborative filtering," ACM Transactions on Knowledge Discovery from Data, vol. 4, no. 1, pp. 1-24, January, 2010. https://doi.org/10.1145/1644873.1644874
