• Title/Summary/Keyword: Similarity filtering

Matrix-based Filtering and Load-balancing Algorithm for Efficient Similarity Join Query Processing in Distributed Computing Environment (분산 컴퓨팅 환경에서 효율적인 유사 조인 질의 처리를 위한 행렬 기반 필터링 및 부하 분산 알고리즘)

  • Yang, Hyeon-Sik;Jang, Miyoung;Chang, Jae-Woo
    • The Journal of the Korea Contents Association / v.16 no.7 / pp.667-680 / 2016
  • With the development of distributed computing platforms such as Hadoop MapReduce, conventional query processing techniques that were designed for a single machine must now run efficiently in distributed computing environments. In particular, similarity join query processing in distributed computing environments has been studied, where a similarity join retrieves all highly similar data pairs between two given data sets. However, the existing similarity join query processing schemes for distributed computing environments suffer from skewed computing load across clusters because they consider only the data transmission cost. In this paper, we propose a matrix-based load-balancing algorithm for efficient similarity join query processing in distributed computing environments. To balance the load uniformly across clusters, the proposed algorithm estimates the expected computing cost using a matrix and generates partitions based on the estimated cost. In addition, it reduces the computing load by filtering out data that are not used in query processing in the clusters. Finally, our performance evaluation shows that the proposed algorithm achieves better query processing performance than the existing one.

Entropy-based Similarity Measures for Memory-based Collaborative Filtering

  • Kwon, Hyeong-Joon;Latchman, Haniph
    • International Journal of Internet, Broadcasting and Communication / v.5 no.2 / pp.5-10 / 2013
  • We proposed a novel similarity measure using weighted difference entropy (WDE) to improve the performance of collaborative filtering (CF) systems. The proposed similarity metric evaluates the entropy of the preference score differences between the items commonly rated by two users, and normalizes it using the Gaussian, tanh, and sigmoid functions. The experimental results showed significant improvement across the tested environments. These experiments involved varying the number of nearest neighbors, and we presented results for two data sets with different characteristics, as well as results for the quality of recommendation.

Improvement on Similarity Calculation in Collaborative Filtering Recommendation using Demographic Information (인구 통계 정보를 이용한 협업 여과 추천의 유사도 개선 기법)

  • 이용준;이세훈;왕창종
    • Journal of KIISE:Computing Practices and Letters / v.9 no.5 / pp.521-529 / 2003
  • In this paper we present an improved method that uses demographic information to overcome the similarity miscalculation caused by the sparsity problem in collaborative filtering recommendation systems. The similarity between a pair of users is determined only by the ratings given to co-rated items, so items that have not been rated by both users are ignored. To solve this problem, we add a virtual neighbor's rating derived from the demographic information of neighbors to improve prediction accuracy. The method is an extension of traditional collaborative filtering methods that use the Pearson correlation coefficient. We used the GroupLens movie rating data in our experiments and compared the proposed method with the collaborative filtering methods in terms of mean absolute error (MAE) and receiver operating characteristic (ROC) values. The results show that the proposed method performs better than the collaborative filtering methods using the Pearson correlation coefficient by about 9% in MAE and 13% in ROC sensitivity.
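To illustrate why sparsity hurts the baseline, the following is a minimal sketch of Pearson correlation restricted to co-rated items. It is not the paper's method: the function name `pearson_corated`, the dict-based rating format, and the choice to compute means over co-rated items only are assumptions made for illustration.

```python
from math import sqrt

def pearson_corated(ratings_a, ratings_b):
    """Pearson correlation over items rated by both users.

    ratings_a, ratings_b: dicts mapping item id -> rating (assumed format).
    Items rated by only one of the two users are ignored.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0  # too few co-rated items to correlate
    # Means are taken over the co-rated items only (one common variant).
    mean_a = sum(ratings_a[i] for i in common) / n
    mean_b = sum(ratings_b[i] for i in common) / n
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_a * var_b)

# Hypothetical users: only m1, m2, m3 are co-rated; m4 and m5 are ignored.
alice = {"m1": 5, "m2": 3, "m3": 4, "m4": 1}
bob = {"m1": 4, "m2": 2, "m3": 3, "m5": 5}
similarity = pearson_corated(alice, bob)
```

With very few co-rated items the result rests on little evidence, which is the situation the abstract describes.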

Harmonic Mean Weight by Combining Content Based Filtering and Collaborative Filtering in a Recommender System (내용 기반 여과와 협력적 여과의 병합을 통한 추천 시스템에서 조화 평균 가중치)

  • 정경용;류중경;강운구;이정현
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.239-250 / 2003
  • Recent recommender systems use a method of combining a collaborative filtering system and a content-based filtering system in order to solve the sparsity and first-rater problems of collaborative filtering. In this paper, to improve the prediction accuracy of a hybrid recommender system, the harmonic mean weight (CBCF_harmonic_mean) is used for calculating the user similarity weight. After setting the threshold to 45 in consideration of the performance of content-based filtering, we apply a significance weight of n/45 to the user similarity weight. To estimate the performance of the proposed method, it is compared with that of an existing method combining the collaborative filtering system and the content-based filtering system. The results confirm that the suggested method improves prediction accuracy while solving the problems of the existing collaborative filtering system.
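The two ingredients named in the abstract can be sketched as follows. The abstract does not say which quantities the harmonic mean combines or how n values above 45 are handled, so this is only an assumption-laden illustration: `harmonic_mean` takes any two non-negative weights, and capping the significance weight at 1 is an assumption, not something the abstract states. All function names are hypothetical.

```python
THRESHOLD = 45  # threshold value given in the abstract

def harmonic_mean(x, y):
    """Harmonic mean of two non-negative weights (0.0 if both are zero)."""
    if x + y == 0:
        return 0.0
    return 2 * x * y / (x + y)

def significance_weight(n, threshold=THRESHOLD):
    """n/threshold, capped at 1.0 (the cap is an assumption)."""
    return min(n, threshold) / threshold

def weighted_similarity(similarity, n, threshold=THRESHOLD):
    """Scale a user similarity weight by the significance weight."""
    return similarity * significance_weight(n, threshold)

# A similarity of 0.9 backed by n = 9 is scaled down to roughly 0.18.
scaled = weighted_similarity(0.9, 9)
```

The effect is that similarities supported by a small n contribute less to prediction than equally high similarities supported by a large n.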

Applying Different Similarity Measures based on Jaccard Index in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.47-53 / 2021
  • Sparse ratings data hinder reliable similarity computation between users, which degrades the performance of memory-based collaborative filtering techniques for recommender systems. Many works in the literature address this data sparsity problem, and the simplest and most representative of them use the Jaccard index. This index reflects the number of items commonly rated by two users and is usually integrated into traditional similarity measures to compute the similarity between users more accurately. However, such integration is very straightforward and does not consider the degree of data sparsity. This study suggests a novel idea of applying different similarity measures depending on the numeric value of the Jaccard index between two users. Performance experiments are conducted to obtain optimal values of the parameters used by the proposed method and to evaluate it in comparison with other relevant methods. As a result, the proposed method demonstrates the best or comparable performance in prediction and recommendation accuracy.
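The idea of switching measures by Jaccard value can be sketched as below. The Jaccard index itself is standard; the single `cutoff` parameter, its default value, and the two placeholder measures are hypothetical, since the abstract does not give the parameters or the measures the paper actually uses.

```python
def jaccard_index(items_a, items_b):
    """|A ∩ B| / |A ∪ B| for the item sets rated by two users."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def similarity_by_jaccard(ratings_a, ratings_b, sparse_measure, dense_measure,
                          cutoff=0.5):
    """Apply a different similarity measure depending on the Jaccard value.

    sparse_measure and dense_measure are caller-supplied functions taking two
    rating dicts; cutoff is a hypothetical parameter to be tuned.
    """
    j = jaccard_index(ratings_a, ratings_b)
    measure = dense_measure if j >= cutoff else sparse_measure
    return measure(ratings_a, ratings_b)

# Hypothetical usage with placeholder measures that just label the branch.
u = {"i1": 4, "i2": 5, "i3": 2}
v = {"i2": 3, "i3": 2, "i4": 1}
branch = similarity_by_jaccard(u, v, lambda a, b: "sparse", lambda a, b: "dense")
```

Passing rating dicts to `jaccard_index` works because `set()` of a dict yields its keys, that is, the rated items.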

Integration of Similarity Values Reflecting Rating Time for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.27 no.1 / pp.83-89 / 2022
  • As a representative technique of recommender systems, collaborative filtering has been successfully deployed in many commercial and academic systems. This technique recommends items highly rated by similar neighbor users, based on the similarity of ratings on items commonly rated by two users. Recently, research on time-aware recommender systems has attempted to improve system performance by reflecting the time at which users rated items. However, applying a uniform decay rate to past ratings risks lowering the rating prediction performance of the system. This study proposes a rating time-aware similarity measure between users, which is a novel approach different from previous ones. The proposed approach considers changes in the similarity value over time, not the item rating time. To evaluate the performance of the proposed method, experiments are conducted using various parameter values and types of time change functions, and the results show a significant improvement in prediction accuracy over existing traditional similarity measures.

Analysis of Performance Improvement of Collaborative Filtering based on Neighbor Selection Criteria (이웃 선정 조건에 따른 협력 필터링의 성능 향상 분석)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education / v.18 no.4 / pp.55-62 / 2015
  • Recommender systems based on collaborative filtering have been utilized successfully in various areas by making information search more convenient. Measuring similarity is critical in determining the performance of these systems, because it is the criterion that determines the range of recommenders. This study analyzes the distributions of similarity from traditional measures and investigates the relations between similarities and the number of co-rated items. Based on this analysis, the study suggests a method for selecting reliable recommenders by restricting similarities, which compensates for the drawbacks of previous measures. Experimental results showed that restricting the similarities of neighbors by upper and lower thresholds yields performance superior to previous methods, especially when consulting fewer nearest neighbors. A maximum improvement of 0.047 for cosine similarity and 0.03 for Pearson was achieved. This result indicates that a collaborative filtering system using Pearson or cosine similarities should not consult neighbors with very high or very low similarities.
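Restricting neighbors by lower and upper similarity thresholds can be sketched as follows. The function name, the input format, and the threshold values in the usage example are hypothetical; the abstract does not report the thresholds the study used.

```python
def select_neighbors(similarities, lower, upper, k):
    """Keep the k most similar users whose similarity lies in [lower, upper].

    similarities: dict mapping user id -> similarity to the target user.
    Returns a list of (user id, similarity) pairs, most similar first.
    """
    eligible = [(user, sim) for user, sim in similarities.items()
                if lower <= sim <= upper]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return eligible[:k]

# Hypothetical similarities: u1 is excluded as too similar, u4 as too dissimilar.
sims = {"u1": 0.99, "u2": 0.8, "u3": 0.6, "u4": 0.05}
neighbors = select_neighbors(sims, lower=0.1, upper=0.95, k=2)
```

An unrestricted top-k selection would have picked `u1` first, which is the behavior the abstract argues against.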

Query Expansion and Term Weighting Method for Document Filtering (문서필터링을 위한 질의어 확장과 가중치 부여 기법)

  • Shin, Seung-Eun;Kang, Yu-Hwan;Oh, Hyo-Jung;Jang, Myung-Gil;Park, Sang-Kyu;Lee, Jae-Sung;Seo, Young-Hoon
    • The KIPS Transactions:PartB / v.10B no.7 / pp.743-750 / 2003
  • In this paper, we propose a query expansion and term weighting method for document filtering to increase the precision of Web search engine results. Query expansion for document filtering uses ConceptNet, an encyclopedia, and the top 10% most similar documents. The term weighting method is used to calculate query-document similarity. In the first step, we expand an initial query into the first expanded query using ConceptNet and the encyclopedia. We then weight the first expanded query and calculate the similarity between the first expanded query and the documents. Next, we create the second expanded query using the top 10% most similar documents and calculate the similarity between the second expanded query and the documents. We combine the two similarities from the first and second steps. We then re-rank the documents according to the combined similarities and filter out non-relevant documents whose similarity is lower than the threshold. Our experiments showed that our document filtering method results in a notable improvement in retrieval effectiveness when measured using both precision-recall and F-measure.
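The final combine, re-rank, and filter steps can be sketched as below. The abstract does not state how the two similarities are combined, so the weighted sum with a hypothetical `alpha` parameter is an assumption, as are the function name and the dict-based input format.

```python
def rerank_and_filter(first_sims, second_sims, threshold, alpha=0.5):
    """Combine two query-document similarities, re-rank, and filter.

    first_sims, second_sims: dicts mapping document id -> similarity to the
    first and second expanded query. The weighted sum with alpha is an
    assumed combination rule, not taken from the paper.
    """
    combined = {
        doc: alpha * first_sims[doc] + (1 - alpha) * second_sims.get(doc, 0.0)
        for doc in first_sims
    }
    ranked = sorted(combined.items(), key=lambda pair: pair[1], reverse=True)
    # Drop documents whose combined similarity falls below the threshold.
    return [(doc, score) for doc, score in ranked if score >= threshold]

# Hypothetical scores: d3 falls below the threshold and is filtered out.
first = {"d1": 0.8, "d2": 0.4, "d3": 0.1}
second = {"d1": 0.6, "d2": 0.8, "d3": 0.1}
kept = rerank_and_filter(first, second, threshold=0.3)
```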

Collaborative Filtering Recommendation Algorithm Based on LDA2Vec Topic Model (LDA2Vec 항목 모델을 기반으로 한 협업 필터링 권장 알고리즘)

  • Xin, Zhang;Lee, Scott Uk-Jin
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.385-386 / 2020
  • In this paper, we propose a collaborative filtering recommendation algorithm based on the LDA2Vec topic model. By extracting and analyzing the content of articles, we calculate their semantic similarity and then combine it with the traditional collaborative filtering algorithm to make recommendations. This approach may improve the system's recommendation accuracy.

Nearest-Neighbor Collaborative Filtering Using Dimensionality Reduction by Non-negative Matrix Factorization (비부정 행렬 인수분해 차원 감소를 이용한 최근 인접 협력적 여과)

  • Ko, Su-Jeong
    • The KIPS Transactions:PartB / v.13B no.6 s.109 / pp.625-632 / 2006
  • Collaborative filtering is a technology that aims at learning predictive models of user preferences. Collaborative filtering systems have succeeded in the e-commerce market, but they suffer from high dimensionality and sparsity. In this paper we propose a nearest-neighbor collaborative filtering method using non-negative matrix factorization (NNMF). As preprocessing for matrix decomposition, we replace the missing values in the user-item matrix using the user variance coefficient method, and then apply non-negative factorization to the matrix. The non-negative decomposition represents users as semantic vectors and classifies the users into groups based on semantic relations. We compute the similarity between users using vector similarity and select the nearest neighbors based on that similarity. We predict the missing values of items that a new user has not rated based on the ratings that the nearest neighbors gave to those items.
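The last two steps (vector similarity, then prediction from nearest neighbors) can be sketched as follows, assuming the users' semantic vectors have already been produced by the factorization. The use of cosine similarity, the similarity-weighted average for prediction, and all names here are assumptions; the abstract does not give the exact formulas.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbors(target, others, k):
    """Return the k users most similar to target as (user id, similarity)."""
    scored = [(user, cosine(target, vec)) for user, vec in others.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def predict(neighbors, item_ratings):
    """Similarity-weighted average of the neighbors' ratings for one item."""
    rated = [(sim, item_ratings[user]) for user, sim in neighbors
             if user in item_ratings]
    total = sum(abs(sim) for sim, _ in rated)
    if total == 0:
        return None  # no usable neighbor rated this item
    return sum(sim * rating for sim, rating in rated) / total

# Hypothetical two-dimensional semantic vectors for a new user and three others.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = nearest_neighbors([1.0, 0.0], vectors, k=2)
estimate = predict(top, {"a": 4, "c": 2})
```

Here the new user is closest to `a` and then `c`, so the estimate lies between their ratings and nearer to the rating given by `a`.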