• Title/Summary/Keyword: matrix factorization

Search Result 309, Processing Time 0.026 seconds

Recommendations Based on Listwise Learning-to-Rank by Incorporating Social Information

  • Fang, Chen;Zhang, Hengwei;Zhang, Ming;Wang, Jindong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.1
    • /
    • pp.109-134
    • /
    • 2018
  • Collaborative Filtering (CF) is widely used in recommendation field, which can be divided into rating-based CF and learning-to-rank based CF. Although many methods have been proposed based on these two kinds of CF, there still be room for improvement. Firstly, the data sparsity problem still remains a big challenge for CF algorithms. Secondly, the malicious rating given by some illegal users may affect the recommendation accuracy. Existing CF algorithms seldom took both of the two observations into consideration. In this paper, we propose a recommendation method based on listwise learning-to-rank by incorporating users' social information. By taking both ratings and order of items into consideration, the Plackett-Luce model is presented to find more accurate similar users. In order to alleviate the data sparsity problem, the improved matrix factorization model by integrating the influence of similar users is proposed to predict the rating. On the basis of exploring the trust relationship between users according to their social information, a listwise learning-to-rank algorithm is proposed to learn an optimal ranking model, which can output the recommendation list more consistent with the user preference. Comprehensive experiments conducted on two public real-world datasets show that our approach not only achieves high recommendation accuracy in relatively short runtime, but also is able to reduce the impact of malicious ratings.

Using Experts Among Users for Novel Movie Recommendations

  • Lee, Kibeom;Lee, Kyogu
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.1
    • /
    • pp.21-29
    • /
    • 2013
  • The introduction of recommender systems to existing online services is now practically inevitable, with the increasing number of items and users on online services. Popular recommender systems have successfully implemented satisfactory systems, which are usually based on collaborative filtering. However, collaborative filtering-based recommenders suffer from well-known problems, such as popularity bias, and the cold-start problem. In this paper, we propose an innovative collaborative-filtering based recommender system, which uses the concepts of Experts and Novices to create fine-grained recommendations that focus on being novel, while being kept relevant. Experts and Novices are defined using pre-made clusters of similar items, and the distribution of users' ratings among these clusters. Thus, in order to generate recommendations, the experts are found dynamically depending on the seed items of the novice. The proposed recommender system was built using the MovieLens 1 M dataset, and evaluated with novelty metrics. Results show that the proposed system outperforms matrix factorization methods according to discovery-based novelty metrics, and can be a solution to popularity bias and the cold-start problem, while still retaining collaborative filtering.

Document Summarization using Pseudo Relevance Feedback and Term Weighting (의사연관피드백과 용어 가중치에 의한 문서요약)

  • Kim, Chul-Won;Park, Sun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.3
    • /
    • pp.533-540
    • /
    • 2012
  • In this paper, we propose a document summarization method using the pseudo relevance feedback and the term weighting based on semantic features. The proposed method can minimize the user intervention to use the pseudo relevance feedback. It also can improve the quality of document summaries because the inherent semantic of the sentence set are well reflected by term weighting derived from semantic feature. In addition, it uses the semantic feature of term weighting and the expanded query to reduce the semantic gap between the user's requirement and the result of proposed method. The experimental results demonstrate that the proposed method achieves better performant than other methods without term weighting.

Document Clustering using Clustering and Wikipedi (군집과 위키피디아를 이용한 문서군집)

  • Park, Sun;Lee, Seong Ho;Park, Hee Man;Kim, Won Ju;Kim, Dong Jin;Chandra, Abel;Lee, Seong Ro
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2012.10a
    • /
    • pp.392-393
    • /
    • 2012
  • This paper proposes a new document clustering method using clustering and Wikipedia. The proposed method can well represent the concept of cluster topics by means of NMF. It can solve the problem of "bags of words" to be not considered the meaningful relationships between documents and clusters, which expands the important terms of cluster by using of the synonyms of Wikipedia. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.

  • PDF

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.2
    • /
    • pp.6-17
    • /
    • 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.

Enhancing Snippet Extraction Method using Fuzzy and Semantic Features (퍼지와 의미특징을 이용한 스니핏 추출 향상 방법)

  • Park, Sun;Lee, Yeonwoo;Cho, Kwangmoon;Yang, Huyeol;Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.11
    • /
    • pp.2374-2381
    • /
    • 2012
  • This paper proposes a new enhancing snippet extraction method using fuzzy and semantic features. The proposed method creates a delegate of sentence by using semantic features. It extracts snippet using fuzzy association between a delegate sentence and sentence set which well represents query. In addition, the method uses pseudo relevance feedback to expand query which extracts snippet to be well reflected semantic user's intention. The experimental results demonstrate the proposed method can achieve better snippet extraction performance than the previous methods.

Source Analysis of Size Distribution and Density Estimation in PM2.5 -Part II (입경 분포 원인 분석 및 PM2.5 밀도 추정 -Part II)

  • Bae, Min-Suk;Park, Da-Jeong;Lee, Jeonghoon;Ahn, Joon-Young;Lee, Yeong-Jae
    • Journal of Korean Society for Atmospheric Environment
    • /
    • v.32 no.2
    • /
    • pp.158-166
    • /
    • 2016
  • To characterize the features of particle apparent density, continuous measurements of particle number size distributions from optical particle sizer (OPS) and 24 hr integrated particle mass concentrations from filter based sampler were conducted at the National institute of environmental research NamBu Supersite (NNBS, $35.22^{\circ}N$, $126.84^{\circ}E$) in Gwangju for 16 days from Nov. 4 in 2014. Source apportionment model was carried out by applying Positive Matrix Factorization (PMF) to particle size distribution data. Three different distributions related to primary and secondary sources were investigated by the diurnal patterns of identified factors. Density estimated by gaussian model has been calculated as $1.69g/cm^3$ with 95% confidence bounds ($1.57{\sim}1.81g/cm^3$).

Font Classification using NMF and EMD (NMF와 EMD를 이용한 영문자 활자체 폰트분류)

  • Lee, Chang-Woo;Kang, Hyun;Jung, Kee-Chul;Kim, Hang-Joon
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.688-690
    • /
    • 2004
  • 최근 전자화된 문서 영상을 효율적으로 관리하고 검색하기 위한 문서구조분석 방법과 문서의 자동 분류에 관한 많은 연구가 발표되고 있다. 본 논문에서는 NMF(non-negative matrix factorization) 알고리즘을 사용하여 폰트를 자동으로 분류하는 방법을 제안한다. 제안된 방법은 폰트의 구분 특징들이 공간적으로 국부성을 가지는 부분으로 표현될 수 있다는 가정을 바탕으로, 전체의 폰트 이미지들로부터 각 폰트들의 구분 특징인 부분을 학습하고, 학습된 부분들을 특징으로 사용하여 폰트를 분류하는 방법이다. 학습된 폰트의 특징들은 계층적 군집화 알고리즘을 이용하여 템플릿을 생성하고, 테스트 패턴을 분류하기 위하여 템플릿 패턴과의 EMD(earth mover's distance)를 사용한다. 실험결과에서 폰트 이미지들의 공간적으로 국부적인 특징들이 조사되고, 그 특징들의 폰트 식별을 위한 적절성을 보였다. 제안된 방법이 기존의 문자인식. 문서 검색 시스템들의 전처리기로 사용되면. 그 시스템들의 성능을 향상시킬 것으로 기대된다.

  • PDF

A Construction of the Linear Digital Switching Function over Finite Fields (유한체상에서의 선형디지털스위칭함수 구성)

  • Park, Chun-Myoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.12
    • /
    • pp.2201-2206
    • /
    • 2008
  • This paper presents a method of constructing the Linear Digital Switching Function(LDSF) over finite fields. The proposed method is as following. First of all, we extract the input/output relationship of linear characteristics for the given digital switching functions, Next, we convert the input/output relationship to Directed Cyclic Graph(DCG) using basic gates adder and coefficient multiplier that are defined by mathematical properties in finite fields. Also, we propose the new factorization method for matrix characteristics equation that represent the relationship of the input/output characteristics. The proposed method have properties of generalization and regularity. Also, the proposed method is possible to any prime number multiplication expression.

PARAFAC Tensor Reconstruction for Recommender System based on Apache Spark (아파치 스파크에서의 PARAFAC 분해 기반 텐서 재구성을 이용한 추천 시스템)

  • Im, Eo-Jin;Yong, Hwan-Seung
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.4
    • /
    • pp.443-454
    • /
    • 2019
  • In recent years, there has been active research on a recommender system that considers three or more inputs in addition to users and goods, making it a multi-dimensional array, also known as a tensor. The main issue with using tensor is that there are a lot of missing values, making it sparse. In order to solve this, the tensor can be shrunk using the tensor decomposition algorithm into a lower dimensional array called a factor matrix. Then, the tensor is reconstructed by calculating factor matrices to fill original empty cells with predicted values. This is called tensor reconstruction. In this paper, we propose a user-based Top-K recommender system by normalized PARAFAC tensor reconstruction. This method involves factorization of a tensor into factor matrices and reconstructs the tensor again. Before decomposition, the original tensor is normalized based on each dimension to reduce overfitting. Using the real world dataset, this paper shows the processing of a large amount of data and implements a recommender system based on Apache Spark. In addition, this study has confirmed that the recommender performance is improved through normalization of the tensor.