• Title/Summary/Keyword: sparsity

Personal Information Protection Recommendation System using Deep Learning in POI (POI 에서 딥러닝을 이용한 개인정보 보호 추천 시스템)

  • Peng, Sony;Park, Doo-Soon;Kim, Daeyoung;Yang, Yixuan;Lee, HyeJung;Siet, Sophort
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.377-379 / 2022
  • POI refers to a point of interest in Location-Based Social Networks (LBSNs). With the rapid development of mobile devices, GPS, and the Web (Web 2.0 and 3.0), LBSNs have attracted many users who share their information, physical (real-time) location, and places of interest. This heavy user demand has drawn widespread attention to recommendation systems (RSs). Recommendation systems help users discover interesting local attractions or facilities and help social network service (SNS) providers based on user locations. The POI recommendation system therefore plays a vital role in LBSNs. In most machine learning models, the training data are kept in centralized storage, so information that belongs to users is stored centrally and users may face privacy issues. Moreover, uploading or sharing a real-time location with others through social media may raise safety concerns. To address these privacy concerns, the paper proposes a recommendation model that protects user privacy and eliminates traditional RS problems such as cold start and data sparsity.

Hybrid Movie Recommendation System Using Clustering Technique (클러스터링 기법을 이용한 하이브리드 영화 추천 시스템)

  • Sophort Siet;Sony Peng;Yixuan Yang;Sadriddinov Ilkhomjon;DaeYoung Kim;Doo-Soon Park
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.357-359 / 2023
  • This paper proposes a hybrid recommendation system (RS) model that overcomes the limitations of traditional approaches, such as data sparsity, cold start, and scalability, by combining collaborative filtering (CF) and context-aware techniques. The objective of the model is to enhance recommendation accuracy and provide personalized suggestions by leveraging the strengths of collaborative filtering and incorporating user context features to capture preferences and behavior more effectively. The approach uses a novel method that combines contextual attributes with the original user-item rating matrix of CF-based algorithms. Furthermore, we integrate k-means++ clustering to group users with similar preferences and finally recommend items that have been highly rated by other users in the same cluster. Partitioning the rating matrix into clusters based on contextual information offers several advantages. First, it avoids computations over the entire data, reducing runtime and improving scalability. Second, the partitioned clusters hold similar ratings, which have a greater influence on each other, leading to more accurate recommendations and providing flexibility in the clustering process. Keywords: Context-aware Recommendation, Collaborative Filtering, K-means++ Clustering.
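
The cluster-then-recommend idea in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `context` features, the `ratings` matrix, and the function names are hypothetical, users are clustered on context features alone, and unrated items are ranked by their mean rating among cluster peers.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a centre
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        chosen = points[-1]  # fallback, normally overwritten below
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                chosen = p
                break
        centers.append(chosen)
    return centers

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations started from k-means++ seeds; returns labels."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def recommend(ratings, labels, user, top_n=2):
    """Rank the user's unrated items by mean rating among cluster peers."""
    peers = [u for u, l in enumerate(labels)
             if l == labels[user] and u != user]
    scores = {}
    for item, own in enumerate(ratings[user]):
        if own is not None:
            continue
        vals = [ratings[p][item] for p in peers if ratings[p][item] is not None]
        if vals:
            scores[item] = sum(vals) / len(vals)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical data: two context features per user, None = unrated.
context = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.05],
           [0.9, 1.0], [0.8, 0.9], [0.85, 0.95]]
ratings = [[5, None, 1, None],
           [4, 5, None, 2],
           [5, 4, 1, None],
           [1, None, 5, 4],
           [None, 2, 5, 5],
           [2, 1, None, 5]]

labels = kmeans(context, k=2)
print(recommend(ratings, labels, user=0))
```

Because each user is compared only with peers in the same cluster, the scoring step touches a fraction of the rating matrix, which is the runtime advantage the abstract describes.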

Jaccard Index Reflecting Time-Context for User-based Collaborative Filtering

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.163-170 / 2023
  • The user-based collaborative filtering technique, one of the implementation methods of recommendation systems, recommends items preferred by neighboring users, that is, users found to have similar rating histories. However, it fundamentally suffers from a data scarcity problem: the quality of recommendations drops significantly when users share little common rating history. To solve this problem, many existing studies have proposed methods that combine the Jaccard index with a similarity measure. In this study, we introduce a time-aware concept to the Jaccard index and propose a method that assigns different weights to common items depending on the rating time. Experiments using various performance metrics and time intervals confirm that the proposed method outperforms the original Jaccard index on most metrics, and that the optimal time interval differs depending on the type of performance metric.
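
For orientation, the sketch below contrasts the plain Jaccard index with one possible time-aware variant. The weighting rule is an assumption made for illustration (full weight when two users rated a common item within `interval` of each other, a decaying weight otherwise); the abstract does not state the paper's actual weighting formula, and the function and variable names are hypothetical.

```python
def jaccard(items_u, items_v):
    """Plain Jaccard index: |intersection| / |union| of the rated-item sets."""
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def time_weighted_jaccard(times_u, times_v, interval):
    """Jaccard-style similarity where each common item contributes a weight
    that depends on how far apart the two users rated it.

    times_u, times_v: dict mapping item -> rating timestamp.
    Assumed scheme (not from the paper): weight 1 if the gap is within
    `interval`, otherwise interval / gap.
    """
    common = times_u.keys() & times_v.keys()
    union = times_u.keys() | times_v.keys()
    if not union:
        return 0.0
    weight = 0.0
    for item in common:
        gap = abs(times_u[item] - times_v[item])
        weight += 1.0 if gap <= interval else interval / gap
    return weight / len(union)

# Hypothetical rating timestamps (arbitrary time units).
times_u = {"a": 0, "b": 10, "c": 50}
times_v = {"a": 5, "b": 110, "d": 7}

print(jaccard(set(times_u), set(times_v)))
print(time_weighted_jaccard(times_u, times_v, interval=10))
```

In this toy data both users rated items "a" and "b", but "b" was rated far apart in time, so the time-aware score is lower than the plain index.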

Deep Learning-Based Personalized Recommendation Using Customer Behavior and Purchase History in E-Commerce (전자상거래에서 고객 행동 정보와 구매 기록을 활용한 딥러닝 기반 개인화 추천 시스템)

  • Hong, Da Young;Kim, Ga Yeong;Kim, Hyon Hee
    • KIPS Transactions on Software and Data Engineering / v.11 no.6 / pp.237-244 / 2022
  • In this paper, we present a VAE-based recommendation method that uses online behavior logs and purchase history to overcome data sparsity and cold start. To generate a variable representing customers' purchase history, embedding and dimensionality reduction are applied to that history. Variational Autoencoders (VAEs) are then applied to the online behavior and purchase history. A total of 12 variables are used, and nDCG is chosen for performance evaluation. Our experimental results show that the proposed VAE-based recommendation outperforms SVD-based recommendation and that the generated purchase history variable improves recommendation performance.
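
Since nDCG is the evaluation metric named here, a minimal sketch of the common log2-discounted form may help. It makes one simplifying assumption: the ideal ordering is computed from the relevance scores of the ranked list itself instead of from every relevant item in the catalogue. The function names are illustrative.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1) with ranks from 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    """nDCG@k: DCG of the top-k ranking divided by the DCG of the best
    possible ordering of the same relevance scores."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg(ideal[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance of each recommended item, in the order it was recommended.
print(ndcg_at_k([1, 0, 1], k=3))
```

A value of 1.0 means the most relevant items were ranked first; placing a relevant item lower in the list reduces the score.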

A Movie Recommendation System based on Fuzzy-AHP with User Preference and Partition Algorithm (사용자 선호도와 군집 알고리즘을 이용한 퍼지-계층적 분석 기법 기반 영화 추천 시스템)

  • Oh, Jae-Taek;Lee, Sang-Yong
    • Journal of Digital Convergence / v.15 no.11 / pp.425-432 / 2017
  • Current recommendation systems have several problems: it is difficult to tell whether the recommended items are ones that users actually prefer or are merely interested in; data are too scarce to recommend suitable items when the number of users is extremely small; and the cold-start issue degrades the system's ability to recommend satisfying items as new users arrive. To address these problems, this study implemented a movie recommendation system aimed at user satisfaction, using the Fuzzy Analytic Hierarchy Process (Fuzzy-AHP), which can reflect uncertain situations and problems, and a data partition algorithm that groups similar items. Survey data on the movie preferences of 61 users were applied to the system, and the results show that it solved the data scarcity problem using Fuzzy-AHP and, with the data partition algorithm, recommended items suited to a user even with an influx of new users. Further research on density-based clustering is expected to be needed to filter out noise or outlier data.

Collaborative Filtering using Co-Occurrence and Similarity information (상품 동시 발생 정보와 유사도 정보를 이용한 협업적 필터링)

  • Na, Kwang Tek;Lee, Ju Hong
    • Journal of Internet Computing and Services / v.18 no.3 / pp.19-28 / 2017
  • Collaborative filtering (CF) is a technique that interprets the relationship between users and products and recommends products to a specific user. The CF model is advantageous in that it can recommend products using only rating data, without additional information such as content. However, users often do not rate a product even after consuming it, and each user consumes only a small portion of all products. As a result, the number of observed ratings is very small and the user rating matrix is very sparse. This sparsity makes it difficult to raise CF performance. In this paper, we concentrate on raising the performance of the latent factor model (especially SVD). We propose a new model that incorporates product similarity information and co-occurrence information into SVD. The similarity and co-occurrence information obtained from the rating data increased the expressiveness of the latent space in terms of latent factors. As a result, Recall increased by 16%, and Precision and NDCG increased by 8% and 7%, respectively. The proposed method is expected to perform better than existing methods when combined with other recommender systems in the future.
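
To make the two quantities in this abstract concrete, the sketch below computes the sparsity of a rating matrix and a simple item co-occurrence count (the number of users who rated both items of a pair). The data and function names are hypothetical, and how the paper folds such information into SVD is not described in the abstract, so that step is not shown.

```python
from collections import Counter
from itertools import combinations

def sparsity(ratings, n_users, n_items):
    """Fraction of user-item cells with no observed rating."""
    return 1.0 - len(ratings) / (n_users * n_items)

def cooccurrence(ratings):
    """Count, for every item pair, how many users rated both items."""
    by_user = {}
    for user, item in ratings:  # keys of the dict are (user, item) pairs
        by_user.setdefault(user, set()).add(item)
    counts = Counter()
    for items in by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical observed ratings: (user, item) -> rating.
ratings = {(0, 0): 5, (0, 1): 3,
           (1, 0): 4, (1, 1): 2, (1, 2): 1,
           (2, 2): 5}

print(sparsity(ratings, n_users=3, n_items=4))
print(cooccurrence(ratings))
```

Real rating data are usually far sparser than this toy matrix, which is why side information derived from the ratings themselves is attractive.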

A Hybrid Collaborative Filtering Using a Low-dimensional Linear Model (저차원 선형 모델을 이용한 하이브리드 협력적 여과)

  • Ko, Su-Jeong
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.777-785 / 2009
  • Collaborative filtering is a technique used to predict whether a particular user will like a particular item. User-based and item-based collaborative techniques have been used extensively in many commercial recommender systems. In this paper, a hybrid collaborative filtering method that combines user-based and item-based methods using a low-dimensional linear model is proposed. The proposed method addresses the problems of sparsity and large databases by using NMF, one of the low-dimensional linear models. In collaborative filtering systems, methods using NMF are useful for expressing users as semantic relations. However, they are model-based methods with a complex computation process, so they cannot recommend items dynamically. To compensate for these shortcomings, the proposed method clusters users into groups using NMF and selects group features using TF-IDF. Mutual information is then used to compute similarities between items. The proposed method clusters users into groups and extracts group features offline, and determines the most suitable group for an active user online using those group features. As a result, the proposed method reduces the time required to classify an active user into a group and outperforms previous methods by combining user-based and item-based collaborative filtering.

T-Commerce Sale Prediction Using Deep Learning and Statistical Model (딥러닝과 통계 모델을 이용한 T-커머스 매출 예측)

  • Kim, Injung;Na, Kihyun;Yang, Sohee;Jang, Jaemin;Kim, Yunjong;Shin, Wonyoung;Kim, Deokjung
    • Journal of KIISE / v.44 no.8 / pp.803-812 / 2017
  • T-commerce is a technology-fusion service in which users can make purchases through data broadcasting technology based on bi-directional digital TVs. To achieve the best revenue in an environment limited in the number of channels and the variety of goods on sale, broadcast programs must be organized to maximize expected sales, considering the selling power of each product in each time slot. To this end, this paper proposes a method to predict the sales of goods when they are assigned to each time slot. The proposed method predicts the sales of a product in a time slot given the week-in-year and weather of the target day. Additionally, it incorporates a statistical prediction model that applies SVD (Singular Value Decomposition) to mitigate the sparsity problem caused by bias in the sales records. In experiments on the sales data of W-shopping, a T-commerce company, the proposed method achieved an NMAE (Normalized Mean Absolute Error) of 0.12 between the predicted and actual sales, which confirms its effectiveness. The proposed method has been applied in practice to the T-commerce system of W-shopping and is used for broadcast scheduling.
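
The abstract reports NMAE without giving its normalization, and several conventions exist (for example, dividing by the mean or by the range of the actual values). The sketch below assumes normalization by the sum of absolute actual sales; that choice and the sample figures are illustrative assumptions, not taken from the paper.

```python
def nmae(actual, predicted):
    """Normalized mean absolute error, assuming normalization by the
    total absolute actual value: sum|pred - actual| / sum|actual|."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("actual values sum to zero; NMAE is undefined")
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / denom

# Hypothetical sales per time slot and the corresponding predictions.
actual_sales = [100, 200, 300]
predicted_sales = [110, 180, 330]
print(nmae(actual_sales, predicted_sales))
```

Under this convention, a score of 0.12 would mean the absolute prediction errors add up to 12% of total actual sales.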

Why Gabor Frames? Two Fundamental Measures of Coherence and Their Role in Model Selection

  • Bajwa, Waheed U.;Calderbank, Robert;Jafarpour, Sina
    • Journal of Communications and Networks / v.12 no.4 / pp.289-307 / 2010
  • The problem of model selection arises in a number of contexts, such as subset selection in linear regression, estimation of structures in graphical models, and signal denoising. This paper studies non-asymptotic model selection for the general case of arbitrary (random or deterministic) design matrices and arbitrary nonzero entries of the signal. In this regard, it generalizes the notion of incoherence in the existing literature on model selection and introduces two fundamental measures of coherence among the columns of a design matrix, termed the worst-case coherence and the average coherence. It uses these two measures to provide an in-depth analysis of a simple, model-order agnostic one-step thresholding (OST) algorithm for model selection and proves that OST is feasible for exact as well as partial model selection as long as the design matrix obeys an easily verifiable property, termed the coherence property. One of the key insights offered by the ensuing analysis is that OST can successfully carry out model selection even when methods based on convex optimization, such as the lasso, fail due to the rank deficiency of the submatrices of the design matrix. In addition, the paper establishes that if the design matrix has reasonably small worst-case and average coherence, then OST performs near-optimally when either (i) the energy of any nonzero entry of the signal is close to the average signal energy per nonzero entry or (ii) the signal-to-noise ratio in the measurement system is not too high. Finally, two other key contributions of the paper are that (i) it provides bounds on the average coherence of Gaussian matrices and Gabor frames, and (ii) it extends the results on model selection using OST to low-complexity, model-order agnostic recovery of sparse signals with arbitrary nonzero entries. In particular, this part of the analysis implies that an Alltop Gabor frame together with OST can successfully carry out model selection and recovery of sparse signals irrespective of the phases of the nonzero entries, even if the number of nonzero entries scales almost linearly with the number of rows of the Alltop Gabor frame.
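
The two coherence measures and the thresholding step can be sketched for a small real-valued example. This is a minimal illustration under stated assumptions: columns are unit-norm, worst-case coherence is the largest absolute inner product between distinct columns, average coherence is the largest absolute sum of a column's inner products with all other columns divided by (p - 1), and the `threshold` value is picked by hand. The paper's analysis chooses the threshold from problem parameters, which is not reproduced here.

```python
import math

def normalize(col):
    """Scale a column to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in col))
    return [x / norm for x in col]

def inner(a, b):
    """Real inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def coherences(cols):
    """Return (worst-case coherence, average coherence) of unit-norm columns."""
    p = len(cols)
    worst = max(abs(inner(cols[i], cols[j]))
                for i in range(p) for j in range(p) if i != j)
    avg = max(abs(sum(inner(cols[i], cols[j]) for j in range(p) if j != i))
              for i in range(p)) / (p - 1)
    return worst, avg

def one_step_threshold(cols, y, threshold):
    """OST: correlate y with every column once and keep the indices whose
    absolute correlation exceeds the threshold."""
    return [i for i, col in enumerate(cols) if abs(inner(col, y)) > threshold]

# Hypothetical 3 x 4 design matrix, stored column by column.
cols = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        normalize([1.0, 1.0, 1.0])]

# Noise-free measurement of a signal with nonzeros on columns 0 and 2.
y = [2.0, 0.0, -3.0]

print(coherences(cols))
print(one_step_threshold(cols, y, threshold=1.0))
```

OST needs only one matrix-vector product and a comparison, which is why the abstract calls it low-complexity compared with convex-optimization methods such as the lasso.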

A New Similarity Measure for Categorical Attribute-Based Clustering (범주형 속성 기반 군집화를 위한 새로운 유사 측도)

  • Kim, Min;Jeon, Joo-Hyuk;Woo, Kyung-Gu;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.37 no.2 / pp.71-81 / 2010
  • Cluster analysis is widely used in numerous applications, such as pattern recognition, image analysis, and market analysis. The important factors that determine cluster quality are the similarity measure and the number of attributes. Similarity measures should be defined with respect to the data types. Existing similarity measures work well for numerical attribute values. However, those measures do not work well when the data are described by categorical attributes, that is, when there is no inherent similarity measure between values. In high-dimensional spaces, conventional clustering algorithms tend to break down because of the sparsity of data points. To overcome this difficulty, a subspace clustering approach has been proposed, based on the observation that different clusters may exist in different subspaces. In this paper, we propose a new similarity measure for clustering high-dimensional categorical data. The measure is defined based on the idea that, in a good clustering, each cluster should carry information that distinguishes it from other clusters. We also try to capture attribute dependencies. This study is meaningful because no existing method uses both. Experimental results on real datasets show that the clusters obtained with the proposed similarity measure are sufficiently good with respect to clustering accuracy.