• Title/Summary/Keyword: Sparsity Problem

A Prediction System of User Preferences for Newly Released Items Based on Words (새로 출시되는 품목들을 위한 단어 기반의 사용자 선호도 예측 기법)

  • Choi, Yoon-Seok;Moon, Byung-Ro
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.2 / pp.156-163 / 2006
  • Collaborative filtering (CF) systems are widely used in recommendation because they are easy to implement and perform well. They nevertheless have several problems, such as the sparsity problem, the first-rater problem, and the lack of explanation for recommendations, and many studies have tried to resolve them. The influence of the sparsity problem lessens as users' data accumulate, but the first-rater problem is inherent to CF systems, and a number of studies have tried to overcome this disadvantage with content-based methods. CF systems are also black boxes that provide no explanation of how a recommendation is produced. In this paper we present a content-based prediction system based on preference words, which exposes the reasoning behind a recommendation. Our system predicts a user's rating of a new movie, and we suggest a semiotic network-based method to solve the mismatching problem between items. For experimental comparison, we used the EachMovie and IMDb datasets.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are increasingly important technologies that are essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering. The rating values in the user-item matrix may be distorted by the popularity of a product, or there may be new users who have not yet rated anything. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address it. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or item characteristics are available. Another problem is that real-world rating data are mostly biased toward high scores, resulting in severe imbalance. One cause of this imbalanced distribution is purchasing bias: only users who rate a product highly purchase it, so those with low ratings are less likely to purchase and thus do not leave negative reviews. Because of this, reviews by users who purchase products are more likely to be positive than the actual preferences of most users.
Therefore, models trained on actual rating data over-learn the high-frequency classes because of this bias, distorting the market. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques for this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class phenomena such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted on converting multi-class problems into binary class problems. However, simplifying multi-class problems can cause classification errors when the results of classifiers learned from the separate sub-problems are combined, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model that uses a CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of the minority classes and is used to generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model shows better recommendation performance than existing oversampling techniques and than the original real-world data with its data sparsity.
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.
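
The abstract above reports results on the RMSE and MAE scales. As a minimal sketch of how those two error measures are computed (the rating values below are hypothetical and are not taken from the paper):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical held-out ratings and model predictions (illustrative only).
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]
print(rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together.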

Sparse Document Data Clustering Using Factor Score and Self Organizing Maps (인자점수와 자기조직화지도를 이용한 희소한 문서데이터의 군집화)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.2 / pp.205-211 / 2012
  • Retrieved documents have to be transformed into a proper data structure before statistical and machine learning clustering algorithms can be applied. A popular data structure for document clustering is the document-term matrix, which holds the occurrence frequency of each term in each document. This matrix suffers from a sparsity problem because most of its entries are 0, and this sparseness degrades clustering performance. This research therefore uses factor scores obtained by factor analysis to solve the sparsity problem in document clustering. The document-term matrix is transformed into a document-factor score matrix, which is then used as the input data for document clustering. To compare clustering performance between the document-term matrix and the document-factor score matrix, this research applies both types of matrix to self-organizing map (SOM) clustering.
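
To make the sparsity of a document-term matrix concrete, here is a small sketch that builds one from a toy corpus and measures the fraction of zero entries. The corpus and the whitespace tokenizer are assumptions for illustration; the factor analysis and SOM steps described in the abstract are not shown.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)  # missing terms count as 0
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

def sparsity(matrix):
    """Fraction of matrix entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

# Hypothetical three-document corpus (illustrative only).
docs = [
    "sparse matrix clustering",
    "neural network training",
    "matrix factor analysis",
]
vocab, dtm = document_term_matrix(docs)
print(len(vocab), sparsity(dtm))
```

Even in this tiny corpus more than half of the entries are zero, and the fraction grows quickly as the vocabulary gets larger.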

Collaborative Filtering for Credit Card Recommendation based on Multiple User Profiles (신용카드 추천을 위한 다중 프로파일 기반 협업필터링)

  • Lee, Won Cheol;Yoon, Hyoup Sang;Jeong, Seok Bong
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.154-163 / 2017
  • Collaborative filtering, one of the most widely used techniques for building recommender systems, is based on the idea that users with similar preferences can help one another find useful items. Credit card user behavior analytics show that most customers hold three or fewer credit cards without duplicates. This behavior is one of the most influential factors contributing to data sparsity. The 'cold-start' problem caused by data sparsity prevents a recommender system from providing proper recommendations in the personalized credit card recommendation scenario. We propose a personalized credit card recommender system that addresses the cold-start problem using multiple user profiles. The proposed system consists of a training process and an application process using five user profiles. In the training process, the five user profiles are transformed into five user networks based on cosine similarity, and an integrated user network is derived as the weighted sum of the user networks. The application process selects the k-nearest neighbors (users) from the integrated user network derived in the training process and recommends the three credit cards most frequently used by those neighbors. To demonstrate the performance of the proposed system, we conducted experiments with real credit card user data and calculated F1 values. The F1 value of the proposed system was compared with that of existing recommendation techniques. The results show that the proposed system provides better recommendations than the existing techniques. This paper not only contributes to solving the cold-start problem that may occur in the personalized credit card recommendation scenario, but is also expected to help financial companies improve customer satisfaction and increase corporate profits by providing proper recommendations.
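
The training and application steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses two hypothetical profiles instead of five, made-up weights, and made-up card names, and the helper names (`integrated_similarity`, `recommend`) are invented for this example.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def integrated_similarity(profiles, weights, a, b):
    """Weighted sum of the per-profile cosine similarities between users a and b."""
    return sum(w * cosine(p[a], p[b]) for p, w in zip(profiles, weights))

def recommend(profiles, weights, cards, target, k=2, n=3):
    """Pick the k most similar users, then the n cards they hold most often."""
    others = [u for u in cards if u != target]
    neighbors = sorted(
        others,
        key=lambda u: integrated_similarity(profiles, weights, target, u),
        reverse=True,
    )[:k]
    counts = Counter(card for u in neighbors for card in cards[u])
    return [card for card, _ in counts.most_common(n)]

# Hypothetical data: two user profiles (the paper uses five), each mapping user -> feature vector.
profiles = [
    {"u1": [1, 0, 1], "u2": [1, 0, 1], "u3": [0, 1, 0], "u4": [1, 1, 0]},
    {"u1": [1, 0], "u2": [1, 0], "u3": [0, 1], "u4": [1, 1]},
]
weights = [0.6, 0.4]  # assumed profile weights
cards = {"u1": ["A"], "u2": ["A", "B"], "u3": ["C"], "u4": ["B", "D"]}

result = recommend(profiles, weights, cards, "u1", k=2, n=3)
print(result)
```

Because the integrated network combines several profiles, a user with little card history can still find neighbors through the other profiles, which is how this design targets the cold-start case.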

Image Denoising for Metal MRI Exploiting Sparsity and Low Rank Priors

  • Choi, Sangcheon;Park, Jun-Sik;Kim, Hahnsung;Park, Jaeseok
    • Investigative Magnetic Resonance Imaging / v.20 no.4 / pp.215-223 / 2016
  • Purpose: The management of metal-induced field inhomogeneities is one of the major concerns in obtaining distortion-free magnetic resonance images near metallic implants. The recently proposed method called "Slice Encoding for Metal Artifact Correction (SEMAC)" is an effective spin echo pulse sequence for magnetic resonance imaging (MRI) near metallic implants. However, because SEMAC uses noisy resolved data elements, it is difficult to improve the signal-to-noise ratio (SNR) of SEMAC images without compromising the correction of metal artifacts. To address this issue, this paper presents a novel reconstruction technique that improves the SNR of SEMAC images without sacrificing the correction of metal artifacts. Materials and Methods: First, low-rank approximation is performed on each coil image to suppress noise in the slice direction, because the signal is highly correlated between SEMAC-encoded slices. Second, SEMAC images are reconstructed by the best linear unbiased estimator (BLUE), also known as Gauss-Markov or weighted least squares. Noise levels and correlation in the receiver channels are considered for the sake of SNR optimization. Because distorted excitation profiles are sparse, $l_1$ minimization performs well in recovering them, and the sparse modeling of our approach offers excellent correction of metal-induced distortions. Results: Three images were compared: those reconstructed using SEMAC, SEMAC with the conventional two-step noise reduction, and the proposed denoising algorithm for metal MRI exploiting sparsity and low-rank approximation. The proposed algorithm outperformed the other two methods, achieving an SNR 119% higher than SEMAC and 89% higher than SEMAC with the conventional two-step noise reduction.
Conclusion: We successfully demonstrated that the proposed novel algorithm for SEMAC, compared with conventional de-noising methods, substantially improves SNR and reduces artifacts.

Applying Different Similarity Measures based on Jaccard Index in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.47-53 / 2021
  • Sparse ratings data hinder reliable similarity computation between users, which degrades the performance of memory-based collaborative filtering techniques for recommender systems. Many approaches in the literature address this data sparsity problem, of which the simplest and most representative are methods that use the Jaccard index. This index reflects the number of items rated in common by two users and is usually integrated into traditional similarity measures to compute the similarity between users more accurately. However, such integration is very straightforward and does not consider the degree of data sparsity. This study suggests a novel idea of applying different similarity measures depending on the numeric value of the Jaccard index between two users. Performance experiments are conducted to obtain optimal values of the parameters used by the proposed method and to evaluate it in comparison with other relevant methods. The results show that the proposed method achieves the best or comparable performance in prediction and recommendation accuracy.
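
As a rough sketch of the idea above, the Jaccard index between two users can be computed from the sets of items each has rated, and a different similarity measure can then be selected depending on its value. The threshold of 0.5 and the choice of measures on each side of it are assumptions made for illustration; the abstract does not state which measures or parameter values the paper uses.

```python
import math

def jaccard(ratings_u, ratings_v):
    """Jaccard index of the item sets rated by two users (dicts of item -> rating)."""
    items_u, items_v = set(ratings_u), set(ratings_v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0

def cosine_on_common(ratings_u, ratings_v):
    """Cosine similarity computed over co-rated items only."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def adaptive_similarity(ratings_u, ratings_v, threshold=0.5):
    """Hypothetical dispatch: rely on overlap alone when few items are co-rated."""
    j = jaccard(ratings_u, ratings_v)
    if j < threshold:
        return j  # too little overlap to trust a rating-based measure
    return cosine_on_common(ratings_u, ratings_v)

# Hypothetical users (illustrative only).
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m2": 3, "m3": 4, "m4": 2}
carol = {"m1": 4, "m5": 1}
print(jaccard(alice, bob), jaccard(alice, carol))
```

With these users, alice and bob share two of four distinct items while alice and carol share one of four, so the two pairs fall on different sides of the assumed threshold.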

Sparse kernel classification using IRWLS procedure

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.20 no.4 / pp.749-755 / 2009
  • Support vector classification (SVC) provides a more complete description of the linear and nonlinear relationships between input vectors and classifiers. In this paper, we propose a sparse kernel classifier that solves the optimization problem of classification with a modified hinge loss function and an absolute loss function, which provides efficient computation and sparsity. We also introduce a generalized cross validation function to select the hyper-parameters that affect the classification performance of the proposed method. Experimental results are then presented which illustrate the performance of the proposed procedure for classification.


CONSTRUCTIONS FOR SPARSE ROW-ORTHOGONAL MATRICES WITH A FULL ROW

  • Cheon, Gi-Sang;Park, Se-Won;Seol, Han-Guk
    • Journal of the Korean Mathematical Society / v.36 no.2 / pp.333-344 / 1999
  • In [4], it was shown that an n by n orthogonal matrix which has a row of nonzeros has at least $(\lfloor \log_2 n \rfloor + 3)n - 2^{\lfloor \log_2 n \rfloor + 1}$ nonzero entries. In this paper, the matrices achieving these bounds are constructed. The analogous sparsity problem for m by n row-orthogonal matrices which have a row of nonzeros is conjectured.
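
A quick numerical check can make the bound above tangible. The sketch below builds an unnormalized Haar-type matrix of order n = 2^k, which has a full first row and mutually orthogonal rows, and compares its nonzero count with the bound read as (⌊log2 n⌋ + 3)n − 2^(⌊log2 n⌋ + 1). Both that reading of the formula and the choice of the Haar-type matrix are assumptions for illustration; the abstract does not say which construction the paper uses. Scaling each row to unit length would give an orthogonal matrix with the same zero pattern.

```python
def haar_rows(n):
    """Unnormalized Haar-type matrix of order n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    rows = [[1] * n]  # the full row of nonzeros
    block = n
    while block >= 2:
        half = block // 2
        for start in range(0, n, block):
            row = [0] * n
            for j in range(start, start + half):
                row[j] = 1
            for j in range(start + half, start + block):
                row[j] = -1
            rows.append(row)
        block = half
    return rows

def is_row_orthogonal(rows):
    """True if every pair of distinct rows has zero dot product."""
    return all(
        sum(a * b for a, b in zip(rows[i], rows[j])) == 0
        for i in range(len(rows))
        for j in range(i + 1, len(rows))
    )

def nonzeros(rows):
    """Number of nonzero entries in the matrix."""
    return sum(1 for row in rows for value in row if value != 0)

def bound(n):
    """Assumed reading of the bound: (floor(log2 n) + 3) * n - 2 ** (floor(log2 n) + 1)."""
    k = n.bit_length() - 1  # floor(log2 n) for n >= 1
    return (k + 3) * n - 2 ** (k + 1)

for n in (2, 4, 8, 16):
    rows = haar_rows(n)
    print(n, nonzeros(rows), bound(n), is_row_orthogonal(rows))
```

For powers of two the count and the assumed bound agree, which is consistent with the abstract's statement that matrices achieving the bound exist.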


A Penalized Principal Components using Probabilistic PCA

  • Park, Chong-Sun;Wang, Morgan
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.151-156 / 2003
  • A variable selection algorithm for principal component analysis using a penalized likelihood method is proposed. We adopt the probabilistic principal component idea so that a likelihood function can be used for the problem, and use the HARD penalty function to force the coefficients of any irrelevant variables for each component to zero. Consistency and sparsity of the coefficient estimates are shown, along with results from small simulated and illustrative real examples.


A SIMPLE CONSTRUCTION FOR THE SPARSE MATRICES WITH ORTHOGONAL ROWS

  • Cheon, Gi-Sang;Lee, Gwang-Yeon
    • Communications of the Korean Mathematical Society / v.15 no.4 / pp.587-595 / 2000
  • We obtain a simple construction for the sparse n x n connected orthogonal matrices which have a row of p nonzero entries with $2 \leq p \leq n$. Moreover, we study the analogous sparsity problem for m x n connected row-orthogonal matrices.
