• Title/Summary/Keyword: Data sparsity

The cluster-indexing collaborative filtering recommendation

  • Park, Tae-Hyup; Ingoo Han
    • Proceedings of the Korea Intelligent Information Systems Society Conference / 2003.05a / pp.400-409 / 2003
  • Collaborative filtering (CF) recommendation is a knowledge-sharing technology for distributing opinions and facilitating contact between people with similar interests in a networked society. The main concerns of CF algorithms are prediction accuracy, response time, data sparsity, and scalability. In general, efforts to improve prediction algorithms and to lessen response time have been decoupled. We propose a three-step CF recommendation model, composed of profiling, inferring, and predicting steps, that considers prediction accuracy and computing speed simultaneously. The model combines a CF algorithm with two machine learning processes, SOM (Self-Organizing Map) and CBR (Case-Based Reasoning), by changing an unsupervised clustering problem into a supervised user-preference reasoning problem, a novel approach in the CF recommendation field. The paper demonstrates the utility of CF recommendation based on SOM cluster-indexing CBR, with validation against control algorithms on an open dataset of user preferences. A minimal sketch of the cluster-then-retrieve idea appears below.

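The abstract describes the mechanism only at a high level. As a hedged illustration, the sketch below clusters users with a tiny one-dimensional SOM and then makes a CBR-style prediction from the most similar users indexed under the same unit; all function names, parameters, and the fallback rule are assumptions for the example, not the paper's implementation.

```python
import numpy as np

def train_som(X, n_units=4, epochs=30, lr0=0.5, seed=0):
    """Tiny 1-D SOM: each unit holds a prototype vector; the learning rate
    and neighborhood radius shrink as training proceeds."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        radius = max(1.0, (n_units / 2) * (1 - t / epochs))
        for x in X[rng.permutation(len(X))]:
            bmu = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # best-matching unit
            for j in range(n_units):
                h = np.exp(-((j - bmu) ** 2) / (2 * radius ** 2))  # neighborhood kernel
                W[j] += lr * h * (x - W[j])
    return W

def predict_rating(ratings, user, item, W, k=5):
    """CBR-style retrieval: look only at users indexed under the same SOM
    unit, take the k nearest of them, and average their ratings of `item`
    (0 marks an unrated item)."""
    bmu = lambda x: int(np.argmin(np.linalg.norm(W - x, axis=1)))
    peers = [u for u in range(len(ratings))
             if u != user and bmu(ratings[u]) == bmu(ratings[user])]
    peers.sort(key=lambda u: np.linalg.norm(ratings[u] - ratings[user]))
    cases = [ratings[u][item] for u in peers[:k] if ratings[u][item] > 0]
    return float(np.mean(cases)) if cases else float(ratings[user].mean())
```

One reading of the abstract's "cluster-indexing" is visible here: routing a user to a SOM unit and answering from the cases stored under that unit is what turns the unsupervised clustering step into a supervised retrieval problem.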

Hybrid Product Recommendation for e-Commerce : A Clustering-based CF Algorithm

  • Ahn, Do-Hyun; Kim, Jae-Sik; Kim, Jae-Kyeong; Cho, Yoon-Ho
    • Proceedings of the Korea Intelligent Information Systems Society Conference / 2003.05a / pp.416-425 / 2003
  • Recommender systems are a personalized information-filtering technology that helps customers find the products they would like to purchase. Collaborative filtering (CF) has been known as the most successful recommendation technology, but its widespread use in e-commerce has exposed two research issues: sparsity and scalability. In this paper, we propose several hybrid recommender procedures based on web usage mining, clustering techniques, and collaborative filtering to address these issues. Experimental evaluation of the suggested procedures on real e-commerce data shows interesting relations between the characteristics of the procedures and diverse situations.


Performance Improvement Using Clustering in Collaborative Filtering Recommendation Systems

  • Woo, Hee-Sung; Suh, Yong-Moo
    • Proceedings of the Korea Society of IT Services Conference / 2003.11a / pp.223-232 / 2003
  • Recommender systems are broadly designed with either content-based filtering or collaborative filtering. Collaborative filtering predicts a user's rating for an item the user has not yet evaluated by using the ratings given to that item by people with similar product preferences. However, pure collaborative filtering suffers not only from the well-known data sparsity and first-rater problems but also from inaccurate predictions and an exponential growth in computation that makes real implementation difficult. This study applies cluster analysis to address two of these problems: prediction inaccuracy and implementation difficulty. In particular, instead of clustering users in the product domain where recommendations are actually made, we cluster them in a "neighbor product domain" whose choices are made by similar preference criteria and are therefore highly correlated with the target domain, and then apply those clusters to the product domain where recommendations are made. A minimal sketch of this cross-domain clustering idea appears below.

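As a hedged sketch of the idea, the code below clusters users by their ratings in a correlated neighbor domain and predicts in the target domain from same-cluster peers; k-means stands in for the paper's unspecified clustering method, and all names and defaults are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def predict_from_neighbor_domain(neighbor_dom, target_dom, user, item, n_clusters=5):
    """Cluster users on the *neighbor* product domain (where preferences are
    assumed to correlate with the target domain), then average the target-
    domain ratings of same-cluster peers. 0 marks an unrated item."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(neighbor_dom)
    peers = [u for u in range(len(labels))
             if u != user and labels[u] == labels[user] and target_dom[u, item] > 0]
    if peers:
        return float(np.mean([target_dom[u, item] for u in peers]))
    rated = target_dom[:, item][target_dom[:, item] > 0]   # fallback: item mean
    return float(rated.mean()) if rated.size else 0.0
```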

Using User Rating Patterns for Selecting Neighbors in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.24 no.9 / pp.77-82 / 2019
  • Collaborative filtering is a popular technique for recommender systems and is used in many commercial systems. Its basic principle is to select neighbors similar to the current user and to make recommendations for that user from the neighbors' past preference information on items. One of the major problems inherent in this type of system is the data sparsity of ratings, which stems mainly from the underlying similarity measures that derive neighbors from rating records alone. This paper addresses this problem and suggests a new similarity measure. The proposed method takes users' rating patterns into account when computing similarity, rather than relying only on commonly rated items as previous measures do. Experiments comparing various existing measures show that the proposed measure achieves better or comparable results on all of the major performance metrics considered. A hedged sketch of the rating-pattern idea appears below.
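The abstract does not give the measure's formula, so the sketch below is only an illustration of the underlying idea: compare two users' overall rating distributions (histogram overlap) instead of relying solely on co-rated items, which sparsity makes scarce.

```python
import numpy as np

def pattern_similarity(r_u, r_v, levels=(1, 2, 3, 4, 5)):
    """Illustrative rating-pattern similarity: normalized histogram overlap
    of the two users' rating values. r_u and r_v are arrays of each user's
    observed ratings; the result lies in [0, 1], with 1 = identical patterns.
    This is a stand-in, not the measure proposed in the paper."""
    hu = np.array([(np.asarray(r_u) == s).sum() for s in levels], dtype=float)
    hv = np.array([(np.asarray(r_v) == s).sum() for s in levels], dtype=float)
    hu /= max(hu.sum(), 1.0)
    hv /= max(hv.sum(), 1.0)
    return float(np.minimum(hu, hv).sum())
```

Because the histograms use each user's whole rating record, the measure stays defined even when two users share few or no co-rated items, which is exactly the sparsity case the paper targets.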

Tucker Modeling based Kronecker Constrained Block Sparse Algorithm

  • Zhang, Tingping; Fan, Shangang; Li, Yunyi; Gui, Guan; Ji, Yimu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.657-667 / 2019
  • This paper studies the synthetic aperture radar (SAR) imaging problem, in which scatterers are often distributed in a block-sparse pattern. To exploit this sparse geometrical feature, a Kronecker-constrained SAR imaging algorithm is proposed that combines the block-sparse characteristics with a multiway sparse reconstruction framework based on Tucker modeling. Validation on real data shows that the algorithm achieves better accuracy and convergence than the reference methods, even in demanding environments, while its complexity is lower than that of existing methods. Simulation experiments also confirm the algorithm's effectiveness. A generic block-sparse recovery sketch appears below.
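The paper's Kronecker-constrained, Tucker-modeled algorithm is not reproduced here; as a generic illustration of how block sparsity is exploited, the sketch below implements a simple block orthogonal matching pursuit that selects whole blocks of dictionary columns at a time. Everything about it (names, block layout, stopping rule) is an assumption for the example.

```python
import numpy as np

def block_omp(A, y, block_size, n_blocks):
    """Greedy block-sparse recovery: at each step pick the block of columns
    whose correlation energy with the residual is largest, then re-fit the
    coefficients on all chosen blocks by least squares."""
    m, n = A.shape
    assert n % block_size == 0
    residual, chosen = y.copy(), []
    for _ in range(n_blocks):
        energies = [np.linalg.norm(A[:, b * block_size:(b + 1) * block_size].T @ residual)
                    for b in range(n // block_size)]
        b = int(np.argmax(energies))
        if b not in chosen:
            chosen.append(b)
        cols = np.concatenate([np.arange(c * block_size, (c + 1) * block_size)
                               for c in chosen])
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        residual = y - A[:, cols] @ coef
    x = np.zeros(n, dtype=coef.dtype)
    x[cols] = coef
    return x
```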

Massive MIMO Channel Estimation Algorithm Based on Weighted Compressed Sensing

  • Lv, Zhiguo; Wang, Weijing
    • Journal of Information Processing Systems / v.17 no.6 / pp.1083-1096 / 2021
  • Compressed sensing-based matching pursuit algorithms can estimate the sparse channel of massive multiple-input multiple-output systems with short pilot sequences. Although they have the advantages of low computational complexity and low pilot overhead, their accuracy remains insufficient. Simply multiplying a weight value with the channel estimates obtained in different iterations improves estimation accuracy only at low signal-to-noise ratio (SNR) and degrades it at high SNR. To address this issue, an improved weighted matching pursuit algorithm is proposed that obtains a suitable weight value uop by training on channel data. The step by which the weight increases over successive iterations is calculated from the channel's sparsity and uop, and adjusting the weight adaptively over the iterations further improves estimation accuracy. Simulations show improved accuracy compared with previous methods at both high and low SNR. A sketch of the iteration-weighting idea appears below.
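The abstract names the trained weight uop and a sparsity-derived step but not the exact update rule, so the sketch below is a guess at the structure: a standard OMP whose per-iteration estimates are blended with a linearly increasing weight. The schedule, the blend rule, and the default values are assumptions.

```python
import numpy as np

def weighted_omp(Phi, y, sparsity, u_opt=0.9, w0=0.5):
    """OMP variant in which the running channel estimate is a weighted blend
    of the previous estimate and the current least-squares fit; the weight
    grows from w0 toward u_opt in steps derived from the channel sparsity."""
    m, n = Phi.shape
    residual = y.astype(complex).copy()
    support, h = [], np.zeros(n, dtype=complex)
    step = (u_opt - w0) / max(sparsity - 1, 1)   # step from sparsity and u_opt
    for k in range(sparsity):
        idx = int(np.argmax(np.abs(Phi.conj().T @ residual)))
        if idx not in support:
            support.append(idx)
        h_k = np.zeros(n, dtype=complex)
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        h_k[support] = sol
        w = w0 + step * k                 # weight increases with the iterations
        h = w * h_k + (1.0 - w) * h       # adaptive blend of successive estimates
        residual = y - Phi @ h
    return h
```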

HiCORE: Hi-C Analysis for Identification of Core Chromatin Looping Regions with Higher Resolution

  • Lee, Hongwoo; Seo, Pil Joon
    • Molecules and Cells / v.44 no.12 / pp.883-892 / 2021
  • Genome-wide chromosome conformation capture (3C)-based high-throughput sequencing (Hi-C) has enabled identification of genome-wide chromatin loops. Because the Hi-C map at restriction-fragment resolution is intrinsically sparse and stochastically noisy, Hi-C data are usually binned at fixed intervals; however, binning has limited reliability, especially at high resolution. Here, we describe a new method called HiCORE, which provides simple pipelines and algorithms to overcome the limitations of single-layered binning and to predict core chromatin regions with three-dimensional physical interactions. In this approach, multiple layers of binning with slightly shifted genome coverage are generated, and the interacting bins at each layer are integrated to infer narrower regions of chromatin interaction. HiCORE predicts chromatin looping regions at higher resolution in both the human and Arabidopsis genomes, and contributes to identifying the precise positions of potential genomic elements in an unbiased manner. A sketch of the shifted-binning idea appears below.
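As a hedged sketch of the multi-layer binning idea (HiCORE's actual pipeline handles both anchors of an interaction, filtering, and integration across layers), the function below intersects the shifted bins that contain one genomic position; the overlap is narrower than any single-layer bin. Bin size, layer count, and shift are illustrative assumptions.

```python
def core_region(pos, bin_size=10_000, n_layers=4):
    """Each layer bins the genome at the same size but with its start shifted
    by bin_size/n_layers; intersecting the bins that contain `pos` yields a
    narrower candidate region than any single layer provides."""
    shift = bin_size // n_layers
    lo, hi = 0, float("inf")
    for layer in range(n_layers):
        offset = layer * shift
        start = offset + ((pos - offset) // bin_size) * bin_size  # bin start
        lo, hi = max(lo, start), min(hi, start + bin_size)
    return lo, hi

print(core_region(12_345))   # -> (10000, 12500): 2.5 kb instead of a 10 kb bin
```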

Consideration upon Importance of Metadata Extraction for a Hyper-Personalized Recommender System on Unsupervised Learning

  • Paik, Juryon; Ko, Kwang-Ho
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.19-22 / 2022
  • From a service perspective, the performance of a recommender system depends largely on how deeply it is designed around an efficient recommendation model. Hyper-personalization of recommendation, in particular, is a worldwide trend: for the past year or two, data-platform leaders such as Google, Amazon, and Alibaba have been competitively developing deep learning-based algorithms and applying them to their recommendation services. Among the problems that arise as recommender systems grow more sophisticated, this study takes an algorithmic perspective on two: the cold-start problem, which keeps occurring when user or service information is insufficient, and the data sparsity problem, in which the services and users to be recommended grow continuously while the proportion of services actually consumed by each user drops sharply. As a first step, this paper shows how much the accuracy of recommendations varies with the metadata applied, and argues for applying deep unsupervised learning to metadata selection and extraction in order to predict consumers' real-life patterns and needs as they change in real time.


A Study of Pattern Defect Data Augmentation with Image Generation Model

  • Byungjoon Kim; Yongduek Seo
    • Journal of the Korea Computer Graphics Society / v.29 no.3 / pp.79-84 / 2023
  • Image generation models have been applied in various fields to overcome data sparsity and time and cost constraints. However, they have limitations in generating images of regular patterns and in detecting defects in such data. In this paper, we verify the feasibility of using an image generation model to produce pattern images and apply it to data augmentation for defect detection on OLED panels. The data required to train an OLED defect detection model are difficult to obtain because of the high cost of OLED panels, and even when a data set is acquired, the various defect types must still be defined and classified. This paper introduces an OLED panel defect data acquisition system that acquires a hypothetical data set and augments the data with an image generation model. In addition, we identify why diffusion models struggle to generate pattern images, propose a way forward, and improve on the limitations of defect-detection data augmentation with image generation models.

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms (중립도 기반 선택적 단어 제거를 통한 유용 리뷰 분류 정확도 향상 방안)

  • Lee, Minsik; Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.129-142 / 2016
  • Customer product reviews have become one of the important factors in purchase decisions. Customers believe that reviews written by others who have already experienced a product offer more reliable information than that provided by sellers. However, with so many products and reviews, the advantage of e-commerce can be overwhelmed by rising search costs: reading every review to learn the pros and cons of a product is exhausting. To help potential customers find the most useful information without much difficulty, online stores have devised various ways for customers to write, rate, and surface product reviews. Different methods have been developed to classify and recommend useful reviews, primarily using the helpfulness feedback that customers provide about reviews. Most shopping websites collect this information through a voting system and display the average preference for a product, the number of customers who voted, and the preference distribution. Amazon.com, for example, asks customers whether a review of a product is helpful, and places the most helpful favorable review and the most helpful critical review at the top of the review list. Some companies also predict a review's usefulness from attributes such as length, author, and the words used, publishing only reviews likely to be useful. Text mining approaches have been used to classify useful reviews in advance. Applying a text mining approach to all reviews of a product requires building a term-document matrix: every word is extracted from the reviews, and the matrix records how often each term occurs in each review. Because there are many reviews, this matrix becomes very large, making it difficult to apply text mining algorithms. Researchers therefore delete some terms on the basis of sparsity, since sparse words have little effect on classification or prediction. The purpose of this study is to suggest a better way of building the term-document matrix by deleting terms that are useless for review classification. We propose a neutrality index for selecting the words to delete: many words appear similarly in both classes, useful and not useful, and such words have little or even negative effect on classification performance. We define these as neutral terms and, after deleting sparse words, delete the terms that are most neutral. We tested our approach with Amazon.com review data from five product categories: Cellphones & Accessories, Movies & TV, Automotive, CDs & Vinyl, and Clothing, Shoes & Jewelry. We used reviews that received more than four votes, with a 60% ratio of useful votes to total votes as the threshold for classifying reviews as useful or not useful, and randomly selected 1,500 useful and 1,500 not-useful reviews per category. We then applied Information Gain and Support Vector Machine (SVM) algorithms to classify the reviews and compared their performance in terms of precision, recall, and F-measure. Although performance varies across product categories and data sets, deleting terms by both sparsity and neutrality gave the best F-measure for both classification algorithms. However, deleting terms by sparsity alone gave the best recall for Information Gain, and using all terms gave the best precision for SVM. Term-deletion methods and classification algorithms should therefore be chosen carefully for each data set. A hedged sketch of the neutrality-index idea appears below.
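The exact form of the paper's neutrality index is not given in the abstract, so the sketch below uses one plausible form: the ratio of the smaller to the larger relative document frequency across the two classes, dropping terms above a threshold. The function name and the threshold value are assumptions.

```python
from collections import Counter

def neutral_terms(useful_docs, not_useful_docs, threshold=0.9):
    """Terms that occur with nearly equal relative document frequency in the
    useful and not-useful classes carry little class signal; flag them for
    removal from the term-document matrix. Each doc is a list of tokens."""
    df_u = Counter(t for d in useful_docs for t in set(d))
    df_n = Counter(t for d in not_useful_docs for t in set(d))
    flagged = set()
    for term in set(df_u) | set(df_n):
        p_u = df_u[term] / len(useful_docs)
        p_n = df_n[term] / len(not_useful_docs)
        neutrality = min(p_u, p_n) / max(p_u, p_n)   # 1.0 = perfectly balanced
        if neutrality >= threshold:
            flagged.add(term)
    return flagged
```

Removing the flagged columns after the usual sparsity-based pruning shrinks the term-document matrix further, which is the combination the abstract reports as giving the best F-measure.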