• Title/Summary/Keyword: data sparsity

Search Result 175, Processing Time 0.026 seconds

The Adaptive Personalization Method According to Users Purchasing Index : Application to Beverage Purchasing Predictions (고객별 구매빈도에 동적으로 적응하는 개인화 시스템 : 음료수 구매 예측에의 적용)

  • Park, Yoon-Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.95-108
    • /
    • 2011
  • TThis is a study of the personalization method that intelligently adapts the level of clustering considering purchasing index of a customer. In the e-biz era, many companies gather customers' demographic and transactional information such as age, gender, purchasing date and product category. They use this information to predict customer's preferences or purchasing patterns so that they can provide more customized services to their customers. The previous Customer-Segmentation method provides customized services for each customer group. This method clusters a whole customer set into different groups based on their similarity and builds predictive models for the resulting groups. Thus, it can manage the number of predictive models and also provide more data for the customers who do not have enough data to build a good predictive model by using the data of other similar customers. However, this method often fails to provide highly personalized services to each customer, which is especially important to VIP customers. Furthermore, it clusters the customers who already have a considerable amount of data as well as the customers who only have small amount of data, which causes to increase computational cost unnecessarily without significant performance improvement. The other conventional method called 1-to-1 method provides more customized services than the Customer-Segmentation method for each individual customer since the predictive model are built using only the data for the individual customer. This method not only provides highly personalized services but also builds a relatively simple and less costly model that satisfies with each customer. However, the 1-to-1 method has a limitation that it does not produce a good predictive model when a customer has only a few numbers of data. In other words, if a customer has insufficient number of transactional data then the performance rate of this method deteriorate. In order to overcome the limitations of these two conventional methods, we suggested the new method called Intelligent Customer Segmentation method that provides adaptive personalized services according to the customer's purchasing index. The suggested method clusters customers according to their purchasing index, so that the prediction for the less purchasing customers are based on the data in more intensively clustered groups, and for the VIP customers, who already have a considerable amount of data, clustered to a much lesser extent or not clustered at all. The main idea of this method is that applying clustering technique when the number of transactional data of the target customer is less than the predefined criterion data size. In order to find this criterion number, we suggest the algorithm called sliding window correlation analysis in this study. The algorithm purposes to find the transactional data size that the performance of the 1-to-1 method is radically decreased due to the data sparity. After finding this criterion data size, we apply the conventional 1-to-1 method for the customers who have more data than the criterion and apply clustering technique who have less than this amount until they can use at least the predefined criterion amount of data for model building processes. We apply the two conventional methods and the newly suggested method to Neilsen's beverage purchasing data to predict the purchasing amounts of the customers and the purchasing categories. We use two data mining techniques (Support Vector Machine and Linear Regression) and two types of performance measures (MAE and RMSE) in order to predict two dependent variables as aforementioned. The results show that the suggested Intelligent Customer Segmentation method can outperform the conventional 1-to-1 method in many cases and produces the same level of performances compare with the Customer-Segmentation method spending much less computational cost.

T-Commerce Sale Prediction Using Deep Learning and Statistical Model (딥러닝과 통계 모델을 이용한 T-커머스 매출 예측)

  • Kim, Injung;Na, Kihyun;Yang, Sohee;Jang, Jaemin;Kim, Yunjong;Shin, Wonyoung;Kim, Deokjung
    • Journal of KIISE
    • /
    • v.44 no.8
    • /
    • pp.803-812
    • /
    • 2017
  • T-commerce is technology-fusion service on which the user can purchase using data broadcasting technology based on bi-directional digital TVs. To achieve the best revenue under a limited environment in regard to the channel number and the variety of sales goods, organizing broadcast programs to maximize the expected sales considering the selling power of each product at each time slot. For this, this paper proposes a method to predict the sales of goods when it is assigned to each time slot. The proposed method predicts the sales of product at a time slot given the week-in-year and weather of the target day. Additionally, it combines a statistical predict model applying SVD (Singular Value Decomposition) to mitigate the sparsity problem caused by the bias in sales record. In experiments on the sales data of W-shopping, a T-commerce company, the proposed method showed NMAE (Normalized Mean Absolute Error) of 0.12 between the prediction and the actual sales, which confirms the effectiveness of the proposed method. The proposed method is practically applied to the T-commerce system of W-shopping and used for broadcasting organization.

Clustering Analysis by Customer Feature based on SOM for Predicting Purchase Pattern in Recommendation System (추천시스템에서 구매 패턴 예측을 위한 SOM기반 고객 특성에 의한 군집 분석)

  • Cho, Young Sung;Moon, Song Chul;Ryu, Keun Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.2
    • /
    • pp.193-200
    • /
    • 2014
  • Due to the advent of ubiquitous computing environment, it is becoming a part of our common life style. And tremendous information is cumulated rapidly. In these trends, it is becoming a very important technology to find out exact information in a large data to present users. Collaborative filtering is the method based on other users' preferences, can not only reflect exact attributes of user but also still has the problem of sparsity and scalability, though it has been practically used to improve these defects. In this paper, we propose clustering method by user's features based on SOM for predicting purchase pattern in u-Commerce. it is necessary for us to make the cluster with similarity by user's features to be able to reflect attributes of the customer information in order to find the items with same propensity in the cluster rapidly. The proposed makes the task of clustering to apply the variable of featured vector for the user's information and RFM factors based on purchase history data. To verify improved performance of proposing system, we make experiments with dataset collected in a cosmetic internet shopping mall.

A Dynamic Recommendation System Using User Log Analysis and Document Similarity in Clusters (사용자 로그 분석과 클러스터 내의 문서 유사도를 이용한 동적 추천 시스템)

  • 김진수;김태용;최준혁;임기욱;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.5
    • /
    • pp.586-594
    • /
    • 2004
  • Because web documents become creation and disappearance rapidly, users require the recommend system that offers users to browse the web document conveniently and correctly. One largely untapped source of knowledge about large data collections is contained in the cumulative experiences of individuals finding useful information in the collection. Recommendation systems attempt to extract such useful information by capturing and mining one or more measures of the usefulness of the data. The existing Information Filtering system has the shortcoming that it must have user's profile. And Collaborative Filtering system has the shortcoming that users have to rate each web document first and in high-quantity, low-quality environments, users may cover only a tiny percentage of documents available. And dynamic recommendation system using the user browsing pattern also provides users with unrelated web documents. This paper classifies these web documents using the similarity between the web documents under the web document type and extracts the user browsing sequential pattern DB using the users' session information based on the web server log file. When user approaches the web document, the proposed Dynamic recommendation system recommends Top N-associated web documents set that has high similarity between current web document and other web documents and recommends set that has sequential specificity using the extracted informations and users' session information.

Extracting Typical Group Preferences through User-Item Optimization and User Profiles in Collaborative Filtering System (사용자-상품 행렬의 최적화와 협력적 사용자 프로파일을 이용한 그룹의 대표 선호도 추출)

  • Ko Su-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.7
    • /
    • pp.581-591
    • /
    • 2005
  • Collaborative filtering systems have problems involving sparsity and the provision of recommendations by making correlations between only two users' preferences. These systems recommend items based only on the preferences without taking in to account the contents of the items. As a result, the accuracy of recommendations depends on the data from user-rated items. When users rate items, it can be expected that not all users ran do so earnestly. This brings down the accuracy of recommendations. This paper proposes a collaborative recommendation method for extracting typical group preferences using user-item matrix optimization and user profiles in collaborative tittering systems. The method excludes unproven users by using entropy based on data from user-rated items and groups users into clusters after generating user profiles, and then extracts typical group preferences. The proposed method generates collaborative user profiles by using association word mining to reflect contents as well as preferences of items and groups users into clusters based on the profiles by using the vector space model and the K-means algorithm. To compensate for the shortcoming of providing recommendations using correlations between only two user preferences, the proposed method extracts typical preferences of groups using the entropy theory The typical preferences are extracted by combining user entropies with item preferences. The recommender system using typical group preferences solves the problem caused by recommendations based on preferences rated incorrectly by users and reduces time for retrieving the most similar users in groups.

GEase-K: Linear and Nonlinear Autoencoder-based Recommender System with Side Information (GEase-K: 부가 정보를 활용한 선형 및 비선형 오토인코더 기반의 추천시스템)

  • Taebeom Lee;Seung-hak Lee;Min-jeong Ma;Yoonho Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.167-183
    • /
    • 2023
  • In the recent field of recommendation systems, various studies have been conducted to model sparse data effectively. Among these, GLocal-K(Global and Local Kernels for Recommender Systems) is a research endeavor combining global and local kernels to provide personalized recommendations by considering global data patterns and individual user characteristics. However, due to its utilization of kernel tricks, GLocal-K exhibits diminished performance on highly sparse data and struggles to offer recommendations for new users or items due to the absence of side information. In this paper, to address these limitations of GLocal-K, we propose the GEase-K (Global and EASE kernels for Recommender Systems) model, incorporating the EASE(Embarrassingly Shallow Autoencoders for Sparse Data) model and leveraging side information. Initially, we substitute EASE for the local kernel in GLocal-K to enhance recommendation performance on highly sparse data. EASE, functioning as a simple linear operational structure, is an autoencoder that performs highly on extremely sparse data through regularization and learning item similarity. Additionally, we utilize side information to alleviate the cold-start problem. We enhance the understanding of user-item similarities by employing a conditional autoencoder structure during the training process to incorporate side information. In conclusion, GEase-K demonstrates resilience in highly sparse data and cold-start situations by combining linear and nonlinear structures and utilizing side information. Experimental results show that GEase-K outperforms GLocal-K based on the RMSE and MAE metrics on the highly sparse GoodReads and ModCloth datasets. Furthermore, in cold-start experiments divided into four groups using the GoodReads and ModCloth datasets, GEase-K denotes superior performance compared to GLocal-K.

Social Network Analysis for the Effective Adoption of Recommender Systems (추천시스템의 효과적 도입을 위한 소셜네트워크 분석)

  • Park, Jong-Hak;Cho, Yoon-Ho
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.305-316
    • /
    • 2011
  • Recommender system is the system which, by using automated information filtering technology, recommends products or services to the customers who are likely to be interested in. Those systems are widely used in many different Web retailers such as Amazon.com, Netfix.com, and CDNow.com. Various recommender systems have been developed. Among them, Collaborative Filtering (CF) has been known as the most successful and commonly used approach. CF identifies customers whose tastes are similar to those of a given customer, and recommends items those customers have liked in the past. Numerous CF algorithms have been developed to increase the performance of recommender systems. However, the relative performances of CF algorithms are known to be domain and data dependent. It is very time-consuming and expensive to implement and launce a CF recommender system, and also the system unsuited for the given domain provides customers with poor quality recommendations that make them easily annoyed. Therefore, predicting in advance whether the performance of CF recommender system is acceptable or not is practically important and needed. In this study, we propose a decision making guideline which helps decide whether CF is adoptable for a given application with certain transaction data characteristics. Several previous studies reported that sparsity, gray sheep, cold-start, coverage, and serendipity could affect the performance of CF, but the theoretical and empirical justification of such factors is lacking. Recently there are many studies paying attention to Social Network Analysis (SNA) as a method to analyze social relationships among people. SNA is a method to measure and visualize the linkage structure and status focusing on interaction among objects within communication group. CF analyzes the similarity among previous ratings or purchases of each customer, finds the relationships among the customers who have similarities, and then uses the relationships for recommendations. Thus CF can be modeled as a social network in which customers are nodes and purchase relationships between customers are links. Under the assumption that SNA could facilitate an exploration of the topological properties of the network structure that are implicit in transaction data for CF recommendations, we focus on density, clustering coefficient, and centralization which are ones of the most commonly used measures to capture topological properties of the social network structure. While network density, expressed as a proportion of the maximum possible number of links, captures the density of the whole network, the clustering coefficient captures the degree to which the overall network contains localized pockets of dense connectivity. Centralization reflects the extent to which connections are concentrated in a small number of nodes rather than distributed equally among all nodes. We explore how these SNA measures affect the performance of CF performance and how they interact to each other. Our experiments used sales transaction data from H department store, one of the well?known department stores in Korea. Total 396 data set were sampled to construct various types of social networks. The dependant variable measuring process consists of three steps; analysis of customer similarities, construction of a social network, and analysis of social network patterns. We used UCINET 6.0 for SNA. The experiments conducted the 3-way ANOVA which employs three SNA measures as dependant variables, and the recommendation accuracy measured by F1-measure as an independent variable. The experiments report that 1) each of three SNA measures affects the recommendation accuracy, 2) the density's effect to the performance overrides those of clustering coefficient and centralization (i.e., CF adoption is not a good decision if the density is low), and 3) however though the density is low, the performance of CF is comparatively good when the clustering coefficient is low. We expect that these experiment results help firms decide whether CF recommender system is adoptable for their business domain with certain transaction data characteristics.

Default Voting using User Coefficient of Variance in Collaborative Filtering System (협력적 여과 시스템에서 사용자 변동 계수를 이용한 기본 평가간 예측)

  • Ko, Su-Jeong
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.11
    • /
    • pp.1111-1120
    • /
    • 2005
  • In collaborative filtering systems most users do not rate preferences; so User-Item matrix shows great sparsity because it has missing values for items not rated by users. Generally, the systems predict the preferences of an active user based on the preferences of a group of users. However, default voting methods predict all missing values for all users in User-Item matrix. One of the most common methods predicting default voting values tried two different approaches using the average rating for a user or using the average rating for an item. However, there is a problem that they did not consider the characteristics of items, users, and the distribution of data set. We replace the missing values in the User-Item matrix by the default noting method using user coefficient of variance. We select the threshold of user coefficient of variance by using equations automatically and determine when to shift between the user averages and item averages according to the threshold. However, there are not always regular relations between the averages and the thresholds of user coefficient of variances in datasets. It is caused that the distribution information of user coefficient of variances in datasets affects the threshold of user coefficient of variance as well as their average. We decide the threshold of user coefficient of valiance by combining them. We evaluate our method on MovieLens dataset of user ratings for movies and show that it outperforms previously default voting methods.

Development of Personalized Recommendation System using RFM method and k-means Clustering (RFM기법과 k-means 기법을 이용한 개인화 추천시스템의 개발)

  • Cho, Young-Sung;Gu, Mi-Sug;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.6
    • /
    • pp.163-172
    • /
    • 2012
  • Collaborative filtering which is used explicit method in a existing recommedation system, can not only reflect exact attributes of item but also still has the problem of sparsity and scalability, though it has been practically used to improve these defects. This paper proposes the personalized recommendation system using RFM method and k-means clustering in u-commerce which is required by real time accessablity and agility. In this paper, using a implicit method which is is not used complicated query processing of the request and the response for rating, it is necessary for us to keep the analysis of RFM method and k-means clustering to be able to reflect attributes of the item in order to find the items with high purchasablity. The proposed makes the task of clustering to apply the variable of featured vector for the customer's information and calculating of the preference by each item category based on purchase history data, is able to recommend the items with efficiency. To estimate the performance, the proposed system is compared with existing system. As a result, it can be improved and evaluated according to the criteria of logicality through the experiment with dataset, collected in a cosmetic internet shopping mall.

An Item-based Collaborative Filtering Technique by Associative Relation Clustering in Personalized Recommender Systems (개인화 추천 시스템에서 연관 관계 군집에 의한 아이템 기반의 협력적 필터링 기술)

  • 정경용;김진현;정헌만;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.467-477
    • /
    • 2004
  • While recommender systems were used by a few E-commerce sites former days, they are now becoming serious business tools that are re-shaping the world of I-commerce. And collaborative filtering has been a very successful recommendation technique in both research and practice. But there are two problems in personalized recommender systems, it is First-Rating problem and Sparsity problem. In this paper, we solve these problems using the associative relation clustering and “Lift” of association rules. We produce “Lift” between items using user's rating data. And we apply Threshold by -cut to the association between items. To make an efficiency of associative relation cluster higher, we use not only the existing Hypergraph Clique Clustering algorithm but also the suggested Split Cluster method. If the cluster is completed, we calculate a similarity iten in each inner cluster. And the index is saved in the database for the fast access. We apply the creating index to predict the preference for new items. To estimate the Performance, the suggested method is compared with existing collaborative filtering techniques. As a result, the proposed method is efficient for improving the accuracy of prediction through solving problems of existing collaborative filtering techniques.