
Pre-Evaluation for Prediction Accuracy by Using the Customer's Ratings in Collaborative Filtering  

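Lee, Seok-Jun (Department of Business Administration, College of Business and Economics, Sangji University)
Kim, Sun-Ok (School of Information and Communication Engineering, Halla University)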
Publication Information
Asia Pacific Journal of Information Systems / v.17, no.4, 2007, pp. 187-206
Abstract
Advances in computer and information technology, combined with Internet infrastructure, have spread information beyond specialized fields and into people's daily lives. This ubiquity of information changes traditional ways of transacting and leads to a new form of e-commerce, distinct from the existing one, in which not only physical goods but also non-physical services are traded. As the scale of e-commerce grows, however, it becomes harder for people to find the information they want. Recommender systems are becoming the main tool for mitigating this information overload in e-commerce. A recommender system can be defined as a system that suggests items (goods or services) in light of customers' interests or tastes. E-commerce web sites use such systems to suggest products to their customers and to provide information that helps them decide what to purchase. There are several approaches to recommending goods to customers, but this study focuses on collaborative filtering. It presents the possibility of pre-evaluating, before the prediction process runs, how accurately collaborative filtering will predict each customer's preferences. In this pre-evaluation, customers expected to have low prediction performance are identified from statistical features of the ratings each customer has given. The MovieLens 100K dataset is used to analyze the accuracy of the classification. The classification criteria are set on the training set, which comprises 80% of the 100K dataset. In the classification process, the customers are divided into two groups, a classified group and a non-classified group.
To compare the prediction performance of the classified and non-classified groups, predictions are made for the 20% test set with the Neighborhood Based Collaborative Filtering Algorithm and the Correspondence Mean Algorithm. The prediction errors produced by these algorithms are attributed to individual customers, so that errors can be compared at the level of each user. Two research hypotheses are formulated to test the accuracy of the classification criteria. Hypothesis 1: The estimation accuracy of groups classified according to the standard deviation of each user's ratings differs significantly. To test Hypothesis 1, the standard deviation of each user's ratings is calculated on the training set (80% of the MovieLens 100K dataset). Users are classified into four groups according to the quartiles of these standard deviations, and the groups' estimation errors on the test set are tested for significant differences. Hypothesis 2: The estimation accuracy of groups classified according to the distribution of each user's ratings differs significantly. To test Hypothesis 2, the distribution of each user's ratings is compared with the distribution of the ratings of all customers in the training set. The assumption is that customers whose rating distributions differ from that of all customers will have low prediction performance, so six types of differing distributions are defined for comparison. For each assumed type of distribution, customers are classified into a fit group or a non-fit group. The agreement between each type of distribution and each customer's distribution is assessed with the ${\chi}^2$ goodness-of-fit test, and the two resulting groups are tested for a difference in mean error.
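The goodness-of-fit classification described for Hypothesis 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.05 significance level, and the use of a single reference distribution (rather than the six distribution types mentioned in the abstract) are assumptions made here for illustration.

```python
from collections import Counter

RATING_LEVELS = (1, 2, 3, 4, 5)  # MovieLens 100K ratings are integers from 1 to 5
CHI2_CRITICAL = 9.488            # chi-square critical value, df = 4, alpha = 0.05 (assumed level)


def rating_proportions(ratings):
    """Share of each rating level in a list of ratings (e.g. the whole training set)."""
    counts = Counter(ratings)
    total = len(ratings)
    return {level: counts[level] / total for level in RATING_LEVELS}


def chi_square_statistic(user_ratings, reference):
    """Pearson chi-square statistic of one user's ratings against reference proportions."""
    observed = Counter(user_ratings)
    n = len(user_ratings)
    statistic = 0.0
    for level in RATING_LEVELS:
        expected = n * reference[level]
        if expected == 0:
            # A rating level the reference never uses: any observation is a total misfit.
            if observed[level] > 0:
                return float("inf")
            continue
        statistic += (observed[level] - expected) ** 2 / expected
    return statistic


def is_fit(user_ratings, reference, critical=CHI2_CRITICAL):
    """True if the user's rating distribution is not rejected as different from the reference."""
    return chi_square_statistic(user_ratings, reference) <= critical


# Hypothetical usage: a uniform reference, one balanced user, one user who only rates 5.
reference = rating_proportions([1, 2, 3, 4, 5] * 4)
print(is_fit([1, 2, 3, 4, 5] * 4, reference))  # balanced user
print(is_fit([5] * 20, reference))             # one-sided user
```

The chi-square approximation is generally considered unreliable when expected counts are small, so users with very few ratings would probably need separate handling in practice.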
In addition, the degree of goodness-of-fit between the distribution of each user's ratings and the average distribution of ratings in the training set is closely related to the prediction errors of these algorithms. With these two criteria, which are set from statistical features of customers' ratings in the training set, customers whose prediction performance is lower than that of the rest of the system can be identified before the prediction process.
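The first criterion, grouping users by the quartiles of their rating standard deviations, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure from the paper: the function names are hypothetical, the population standard deviation is used (the abstract does not say which variant was chosen), and per-user errors are assumed to be available already, for example as a mean absolute error on the test set.

```python
import statistics


def user_std(ratings_by_user):
    """Population standard deviation of each user's training-set ratings (assumed variant)."""
    return {user: statistics.pstdev(ratings) for user, ratings in ratings_by_user.items()}


def quartile_groups(std_by_user):
    """Assign each user to group 1..4 by the quartile of their standard deviation."""
    q1, q2, q3 = statistics.quantiles(std_by_user.values(), n=4)
    groups = {}
    for user, std in std_by_user.items():
        if std <= q1:
            groups[user] = 1
        elif std <= q2:
            groups[user] = 2
        elif std <= q3:
            groups[user] = 3
        else:
            groups[user] = 4
    return groups


def mean_error_by_group(groups, error_by_user):
    """Average the per-user prediction errors within each quartile group."""
    errors = {}
    for user, group in groups.items():
        errors.setdefault(group, []).append(error_by_user[user])
    return {group: statistics.mean(values) for group, values in sorted(errors.items())}


# Hypothetical usage with made-up ratings; real input would be the 80% training split.
ratings_by_user = {
    "u1": [3, 3, 3, 3],
    "u2": [3, 4, 3, 4],
    "u3": [2, 4, 3, 5],
    "u4": [1, 5, 1, 5],
}
groups = quartile_groups(user_std(ratings_by_user))
print(groups)
```

Whether the group means differ significantly would then be checked with a separate statistical test on the per-user errors, which this sketch leaves out.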
Keywords
Recommender System; Collaborative Filtering; Pre-evaluation for prediction; Pre-information