http://dx.doi.org/10.12673/jant.2014.18.5.468

A Study On Recommend System Using Co-occurrence Matrix and Hadoop Distribution Processing  

Kim, Chang-Bok (Department of Energy IT, Gachon University)
Chung, Jae-Pil (Department of Electronic Engineering, Gachon University)
Abstract
Real-time recommendation is becoming more difficult for recommender systems because of larger preference data sets, limited computing power, and the cost of recommendation algorithms. For this reason, distributed processing of large preference data sets is being actively studied. This paper studies a distributed processing method for large preference data sets using the Hadoop distributed processing platform and the Mahout machine learning library. The recommendation algorithm uses a co-occurrence matrix, an approach similar to item-based collaborative filtering. The co-occurrence matrix can be processed in a distributed manner across many nodes of a Hadoop cluster; it requires a large amount of computation, but distributed processing reduces that computational load. This paper simplifies the distributed processing of the co-occurrence matrix by reducing it from four stages to three. As a result, the recommendation file is generated with fewer MapReduce jobs, processing is faster, and the amount of map output data is reduced.
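
To make the co-occurrence idea concrete, the following is a minimal single-machine sketch in Python. It is not the paper's three-stage MapReduce pipeline and does not use Hadoop or Mahout; the function names (`build_cooccurrence`, `recommend`) and the toy preference data are hypothetical and chosen only for illustration. It assumes the common formulation in which each user's preference vector is multiplied by the item co-occurrence matrix and already-rated items are excluded from the result.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(prefs):
    """Count, for every ordered item pair (a, b), how many users rated both.

    prefs: dict mapping user -> dict mapping item -> preference value.
    Returns a nested dict cooc[a][b] = number of users who rated a and b.
    """
    cooc = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        # Every ordered pair of distinct items in one user's history co-occurs once.
        for a, b in permutations(items, 2):
            cooc[a][b] += 1
    return cooc


def recommend(prefs, cooc, user, top_n=3):
    """Score unseen items as (co-occurrence row) x (user preference vector)."""
    rated = prefs[user]
    scores = defaultdict(float)
    for item, rating in rated.items():
        for other, count in cooc[item].items():
            if other not in rated:  # skip items the user already rated
                scores[other] += count * rating
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy data: three users, four items.
prefs = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 5.0},
    "u3": {"B": 4.0, "C": 3.0, "D": 2.0},
}
cooc = build_cooccurrence(prefs)
print(recommend(prefs, cooc, "u1"))  # [('C', 11.0), ('D', 3.0)]
```

In a distributed setting, the pair counting and the per-user multiplication are the parts that would be split across map and reduce tasks; how those steps are grouped into jobs is exactly what the paper's stage reduction concerns, and this sketch makes no claim about that grouping.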
Keywords
Co-occurrence matrix; Collaborative filtering; Hadoop; Matrix factorization; Recommender system