A Study on a Recommender System Using a Co-occurrence Matrix and Hadoop Distributed Processing

  • 김창복 (Department of Energy IT, Gachon University) ;
  • 정재필 (Department of Electronic Engineering, Gachon University)
  • Received : 2014.09.05
  • Accepted : 2014.10.06
  • Published : 2014.10.30

Abstract

Real-time recommendation is becoming harder for recommender systems as preference data sets grow larger and as computing power and recommendation algorithms become limiting factors. Consequently, distributed processing of large preference data sets is an active area of recommender-system research. This paper studies a method for distributed processing of preference data using the Hadoop distributed processing platform and the Mahout machine learning library. The recommendation algorithm uses a co-occurrence matrix, an approach similar to item-based collaborative filtering. Co-occurrence matrix computation can be distributed across the many nodes of a Hadoop cluster; it is inherently computation-heavy, but the amount of computation can be reduced in the course of distributed processing. This paper also simplifies the distributed processing of the co-occurrence matrix from four stages to three. As a result, the number of MapReduce jobs is reduced while the same recommendation file is generated. In addition, when the data was processed in Hadoop pseudo-distributed mode, processing was faster and the map output data was reduced.
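To make the co-occurrence idea in the abstract concrete, the following is a minimal single-process sketch in Python. It is not the paper's three-stage MapReduce pipeline and not Mahout's implementation; the function names (`build_cooccurrence`, `recommend`) and the toy preference data are hypothetical and exist only for illustration. The sketch counts how often two items appear together in the same user's preferences, then scores each unseen item by multiplying those counts by the user's own preference values.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(prefs):
    """Count, for every ordered item pair, how many users prefer both items."""
    cooc = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        # Each user contributes one count to every pair of their items.
        for a, b in permutations(items, 2):
            cooc[a][b] += 1
    return cooc


def recommend(prefs, user, top_n=2):
    """Score unseen items as (co-occurrence row) x (user preference vector)."""
    cooc = build_cooccurrence(prefs)
    seen = prefs[user]
    scores = defaultdict(float)
    for item, weight in seen.items():
        for other, count in cooc[item].items():
            if other not in seen:  # recommend only items the user has not rated
                scores[other] += count * weight
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy data: user -> {item: preference value}
prefs = {
    "u1": {"A": 5.0, "B": 3.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 5.0},
    "u3": {"B": 4.0, "C": 3.0, "D": 2.0},
}

print(recommend(prefs, "u1"))  # [('C', 11.0), ('D', 3.0)]
```

In a distributed setting, the pair counting and the matrix-vector multiplication are the parts that would be split across map and reduce tasks; how those steps are grouped into jobs is what the four-stage versus three-stage comparison in the abstract refers to.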
