Implementation of big web logs analyzer in estimating preferences for web contents

Implementation of a Large-Scale Web Log Analyzer for Measuring Web Content Preferences

  • 최은정 (Division of General Education, Seoul Women's University)
  • 김명주 (College of Information and Media, Seoul Women's University)
  • Published: 2012.12.30

Abstract

With the rapid growth of internet infrastructure, the World Wide Web has gone beyond simple information sharing. It first began providing new services such as e-business, remote control and management, and virtual services, and it is now evolving into services such as cloud computing and social network services. Communication over the Web has correspondingly shifted from provider-centric information toward user-centric, customized services. In this environment, it is very important to monitor and analyze the requests users send to a website, and estimating user preferences is especially important. Web logs are analyzed for this purpose, but such analysis is limited because most of the analyzed data are page-level statistics. Statistics for individual pages alone are not sufficient to evaluate user preferences, because the main content of recent web pages consists of media files such as images, and of dynamic pages built with techniques such as CSS, div elements, and iframes. In this paper, a large-scale log analyzer was designed and implemented to analyze web server logs and estimate users' preferences for web content. Using MapReduce on Hadoop, large logs were analyzed and users' preferences for media content such as image, sound, and video files were estimated.
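To make the MapReduce idea concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes Apache Common Log Format input, treats the number of successful `GET` requests per media file as a stand-in for "preference", and uses a hypothetical extension-to-category table (`MEDIA_TYPES`). The map phase emits `((category, path), 1)` for each media request, the shuffle sorts pairs so equal keys are adjacent, and the reduce phase sums the counts. On a real Hadoop cluster the same map and reduce logic would run as distributed tasks (for example as Java `Mapper`/`Reducer` classes or through Hadoop Streaming) rather than in one Python process.

```python
import posixpath
import re
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Apache Common Log Format: host ident authuser [date] "method path proto" status bytes
# (assumption: the paper's logs may use the Combined or a custom format instead)
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

# Hypothetical mapping from file extension to media category.
MEDIA_TYPES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image",
    ".mp3": "sound", ".wav": "sound",
    ".mp4": "video", ".avi": "video", ".flv": "video",
}


def map_line(line):
    """Map phase: emit ((category, path), 1) for each successful media GET request."""
    match = LOG_PATTERN.match(line)
    if not match:
        return  # skip malformed lines
    _host, method, path, status, _size = match.groups()
    if method != "GET" or not status.startswith("2"):
        return  # count only successful GET requests
    path = path.split("?", 1)[0]  # drop the query string
    category = MEDIA_TYPES.get(posixpath.splitext(path)[1].lower())
    if category:
        yield (category, path), 1


def reduce_key(key, counts):
    """Reduce phase: sum the 1s emitted for one (category, path) key."""
    return key, sum(counts)


def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    pairs = [kv for line in lines for kv in map_line(line)]
    pairs.sort(key=itemgetter(0))  # shuffle: bring equal keys together
    return dict(
        reduce_key(key, (count for _, count in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )


def top_per_category(counts, n=3):
    """Rank files within each media category by request count (ties by path)."""
    by_category = defaultdict(list)
    for (category, path), total in counts.items():
        by_category[category].append((total, path))
    return {
        category: [path for _, path in sorted(items, key=lambda t: (-t[0], t[1]))[:n]]
        for category, items in by_category.items()
    }


sample = [
    '10.0.0.1 - - [30/Dec/2012:10:00:00 +0900] "GET /img/logo.png HTTP/1.1" 200 5120',
    '10.0.0.2 - - [30/Dec/2012:10:00:05 +0900] "GET /img/logo.png HTTP/1.1" 200 5120',
    '10.0.0.2 - - [30/Dec/2012:10:00:07 +0900] "GET /media/intro.mp4 HTTP/1.1" 200 901234',
    '10.0.0.3 - - [30/Dec/2012:10:00:09 +0900] "GET /index.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [30/Dec/2012:10:00:11 +0900] "GET /img/banner.jpg HTTP/1.1" 404 0',
]
counts = run_job(sample)
print(counts)  # {('image', '/img/logo.png'): 2, ('video', '/media/intro.mp4'): 1}
print(top_per_category(counts))
```

Keying on the media file rather than the page is what separates this from page-level statistics: the HTML page request is ignored, and the failed image request (status 404) is not counted as interest. Weighting by bytes transferred or by distinct hosts would be equally plausible choices; the sketch uses plain request counts only for simplicity.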
