Automatic Classification of Advertising Restaurant Blogs Using Machine Learning Techniques

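  • 장재영 (Department of Computer Engineering, Hansung University) ;
  • 이병준 (Department of Computer Engineering, Hansung University) ;
  • 조세진 (Department of Computer Engineering, Hansung University) ;
  • 한다혜 (Department of Computer Engineering, Hansung University) ;
  • 이규홍 (Department of Computer Engineering, Hansung University)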
  • Received : 2016.02.18
  • Accepted : 2016.04.08
  • Published : 2016.04.30

Abstract

Recently, the number of users who choose a restaurant based on information provided by blogs has increased significantly. However, most of this information is unreliable, since domestic restaurant blogs are dominated by advertising posts written by 'power bloggers'. Thus, to ensure the reliability of blogs, it is necessary to filter out advertising blogs, which are sometimes false or exaggerated. In this paper, we propose a method for identifying advertising blogs using automatic classification techniques. We first manually collected advertising restaurant blogs and analyzed the features commonly found in them. Using the extracted features, we then applied automatic classification algorithms to determine whether a given blog is an advertisement. Additionally, we selected the features and the algorithm that gave the best classification performance in comparative experiments.

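The number of users who choose restaurants using information provided by blogs has grown sharply in recent years. However, Korean restaurant blogs have long since lost their credibility, because advertising blogs run by power bloggers make up the majority of them. Securing the reliability of blogs therefore requires a technique for filtering out advertising blogs written with false or exaggerated content. This paper proposes a technique that identifies advertising blogs using automatic classification. In the proposed technique, we first collected restaurant blogs that had been identified as advertising blogs and analyzed the features they have in common. Using the extracted features, we applied automatic classification algorithms from data mining to decide whether a blog is an advertising blog. We also selected the best-performing algorithm and features through a variety of experiments.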
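The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the features or the algorithms that were compared, so everything concrete here is an assumption made for illustration: the three binary features (`has_map_widget`, `many_photos`, `mentions_sponsorship`), the toy training rows, and the choice of a Bernoulli naive Bayes classifier.

```python
import math
from collections import defaultdict

# Hypothetical binary features per blog post (NOT taken from the paper):
# (has_map_widget, many_photos, mentions_sponsorship)
TRAIN_ROWS = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0)]
TRAIN_LABELS = ["ad", "ad", "ad", "organic", "organic", "organic"]


def train_bernoulli_nb(rows, labels):
    """Fit a Bernoulli naive Bayes model with Laplace (add-one) smoothing."""
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for i, value in enumerate(row):
            feature_counts[label][i] += value
    model = {}
    for label, count in class_counts.items():
        log_prior = math.log(count / len(rows))
        # P(feature_i = 1 | label), smoothed so no probability is 0 or 1.
        probs = [(feature_counts[label][i] + 1) / (count + 2)
                 for i in range(n_features)]
        model[label] = (log_prior, probs)
    return model


def predict(model, row):
    """Return the label with the highest log posterior for one feature row."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(row, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the given label."""
    hits = sum(predict(model, row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


model = train_bernoulli_nb(TRAIN_ROWS, TRAIN_LABELS)
print(predict(model, (1, 1, 1)), accuracy(model, TRAIN_ROWS, TRAIN_LABELS))
```

A comparative experiment of the kind the abstract mentions would repeat this fit-and-evaluate loop for each candidate algorithm and feature subset, scoring on held-out posts rather than on the training rows used here.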
