Application of Advertisement Filtering Model and Method for its Performance Improvement

  • Park, Raegeun (Department of Smart Information and Telecommunication Engineering, Sangmyung University) ;
  • Yun, Hyeok-Jin (Department of Smart Information and Telecommunication Engineering, Sangmyung University) ;
  • Shin, Ui-Cheol (Department of Smart Information and Telecommunication Engineering, Sangmyung University) ;
  • Ahn, Young-Jin (Department of Smart Information and Telecommunication Engineering, Sangmyung University) ;
  • Jeong, Seungdo (Department of Smart Information and Telecommunication Engineering, Sangmyung University)
  • Received : 2020.07.15
  • Accepted : 2020.11.06
  • Published : 2020.11.30

Abstract

In recent years, the exponential growth of internet data has driven progress in many fields, such as deep learning, but it has also brought side effects: commercial advertisements, such as viral marketing, have become widespread. Such content not only undermines the internet's purpose of sharing high-quality information, but also increases the time users must spend searching to find high-quality information. In this study, we define an advertisement (Ad) as "a text that obscures the essence of information transmission," and we propose a model that filters information according to this definition. The proposed model consists of an advertisement filtering path and an advertisement filtering performance improvement path, and it is designed so that its performance improves continuously. We collected data for advertisement filtering and trained a document classifier using KorBERT. Experiments were conducted to verify the performance of the model: on data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively, confirming high performance even when the unstructured nature of advertisements is taken into account. By identifying and filtering advertisement paragraphs in documents written for viral marketing, the model effectively delivers high-quality information to users and is expected to reduce the time and fatigue wasted in searching for information.
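
The two paths described in the abstract (filtering and performance improvement) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the KorBERT classifier is replaced by a pluggable scoring function, posts are assumed to split into paragraphs at blank lines, and the names `AdFilter`, `toy_scorer`, `report`, and the 0.5 threshold are hypothetical. The improvement path is modeled simply as collecting user-corrected labels for a later retraining round, which is an assumption about how the continuous improvement might be organized.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical scorer type: maps a paragraph to an estimated probability that it is an Ad.
# In the paper this role is played by a KorBERT-based classifier; any callable works here.
Scorer = Callable[[str], float]


def split_paragraphs(post: str) -> List[str]:
    """Split a post into non-empty paragraphs at blank lines (an assumed convention)."""
    return [p.strip() for p in post.split("\n\n") if p.strip()]


@dataclass
class AdFilter:
    scorer: Scorer
    threshold: float = 0.5  # hypothetical decision threshold
    # Hypothetical improvement path: paragraphs whose label a user corrected,
    # stored as (paragraph, is_ad) pairs for the next retraining round.
    feedback: List[Tuple[str, bool]] = field(default_factory=list)

    def is_ad(self, paragraph: str) -> bool:
        return self.scorer(paragraph) >= self.threshold

    def filter_post(self, post: str) -> str:
        """Filtering path: drop paragraphs classified as Ad and rejoin the rest."""
        kept = [p for p in split_paragraphs(post) if not self.is_ad(p)]
        return "\n\n".join(kept)

    def report(self, paragraph: str, user_says_ad: bool) -> None:
        """Record a user-supplied label only when it disagrees with the model."""
        if self.is_ad(paragraph) != user_says_ad:
            self.feedback.append((paragraph, user_says_ad))


def toy_scorer(paragraph: str) -> float:
    """Keyword heuristic standing in for a trained model so the sketch runs offline."""
    cues = ("discount", "call now", "visit our", "sponsored")
    hits = sum(cue in paragraph.lower() for cue in cues)
    return min(1.0, hits / 2)


post = (
    "Premiums in your 20s depend on driving history.\n\n"
    "Call now for a 30% discount! Visit our branch.\n\n"
    "Compare several quotes before signing."
)
ad_filter = AdFilter(toy_scorer)
print(ad_filter.filter_post(post))
```

Running it prints the first and third paragraphs; the second matches three keyword cues and is dropped. Passing the scorer in, rather than hard-coding a model, means a function returning the Ad probability from a fine-tuned classifier could replace `toy_scorer` without changing the filtering logic.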

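
The accuracy and precision figures reported in the abstract follow the standard definitions; assuming advertisement paragraphs are the positive class, accuracy = (TP + TN) / (TP + TN + FP + FN) and precision = TP / (TP + FP). Precision is the share of paragraphs flagged as Ad that really are Ad, so a low precision means genuine content is being hidden by mistake. The sketch below computes both from label lists; the labels in the example are invented for illustration and are not the paper's data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), with label 1 = Ad (positive class) and 0 = non-Ad."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all paragraphs whose predicted label matches the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of paragraphs predicted as Ad that are truly Ad (0.0 if none predicted)."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


# Invented labels for ten paragraphs: TP = 3, TN = 4, FP = 2, FN = 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))   # 0.7
print(precision(y_true, y_pred))  # 0.6
```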
