Optimal Exploration-Exploitation Strategies in Reinforcement Learning for Online Banner Advertising: The Impact of Word-of-Mouth Effects

  • Bumsoo Kim (김범수), Sogang Business School, Sogang University
  • Gun Jea Yu (유건재), College of Business Administration, Hongik University
  • Joonkyum Lee (이준겸), Sogang Business School, Sogang University
  • Received : 2024.05.15
  • Accepted : 2024.06.15
  • Published : 2024.06.30

Abstract

One of the most important decisions for managers in the online banner advertising industry is choosing which banner alternative to expose to customers. Because the click probability of each alternative is not known in advance, managers must experiment with multiple alternatives, estimate each alternative's click probability from observed customer clicks, and identify the best one. In this reinforcement learning process, the central decision problem is to balance exploitation, which uses the accumulated click-probability estimates to show the alternative currently believed to be best, against exploration, which tries other alternatives in search of potentially better options. In this study, we analyze how word-of-mouth effects and the number of alternatives affect the optimal exploration-exploitation strategy. Specifically, we incorporate into the reinforcement learning process a word-of-mouth effect in which a banner's click-through rate rises as customers who clicked the banner promote the related product to those around them. We analyze the problem with a Multi-Armed Bandit model. The results show that the larger the word-of-mouth effect and the fewer the banner alternatives, the higher the optimal exploration level. As the word-of-mouth effect raises the probability that customers click a banner, the value of previously accumulated click-through-rate estimates declines, and the value of exploring new alternatives rises accordingly. In addition, when the number of alternatives is small, the optimal exploration level increases more steeply as the word-of-mouth effect grows. This study offers meaningful academic and managerial implications at a time when online word-of-mouth and its impact on society and business are becoming more important.

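In the online banner advertising industry, multiple banner alternatives are typically produced. The key decision is then which banner alternative to select and expose to customers. Because the probability that a customer clicks each alternative cannot be known in advance, managers must expose several alternatives experimentally, estimate each alternative's click probability from whether customers click, and find the optimal alternative; this is a reinforcement learning process for online advertising. The main decision problem in this process is to find the optimal balance between an exploitation strategy, which uses the accumulated knowledge of estimated click probabilities to expose the best alternative, and an exploration strategy, which tries new alternatives to find potentially better ones. This study analyzes how the word-of-mouth effect and the number of alternatives affect the optimal exploration-exploitation strategy. To do so, it adds to online advertising reinforcement learning a word-of-mouth effect in which the click-through rate of a banner increases as customers who click the exposed banner promote the related product to those around them. The analysis uses simulation based on the Multi-Armed Bandit model. The results show that the optimal exploration level of advertising reinforcement learning rises as the word-of-mouth effect grows and as the number of banner alternatives falls. The explanation is that as the word-of-mouth effect increases the probability that customers click a banner, the value of previously accumulated estimated click-through-rate knowledge declines, so the value of exploring new alternatives increases. In addition, when the number of advertising alternatives is small, the optimal exploration level tends to increase by a larger margin as the word-of-mouth effect grows. At a time when online word-of-mouth is amplifying the influence of word-of-mouth effects, this study offers meaningful implications.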
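
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes an epsilon-greedy selection rule (the abstract does not name the rule used), assumes that each click adds a fixed increment `wom` to that banner's click probability, and draws base click probabilities from an arbitrary 2% to 10% range. The names `simulate`, `best_epsilon`, `EPSILON_GRID`, and `wom` are all hypothetical.

```python
import random


def simulate(n_arms, epsilon, wom, horizon=1000, seed=0):
    """Run one epsilon-greedy bandit episode and return the total number of clicks.

    Assumption (not from the paper): every click adds `wom` to the clicked
    banner's click probability, capped at 1.0.
    """
    rng = random.Random(seed)
    # Hidden base click probabilities; the 2%-10% range is an arbitrary choice.
    click_prob = [rng.uniform(0.02, 0.10) for _ in range(n_arms)]
    shows = [0] * n_arms
    clicks = [0] * n_arms
    total_clicks = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a banner at random
        else:
            # Exploit: pick the highest estimated click rate. Unshown banners
            # get an optimistic estimate of 1.0 so each is tried at least once.
            estimates = [clicks[i] / shows[i] if shows[i] else 1.0
                         for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: estimates[i])
        shows[arm] += 1
        if rng.random() < click_prob[arm]:
            clicks[arm] += 1
            total_clicks += 1
            # Hypothetical word-of-mouth effect: a click raises future click probability.
            click_prob[arm] = min(1.0, click_prob[arm] + wom)
    return total_clicks


# Candidate exploration levels to compare (an arbitrary grid).
EPSILON_GRID = (0.0, 0.05, 0.1, 0.2, 0.4)


def best_epsilon(n_arms, wom, horizon=1000, runs=20):
    """Return the grid value of epsilon with the highest mean clicks over `runs` seeds."""
    def mean_clicks(eps):
        return sum(simulate(n_arms, eps, wom, horizon, seed)
                   for seed in range(runs)) / runs
    return max(EPSILON_GRID, key=mean_clicks)


if __name__ == "__main__":
    for wom in (0.0, 0.001):
        print(f"wom={wom}: best epsilon on grid = {best_epsilon(n_arms=5, wom=wom)}")
```

Varying `wom` and `n_arms` in `best_epsilon` shows how the preferred exploration level can be compared across conditions. The output depends on the assumed dynamics and parameter ranges, so it should not be read as a reproduction of the study's results.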
