
Empirical study on BlenderBot 2.0 error analysis in terms of model, data, and dialogue

  • Lee, Jungseob (Human-inspired Computing Research Center, Korea University) ;
  • Son, Suhyune (Department of Computer Science and Engineering, Korea University) ;
  • Shim, Midan (Human-inspired Computing Research Center, Korea University) ;
  • Kim, Yujin (Human-inspired Computing Research Center, Korea University) ;
  • Park, Chanjun (Department of Computer Science and Engineering, Korea University) ;
  • So, Aram (Human-inspired Computing Research Center, Korea University) ;
  • Park, Jeongbae (Human-inspired Computing Research Center, Korea University) ;
  • Lim, Heuiseok (Department of Computer Science and Engineering, Korea University)
  • Received : 2021.11.17
  • Accepted : 2021.12.20
  • Published : 2021.12.28

Abstract

BlenderBot 2.0 is regarded as a representative open-domain chatbot: through an Internet search module and multi-session memory, it reflects real-time information and retains user information over the long term. Nevertheless, the model still leaves considerable room for improvement. This paper therefore analyzes the limitations and errors of BlenderBot 2.0 from three perspectives: model, data, and dialogue. From the model perspective, we mainly analyze structural problems of the search engine and the model's response latency in a deployed service. From the data perspective, we point out that the guidelines provided to crowd workers during the crowdsourcing process were unclear, and that the processes for filtering hate speech from the collected data and for verifying the accuracy of Internet-based information were insufficient. Finally, from the dialogue perspective, we thoroughly analyze nine types of problems found while conversing with the model, together with their causes. Furthermore, we propose practical improvements for each perspective and discuss the direction open-domain chatbots should take.

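To make the architecture described in the abstract concrete, the sketch below (not the authors' code) illustrates, under stated assumptions, how an Internet-augmented, multi-session chatbot might combine a search module with long-term memory; every class and function name here (LongTermMemory, internet_search, generate_response) is hypothetical.

    from dataclasses import dataclass, field
    from typing import List


    @dataclass
    class LongTermMemory:
        """Hypothetical stand-in for a multi-session memory that persists user facts."""
        facts: List[str] = field(default_factory=list)

        def remember(self, fact: str) -> None:
            self.facts.append(fact)


    def internet_search(query: str) -> List[str]:
        """Placeholder for the real-time search module; a deployed system
        would call an actual search API here."""
        return [f"stub result for: {query}"]


    def generate_response(utterance: str, memory: LongTermMemory) -> str:
        """Condition the reply on retrieved documents plus remembered user facts."""
        documents = internet_search(utterance)           # 1) query the search module
        context = " | ".join(memory.facts + documents)   # 2) merge memory and search results
        # A real system would run a seq2seq generator over this context;
        # echoing it keeps the sketch self-contained.
        return f"(reply conditioned on) {context}"


    if __name__ == "__main__":
        memory = LongTermMemory()
        memory.remember("user likes hiking")             # carried over from an earlier session
        print(generate_response("Any trail news today?", memory))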

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1A6A1A03045425) and supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2018-0-01405) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
