A Machine Learning based Method for Measuring Inter-utterance Similarity for Example-based Chatbot

  • Yang, Min-Chul (Department of Computer and Radio Communications Engineering, Korea University) ;
  • Lee, Yeon-Su (Department of Computer and Radio Communications Engineering, Korea University) ;
  • Rim, Hae-Chang (Division of Computer and Communications Engineering, Korea University)
  • Received : 2010.06.14
  • Accepted : 2010.08.10
  • Published : 2010.08.31

Abstract

An example-based chatbot generates a response to a user's utterance by searching a collection of dialogue examples for the most similar utterance. Although finding an appropriate example is critical because it directly determines response quality, few studies have examined which features should be considered, and how they should be used, when searching for similar utterances. In this paper, we propose a machine learning framework that uses various lexical and semantic features to improve retrieval accuracy and the utilization of examples. Experimental results show that using semantic and lexical features together significantly improves performance over conventional approaches in terms of 1) utilization of the example database, 2) precision of example matching, and 3) quality of responses.
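
As a rough illustration of the retrieval step described in the abstract, the following Python sketch scores each stored example utterance against the user's utterance with one lexical feature and one semantic feature, then returns the response paired with the best match. It is not the authors' implementation: the Jaccard token overlap, the toy `SEMANTIC_CLASS` table, the fixed `WEIGHTS`, and every function name are assumptions made for this example. In the proposed framework the combination of features would be learned from data rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class Example:
    utterance: str  # stored example utterance
    response: str   # response paired with that utterance


# Hypothetical toy semantic classes; a real system would draw on a
# lexical-semantic resource instead of a hand-written table.
SEMANTIC_CLASS = {
    "hi": "greeting", "hello": "greeting", "hey": "greeting",
    "movie": "entertainment", "film": "entertainment",
    "like": "preference", "love": "preference", "enjoy": "preference",
}

# Assumed fixed weights standing in for a trained model's parameters.
WEIGHTS = (0.5, 0.5)


def tokens(text: str) -> set[str]:
    """Lowercased word set with basic punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_sim(u: str, v: str) -> float:
    """Lexical feature: overlap of surface word forms."""
    return jaccard(tokens(u), tokens(v))


def semantic_sim(u: str, v: str) -> float:
    """Semantic feature: overlap of the semantic classes of the words."""
    cu = {SEMANTIC_CLASS[t] for t in tokens(u) if t in SEMANTIC_CLASS}
    cv = {SEMANTIC_CLASS[t] for t in tokens(v) if t in SEMANTIC_CLASS}
    return jaccard(cu, cv)


def similarity(u: str, v: str) -> float:
    """Weighted combination of the lexical and semantic features."""
    return WEIGHTS[0] * lexical_sim(u, v) + WEIGHTS[1] * semantic_sim(u, v)


def respond(user_utterance: str, examples: list[Example]) -> str:
    """Return the response of the most similar example utterance."""
    best = max(examples, key=lambda ex: similarity(user_utterance, ex.utterance))
    return best.response


EXAMPLES = [
    Example("Hello there", "Hi! How are you today?"),
    Example("Do you like movies?", "Yes, I enjoy science fiction films."),
]

if __name__ == "__main__":
    print(respond("hey", EXAMPLES))
    print(respond("I love film", EXAMPLES))
```

In this toy setup, neither "hey" nor "I love film" shares a surface word with any stored utterance, so the lexical feature alone scores every example at zero. The semantic feature is what separates the candidates, which hints at why combining both kinds of features could raise the utilization of the example database.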
