Detection of Similar Answers to Avoid Duplicate Question in Retrieval-based Automatic Question Generation

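  • Yong-Seok Choi (Department of Electronics, Radio and Information Communications Engineering, Chungnam National University);
  • Kong Joo Lee (Department of Radio and Information Communications Engineering, Chungnam National University)
  • Received: 2018.10.08
  • Reviewed: 2018.12.03
  • Published: 2019.01.31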

Abstract

We propose a method for finding, in a question-answer database, the answers most similar to a user's response, so that a retrieval-based automatic question generation system does not ask again about content the user has already provided. Because the question paired with a highly similar answer is likely to concern something the user already knows, that question is removed from the set of question candidates. The similarity detector scores a pair of answers using three signals: shared words, paraphrases, and sentence-level meaning. Paraphrases are acquired from a phrase table of the kind used in statistical machine translation. Sentence-level similarity is computed by encoding the two sentences with an attention-based convolutional neural network. On an evaluation set of 100 answers, retrieving the most similar answers from the question-answer database yields a Mean Reciprocal Rank (MRR) of 71%.
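The pipeline described in the abstract (rank database answers by similarity to the user's response, drop the questions attached to highly similar answers, and score the ranking with MRR) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `word_overlap` is a plain Jaccard score standing in for the shared-word signal only (the paraphrase table and the attention-based CNN are not reproduced), and the function names, toy database, and `threshold` value are all hypothetical.

```python
from typing import List, Sequence


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (stand-in for the shared-word signal only)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def rank_answers(user_answer: str, db_answers: Sequence[str]) -> List[int]:
    """Return database indices sorted from most to least similar to the user's answer."""
    scores = [word_overlap(user_answer, ans) for ans in db_answers]
    return sorted(range(len(db_answers)), key=lambda i: scores[i], reverse=True)


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]], gold: Sequence[int]) -> float:
    """Average of 1/rank of the gold item; a query whose gold item is missing contributes 0."""
    total = 0.0
    for ranked, target in zip(rankings, gold):
        if target in ranked:
            total += 1.0 / (list(ranked).index(target) + 1)
    return total / len(gold) if gold else 0.0


# Toy question-answer database, invented for illustration only.
db = [
    {"question": "What do you do on weekends?", "answer": "I play soccer on weekends"},
    {"question": "What is your favorite food?", "answer": "I like spicy noodles"},
    {"question": "Where do you live?", "answer": "I live in a small town"},
]
user = "I usually play soccer on weekends with friends"

# Rank stored answers against the user's response.
ranking = rank_answers(user, [row["answer"] for row in db])

# Keep only questions whose stored answer is NOT too similar to what the user said.
threshold = 0.3  # hypothetical cut-off, not taken from the paper
candidates = [
    row["question"]
    for row in db
    if word_overlap(user, row["answer"]) < threshold
]

print(ranking)
print(candidates)
print(mean_reciprocal_rank([[0, 1, 2], [1, 0, 2]], [0, 0]))
```

In this toy run the weekend question is filtered out because its stored answer overlaps heavily with the user's response. The MRR call averages reciprocal ranks of 1 and 1/2 over two hypothetical queries.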

Fig. 1. Example of System’s Question and User’s Answers

Fig. 2. Overview of the Automatic Question Generation System

Fig. 3. Similarity Function Between Sentences Using Paraphrases [15]

Fig. 4. Attention-Based Convolutional Neural Network Structure [19]

Table 1. Example of the Paraphrase Table

Table 2. Examples of Quora Datasets

Table 3. Database of Questions and Answers

Table 4. Examples of Questions and Answers

Table 5. Word-level Comparisons Between Two Different Sets for Evaluation Data

Table 6. Evaluation Results

Table 7. Examples of the Most Similar Answer from QA Database

Table 8. Top-10 Most Similar Questions Found by the Similarity Functions

References

  1. N. T. Le, T. Kojiri, and N. Pinkwart, "Automatic Question Generation for Educational Applications - The State of Art," In Advanced Computational Methods for Knowledge Engineering. Springer, Cham, pp. 325-338, 2014.
  2. Z. Ji, Z. Lu, and H. Li, "An information retrieval approach to short text conversation," arXiv preprint arXiv:1408.6988, 2014.
  3. J. F. Aquino, D. D. Chua, R. K. Kabiling, J. N. Pingco and R. Sagum, "Text2Test: Question Generator Utilizing Information Abstraction Techniques and Question Generation Methods for Narrative and Declarative Text," In Proceedings of the 8th National Natural Language Processing Research Symposium, pp. 29-34, 2011.
  4. P. Pabitha, M. Mohana, S. Suganthi, and B. Sivanandhini, "Automatic Question Generation System," In International Conference on Recent Trends in Information Technology, 2014.
  5. P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang, "SQuAD: 100,000+ questions for machine comprehension of text," In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Austin, Texas, pp. 2383-2392, 2016.
  6. X. Du, J. Shao, and C. Cardie, "Learning to Ask: Neural Question Generation for Reading Comprehension," arXiv preprint arXiv:1705.00106, 2017.
  7. N. Duan, D. Tang, P. Chen, and M. Zhou, "Question generation for question answering," In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 877-885, 2017.
  8. J. Mueller and A. Thyagarajan, "Siamese Recurrent Architectures for Learning Sentence Similarity," In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), pp. 2786-2792, 2016.
  9. J. Allan, C. Wade, and A. Bolivar, "Retrieval and novelty detection at the sentence level," In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '03, pp. 314-321, 2003.
  10. T. C. Hoad and J. Zobel, "Methods for identifying versioned and plagiarized documents," Journal of the American Society for Information Science and Technology, Vol. 54, Issue 3, pp. 203-215, 2003. https://doi.org/10.1002/asi.10170
  11. W. N. Zhang, T. Liu, Y. Yang, L. Cao, Y. Zhang, and R. Ji, "A Topic Clustering Approach to Finding Similar Questions from Large Question and Answer Archives," PLoS ONE, Vol. 9, No. 3, e71511, 2014. https://doi.org/10.1371/journal.pone.0071511
  12. K. Wang, Z. Ming, and T. S. Chua, "A syntactic tree matching approach to finding similar questions in community-based QA services," In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '09, pp. 187-194, 2009.
  13. M. Marelli, L. Bentivogli, M. Baroni, R. Bernardi, S. Menini, and R. Zamparelli, "SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment," In Proceedings of the 8th International Workshop on Semantic Evaluation, pp. 1-8, 2014.
  14. K. S. Tai, R. Socher, and C. D. Manning, "Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks," ACL, pp. 1556-1566, 2015.
  15. Z. Yan, N. Duan, J. Bao, P. Chen, M. Zhou, Z. Li, and J. Zhou, "Docchat: An information retrieval approach for chatbot engines using unstructured documents," In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 516-525, 2016.
  16. C. D. Manning, P. Raghavan, and H. Schütze, "Introduction to information retrieval," Cambridge University Press, 2008.
  17. C. Callison-Burch, P. Koehn, and M. Osborne, "Improved statistical machine translation using paraphrases," In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp. 17-24, 2006.
  18. R. Zens and H. Ney, "Efficient Phrase-table Representation for Machine Translation with Applications to Online MT and Speech Translation," In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pp. 492-499, 2007.
  19. W. Yin, H. Schütze, B. Xiang, and B. Zhou, "ABCNN: Attention-based convolutional neural network for modeling sentence pairs," arXiv preprint arXiv:1512.05193, 2015.
  20. K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, "BLEU: a Method for Automatic Evaluation of Machine Translation," In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311-318, 2002.