Query-based Answer Extraction using Korean Dependency Parsing

의존 구문 분석을 이용한 질의 기반 정답 추출

  • Lee, Dokyoung (Department of Industrial Engineering, Yonsei University)
  • Kim, Mintae (Department of Industrial Engineering, Yonsei University)
  • Kim, Wooju (Department of Industrial Engineering, Yonsei University)
  • Received : 2019.06.04
  • Accepted : 2019.07.03
  • Published : 2019.09.30

Abstract

In this paper, we study how to improve answer extraction in a question-answering (QA) system by using the results of sentence dependency parsing. A QA system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from documents, and various studies have been conducted on both. Improving answer extraction requires accurately reflecting the grammatical information of sentences. Because Korean has free word order and frequently omits sentence components, dependency parsing is well suited to analyzing Korean syntax. We therefore improved answer extraction performance by adding features generated from dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the model's performance when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature are added. Because dependency parsing operates on the Eojeol, the space-delimited unit of a Korean sentence, the tag of each Eojeol is obtained as a by-product of parsing; the Eojeol tag feature is this tag information. Generating the dependency graph embedding consists of two steps: building a dependency graph from the dependency parsing result and learning an embedding of that graph.
From the dependency parsing result, a graph is built in which each Eojeol is a node, each dependency between Eojeols is an edge, and each Eojeol tag is a node label. The graph is undirected or directed depending on whether the direction of the dependency relation is considered. To obtain the graph embedding, we used Graph2Vec, which learns an embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified when extracting subgraphs. If it is 1, the graph embedding reflects only direct dependencies between Eojeols; as it grows, the embedding also reflects indirect dependencies. In the experiments, we varied the maximum path length between nodes from 1 to 3, both with and without dependency direction, and measured answer extraction performance. The results show that both the Eojeol tag feature and the dependency graph embedding feature improve answer extraction. The highest performance was obtained when the direction of the dependency relation was considered and the dependency graph embedding was generated with a maximum path length of 1 in Graph2Vec's subgraph extraction. From these experiments, we conclude that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words. The significance of this study is as follows. First, we improved answer extraction performance by adding features derived from dependency parsing results, taking into account the characteristics of Korean, namely its free word order and frequent omission of sentence components.
Second, we generated features from dependency parsing results with a learning-based graph embedding method, without predefining patterns of dependency between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing were applied only to the answer extraction model in order to assess their usefulness. If they are also applied to other natural language processing models, such as sentiment analysis or named entity recognition, and the resulting performance is confirmed, the validity of the features can be verified more rigorously.
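
The graph construction and subgraph extraction described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: the sample sentence, the tag names, the `build_graph` and `rooted_subgraphs` helpers, the choice to point edges from dependent to head, and the reading of "maximum path length between nodes" as the depth of Weisfeiler-Lehman style relabelling, which is how Graph2Vec enumerates rooted subgraphs. The step that learns an embedding from these subgraph strings is omitted.

```python
from collections import defaultdict

# Hypothetical parse of a four-Eojeol sentence: (index, eojeol, tag, head index).
# Head 0 marks the root. The tags are illustrative, not the paper's tag set.
PARSE = [
    (1, "나는", "NP_SBJ", 4),
    (2, "어제", "AP", 4),
    (3, "책을", "NP_OBJ", 4),
    (4, "읽었다", "VP", 0),
]


def build_graph(parse, directed=True):
    """Map Eojeol -> node, dependency -> edge, Eojeol tag -> node label."""
    labels = {idx: tag for idx, _, tag, _ in parse}
    adjacency = defaultdict(set)
    for idx, _, _, head in parse:
        adjacency[idx]  # make sure every node has an entry, even with no edges
        if head == 0:
            continue
        adjacency[idx].add(head)  # assumed direction: dependent -> head
        if not directed:
            adjacency[head].add(idx)  # undirected: also store the reverse edge
    return labels, {node: sorted(nbrs) for node, nbrs in adjacency.items()}


def rooted_subgraphs(labels, adjacency, max_depth):
    """Return one label string per node for every depth from 0 to max_depth.

    Depth 0 is the node's own tag; each further depth appends the sorted
    labels of its neighbours from the previous depth, so depth 1 captures
    direct dependencies and larger depths capture indirect ones.
    """
    current = dict(labels)
    features = [current[node] for node in sorted(current)]
    for _ in range(max_depth):
        relabelled = {}
        for node, neighbours in adjacency.items():
            neighbour_labels = sorted(current[n] for n in neighbours)
            relabelled[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = relabelled
        features.extend(current[node] for node in sorted(current))
    return features


if __name__ == "__main__":
    for directed in (True, False):
        tags, graph = build_graph(PARSE, directed=directed)
        print(directed, rooted_subgraphs(tags, graph, max_depth=1))
```

In this sketch the directed graph gives the verb no outgoing edges, so its depth-1 subgraph carries no information about its dependents, while the undirected graph lets every dependent tag flow into the verb's label. Which direction convention the original experiments used is not stated in the abstract.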

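A question-answering system consists broadly of query analysis, which analyzes the user's query, and answer extraction, which extracts an appropriate answer from documents, and various studies are under way on both. This study investigates how to improve answer extraction within a question-answering system by using the dependency parsing results of sentences. To raise answer extraction performance, the grammatical information of sentences must be reflected accurately. Because Korean has free word order and frequently omits sentence constituents, dependency parsing based on dependency grammar is suitable. Previous studies that incorporated dependency parsing into question-answering systems had the limitation that syntactic relation information, or a metric defining the similarity of syntactic forms, had to be defined in advance. Another study represented the dependency parse of a sentence as a tree and computed sentence similarity from the tree edit distance, but that algorithm is computationally expensive. In this study, without predefining any information about syntactic patterns, we represented answer candidate sentences as graphs, generated input features with Graph2Vec, which can effectively reflect graph information, and added them to the input of the answer extraction model in an attempt to improve answer extraction performance. In the dependency graph generation step, we generated features under various settings of whether dependency direction is considered and of the maximum path length between nodes, and compared answer extraction performance in each case. To ensure the reliability of the answer candidate sentences, we restricted the web search sources to Korean Wikipedia, Naver Knowledge Encyclopedia (네이버 지식백과), and Naver News, and demonstrated that performance on these documents improves over the existing answer extraction model. Through the experiments, we confirmed that features generated from dependency parsing results contribute to the performance of the answer extraction system, and we expect that these features can be used not only in answer extraction systems but also in other natural language processing fields such as sentiment analysis and named entity recognition.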
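
How the generated features might be added to the model input, and which settings the comparison covers, can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the `EOJEOL_TAGS` list, the `one_hot` and `build_inputs` helpers, the vector sizes, and the decision to repeat one sentence-level graph embedding at every position are all assumptions, and the Bidirectional LSTM-CRF model itself is left out. The only parts taken from the abstract are the three feature groups (basic word features, Eojeol tag feature, dependency graph embedding) and the grid of direction on or off with maximum path length 1 to 3.

```python
from itertools import product

# Illustrative tag inventory; the real tag set comes from the dependency parser.
EOJEOL_TAGS = ["NP_SBJ", "NP_OBJ", "AP", "VP"]


def one_hot(tag, tags=EOJEOL_TAGS):
    """Encode an Eojeol tag as a one-hot vector over the assumed tag inventory."""
    return [1.0 if tag == candidate else 0.0 for candidate in tags]


def build_inputs(word_vectors, eojeol_tags, graph_embedding):
    """Concatenate, per position, word features, tag one-hot and graph embedding.

    Assumption: the sentence-level graph embedding is repeated at every
    position so that a sequence tagger can consume it.
    """
    if len(word_vectors) != len(eojeol_tags):
        raise ValueError("need exactly one Eojeol tag per input position")
    return [
        list(word) + one_hot(tag) + list(graph_embedding)
        for word, tag in zip(word_vectors, eojeol_tags)
    ]


# The six feature settings compared: direction considered or not,
# crossed with maximum path length 1, 2 or 3.
CONFIGS = [
    {"directed": directed, "max_path_length": length}
    for directed, length in product([True, False], [1, 2, 3])
]

if __name__ == "__main__":
    words = [[0.1, 0.2], [0.3, 0.4]]   # placeholder word features
    graph_vec = [0.5, 0.6, 0.7]        # placeholder graph embedding
    for row in build_inputs(words, ["NP_SBJ", "VP"], graph_vec):
        print(row)
    print(CONFIGS)
```

Keeping the three feature groups as separate concatenated blocks makes it straightforward to drop one of them and rerun the model, which is the kind of comparison the abstract reports.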
