http://dx.doi.org/10.9728/dcs.2018.19.5.927

Experimental Analysis of Correct Answer Characteristics in Question Answering Systems  

Han, Kyoung-Soo (Division of Computer Engineering, Sungkyul University)
Publication Information
Journal of Digital Contents Society, v.19, no.5, 2018, pp. 927-933
Abstract
In a question answering system, which finds and returns answers to natural language questions, one of the largest sources of error is the retrieval step that searches for documents or passages containing the correct answer. Improving retrieval performance requires understanding the characteristics of the documents and passages that contain correct answers. Using a corpus composed of questions, documents that contain the correct answer, and documents that do not, this paper experimentally analyzes how many question words appear in correct-answer documents, how the positions of the question words are distributed, and how similar the topic of the question is to that of the correct-answer document. The study explains the causes of results reported in previous research on retrieval for question answering systems and discusses the elements needed for an effective retrieval step.
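As a rough illustration of two of the quantities the abstract mentions (how many question words occur in a document, and how closely together they occur), the following Python sketch may help. It is not the paper's method: the tokenizer, the stopword handling, and the function names `question_word_coverage` and `min_cover_span` are assumptions made for this example, and the shortest covering window is only one simple stand-in for a proximity measure, since the abstract does not define the measure the paper uses.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_word_coverage(question, document, stopwords=frozenset()):
    """Fraction of distinct non-stopword question words that occur in the document."""
    q_terms = {t for t in tokenize(question) if t not in stopwords}
    if not q_terms:
        return 0.0
    d_terms = set(tokenize(document))
    return len(q_terms & d_terms) / len(q_terms)


def min_cover_span(question, document, stopwords=frozenset()):
    """Length in tokens of the shortest window that contains every question word
    occurring in the document; None if no question word occurs at all."""
    q_terms = {t for t in tokenize(question) if t not in stopwords}
    tokens = tokenize(document)
    present = q_terms & set(tokens)
    if not present:
        return None
    counts = {}          # question words currently inside the window
    best = len(tokens)
    left = 0
    for right, tok in enumerate(tokens):
        if tok in present:
            counts[tok] = counts.get(tok, 0) + 1
        # Shrink from the left while the window still covers all present words.
        while len(counts) == len(present):
            best = min(best, right - left + 1)
            left_tok = tokens[left]
            if left_tok in counts:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    del counts[left_tok]
            left += 1
    return best


if __name__ == "__main__":
    # Hypothetical example data, not taken from the paper's corpus.
    stop = frozenset({"who", "the"})
    q = "Who invented the telephone?"
    doc = "Alexander Graham Bell invented the first practical telephone in 1876."
    print(question_word_coverage(q, doc, stop))
    print(min_cover_span(q, doc, stop))
```

Comparing these values between documents that contain the correct answer and documents that do not is the kind of contrast the abstract describes; a real analysis would likely use a proper tokenizer, stemming, and a standard stopword list.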
Keywords
Question answering system; Information retrieval; Answer characteristic; Proximity density; Query expansion