Korean TableQA: Structured data question answering based on span prediction style with S3-NET

  • Received : 2019.04.09
  • Accepted : 2020.03.09
  • Published : 2020.12.14

Abstract

Tabular data are accurate and information-rich, which makes them well suited to information extraction and question answering (QA). TableQA solves such problems by understanding the structure of a table and searching it for the answer to a question. In this paper, we introduce novice and intermediate Korean TableQA tasks, in which the answer to a question must be deduced from structured tabular data, and we build question-answer pairs for them. To solve the Korean TableQA tasks, we use S3-NET, which has shown good performance in machine reading comprehension (MRC), and we propose a method for converting structured tabular data into a record format suitable for MRC. Our experimental results show that the proposed method outperforms a baseline on both the novice task (exact match (EM) 96.48%, F1 97.06%) and the intermediate task (EM 99.30%, F1 99.55%).
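The key technical step described above is converting a table into record-style text that a span-prediction MRC model can read as a passage. The abstract does not spell out the exact record format, so the following Python sketch is only a hypothetical illustration of the general idea: each row is flattened into column-value pairs, and the rows are concatenated into a passage over which a model such as S3-NET could predict an answer span. The function name and delimiter scheme are assumptions, not the authors' specification.

```python
# Hedged sketch: flattening a table into a "record"-style passage for
# span-prediction MRC. The delimiters below are illustrative choices,
# not the format used in the paper.

def table_to_records(header, rows):
    """Pair every column name with its cell value per row, then join
    the rows into one passage suitable for span prediction."""
    records = []
    for row in rows:
        pairs = [f"{col} : {cell}" for col, cell in zip(header, row)]
        records.append(" | ".join(pairs))
    return "\n".join(records)

if __name__ == "__main__":
    header = ["Country", "Capital", "Population"]
    rows = [
        ["France", "Paris", "67 million"],
        ["Japan", "Tokyo", "125 million"],
    ]
    print(table_to_records(header, rows))
    # For a question like "What is the capital of Japan?", an MRC
    # model would predict the span "Tokyo" inside this passage.
```

Framing the table as a flat passage lets an off-the-shelf extractive MRC model answer table questions without any table-specific architecture, which is the appeal of the span-prediction approach summarized in the abstract.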

Acknowledgement

This research was supported by LG CNS under the Question Answering project for formatted documents and by Korea Electric Power Corporation (Grant number: R18XA05).
