DG-based SPO tuple recognition using self-attention M-Bi-LSTM

  • Jung, Joon-young (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2020.12.07
  • Accepted : 2021.08.23
  • Published : 2022.06.10

Abstract

This study proposes a dependency grammar-based self-attention multilayered bidirectional long short-term memory (DG-M-Bi-LSTM) model for subject-predicate-object (SPO) tuple recognition from natural language (NL) sentences. Adding recent knowledge to a knowledge base autonomously requires extracting knowledge from large volumes of NL text; hence, this study targets a high-accuracy SPO tuple recognition model that needs only a small amount of training data. To evaluate its effectiveness, the recognition accuracy of DG-M-Bi-LSTM is compared with that of an NL-based self-attention multilayered bidirectional LSTM, DG-based bidirectional encoder representations from transformers (BERT), and NL-based BERT. The DG-M-Bi-LSTM model achieves the best recognition accuracy for extracting SPO tuples from NL sentences even though it has fewer deep neural network (DNN) parameters than BERT. In particular, it outperforms BERT when the training data are limited. Moreover, its pretrained DNN parameters can be transferred to other domains because the model learns the structural relations within NL sentences.
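For illustration only, the sketch below shows one plausible reading of the architecture named in the abstract: a multilayered bidirectional LSTM whose per-token outputs pass through a self-attention layer before SPO tagging. It is not the authors' implementation; the class name, tag set, hyperparameters, and the treatment of dependency-grammar input (which the paper would reorder or augment according to the parse) are all assumptions.

```python
# Hedged sketch (not the paper's code): a self-attention multilayered Bi-LSTM
# tagger that labels each token of a sentence as Subject, Predicate, Object,
# or none. All sizes and the tag set below are illustrative assumptions.
import torch
import torch.nn as nn

TAGS = ["O", "SUBJ", "PRED", "OBJ"]  # assumed per-token SPO tag set

class SelfAttentionMBiLSTM(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256,
                 num_layers=3, num_heads=4, num_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Multilayered bidirectional LSTM over the token sequence.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        # Self-attention over the Bi-LSTM outputs (2 * hidden_dim per token).
        self.self_attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                               batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, pad_mask=None):
        # token_ids: (batch, seq_len) word indices; in a DG-based variant the
        # tokens would be ordered/augmented according to the dependency parse.
        x = self.embed(token_ids)
        h, _ = self.bilstm(x)
        # Each position attends to every other position in the sentence.
        attn_out, _ = self.self_attn(h, h, h, key_padding_mask=pad_mask)
        return self.classifier(attn_out)  # (batch, seq_len, num_tags) logits

# Toy usage: tag a batch of two random 12-token sentences.
model = SelfAttentionMBiLSTM()
tokens = torch.randint(1, 30000, (2, 12))
tags = model(tokens).argmax(dim=-1)  # per-token SPO tag predictions
```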

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (21ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System).
