DG-based SPO tuple recognition using self-attention M-Bi-LSTM

  • Jung, Joon-young (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2020.12.07
  • Accepted : 2021.08.23
  • Published : 2022.06.10

Abstract

This study proposes a dependency grammar-based self-attention multilayered bidirectional long short-term memory (DG-M-Bi-LSTM) model for subject-predicate-object (SPO) tuple recognition from natural language (NL) sentences. Adding recent knowledge to a knowledge base autonomously requires extracting knowledge from large volumes of NL text; hence, this study targets a high-accuracy SPO tuple recognition model that needs only a small amount of training data. To evaluate its effectiveness, the recognition accuracy of DG-M-Bi-LSTM is compared with that of an NL-based self-attention multilayered bidirectional LSTM, DG-based bidirectional encoder representations from transformers (BERT), and NL-based BERT. The DG-M-Bi-LSTM model achieves the best recognition accuracy for extracting SPO tuples from NL sentences even though it has fewer deep neural network (DNN) parameters than BERT. In particular, it outperforms BERT when the training data are limited. Moreover, its pretrained DNN parameters can be transferred to other domains because the model learns the structural relations within NL sentences.
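For illustration only, the sketch below shows one plausible reading of the architecture named in the abstract: a multilayered bidirectional LSTM whose per-token outputs pass through a self-attention layer before SPO tagging. It is not the authors' implementation; the class name, tag set, hyperparameters, and the treatment of dependency-grammar input (which the paper would reorder or augment according to the parse) are all assumptions.

```python
# Hedged sketch (not the paper's code): a self-attention multilayered Bi-LSTM
# tagger that labels each token of a sentence as Subject, Predicate, Object,
# or none. All sizes and the tag set below are illustrative assumptions.
import torch
import torch.nn as nn

TAGS = ["O", "SUBJ", "PRED", "OBJ"]  # assumed per-token SPO tag set

class SelfAttentionMBiLSTM(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256,
                 num_layers=3, num_heads=4, num_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Multilayered bidirectional LSTM over the token sequence.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        # Self-attention over the Bi-LSTM outputs (2 * hidden_dim per token).
        self.self_attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                               batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids, pad_mask=None):
        # token_ids: (batch, seq_len) word indices; in a DG-based variant the
        # tokens would be ordered/augmented according to the dependency parse.
        x = self.embed(token_ids)
        h, _ = self.bilstm(x)
        # Each position attends to every other position in the sentence.
        attn_out, _ = self.self_attn(h, h, h, key_padding_mask=pad_mask)
        return self.classifier(attn_out)  # (batch, seq_len, num_tags) logits

# Toy usage: tag a batch of two random 12-token sentences.
model = SelfAttentionMBiLSTM()
tokens = torch.randint(1, 30000, (2, 12))
tags = model(tokens).argmax(dim=-1)  # per-token SPO tag predictions
```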

Acknowledgement

This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (21ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System).
