http://dx.doi.org/10.4218/etrij.2020-0460

DG-based SPO tuple recognition using self-attention M-Bi-LSTM  

Jung, Joon-young (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)
Publication Information
ETRI Journal / v.44, no.3, 2022, pp. 438-449
Abstract
This study proposes a dependency grammar-based self-attention multilayered bidirectional long short-term memory (DG-M-Bi-LSTM) model for subject-predicate-object (SPO) tuple recognition from natural language (NL) sentences. To add recent knowledge to a knowledge base autonomously, it is essential to extract knowledge from large volumes of NL data. Accordingly, this study develops a high-accuracy SPO tuple recognition model that requires only a small amount of training data to extract knowledge from NL sentences. To evaluate its effectiveness, the recognition accuracy of DG-M-Bi-LSTM is compared with that of an NL-based self-attention multilayered bidirectional LSTM, DG-based bidirectional encoder representations from transformers (BERT), and NL-based BERT. The DG-M-Bi-LSTM model achieves the best recognition accuracy for extracting SPO tuples from NL sentences even though it has fewer deep neural network (DNN) parameters than BERT. In particular, its accuracy exceeds that of BERT when training data are limited. Moreover, its pretrained DNN parameters can be transferred to other domains because the model learns the structural relations within NL sentences.
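To make the architecture named in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a self-attention multilayered Bi-LSTM that tags each token of a sentence with an SPO role. The class name, tag set, layer counts, and dimensions are illustrative assumptions, and the paper's dependency grammar-based input encoding is not reproduced here; only the self-attention-over-Bi-LSTM pattern is shown.

```python
# Minimal sketch (assumed, not the paper's code): a multilayered Bi-LSTM whose
# outputs are refined by self-attention before per-token SPO tag classification.
import torch
import torch.nn as nn

class SelfAttentionMBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128,
                 num_layers=2, num_tags=4):  # e.g., subject/predicate/object/other
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Multilayered bidirectional LSTM over the token sequence
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        # Single-head self-attention over the Bi-LSTM hidden states
        self.attn = nn.MultiheadAttention(embed_dim=2 * hidden_dim,
                                          num_heads=1, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (batch, seq, embed_dim)
        h, _ = self.bilstm(x)            # (batch, seq, 2*hidden_dim)
        a, _ = self.attn(h, h, h)        # self-attention: query = key = value = h
        return self.classifier(a)        # per-token SPO tag scores

# Toy usage: score SPO tags for two 6-token sentences.
model = SelfAttentionMBiLSTM(vocab_size=10000)
tokens = torch.randint(0, 10000, (2, 6))
print(model(tokens).shape)               # torch.Size([2, 6, 4])
```

In the paper's DG-based setting, the input sequence would be derived from a dependency parse rather than the raw word order; the sketch above leaves that preprocessing step out.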
Keywords
dependency grammar; information extraction; long short-term memory; SPO tuple