CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT

  • Joon-young Jung (Superintelligence Creative Research Laboratory, Electronics and Telecommunications Research Institute)
  • Received : 2023.08.04
  • Accepted : 2023.12.20
  • Published : 2024.02.20

Abstract

This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that uses multiple embedding-based span bidirectional encoder representations from transformers (SpanBERT) for antecedent recognition in natural language (NL) text. Information extraction studies aim to extract knowledge from NL text autonomously and cost-effectively; however, the extracted information may not represent that knowledge accurately owing to ambiguous entities. We therefore propose a CR model that identifies mentions referring to the same entity in NL text. Because CR requires understanding both the syntax and semantics of NL text simultaneously, the model generates multiple embeddings that encode syntactic and semantic information for each word. We evaluate the effectiveness of CR-M-SpanBERT by comparing it with a model that uses SpanBERT as the language model, as in previous CR studies. The results demonstrate that the proposed deep neural network model achieves high recognition accuracy when extracting antecedents from NL text and requires fewer epochs than the conventional SpanBERT approach to reach an average F1 score above 75%.
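
The paper's implementation is not reproduced here, but the following minimal PyTorch sketch illustrates the multiple-embedding idea described above: each token's contextual SpanBERT vector is concatenated with learned syntactic embeddings before self-attention-based antecedent scoring. The vocabulary sizes, dimensions, class names, and the mean-pooling choice are illustrative assumptions, not the authors' design.

```python
# Minimal sketch only: a hypothetical fusion of contextual and syntactic
# embeddings for coreference, not the paper's released CR-M-SpanBERT code.
import torch
import torch.nn as nn

class MultiEmbeddingFusion(nn.Module):
    """Concatenates SpanBERT token vectors with learned POS/dependency
    embeddings, then projects back to the encoder width."""
    def __init__(self, hidden=768, n_pos=50, n_dep=40, syn_dim=64):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos, syn_dim)  # part-of-speech tags (assumed vocab)
        self.dep_emb = nn.Embedding(n_dep, syn_dim)  # dependency relations (assumed vocab)
        self.proj = nn.Linear(hidden + 2 * syn_dim, hidden)

    def forward(self, spanbert_out, pos_ids, dep_ids):
        # spanbert_out: (batch, seq, hidden) contextual vectors from a SpanBERT encoder
        syntactic = torch.cat([self.pos_emb(pos_ids), self.dep_emb(dep_ids)], dim=-1)
        return self.proj(torch.cat([spanbert_out, syntactic], dim=-1))

class AntecedentScorer(nn.Module):
    """Self-attends over fused token vectors and scores each candidate
    antecedent span against a mention span."""
    def __init__(self, hidden=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.ffnn = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, tokens, mention_span, candidate_spans):
        # tokens: (1, seq, hidden) for one document; spans are (start, end) indices
        attended, _ = self.attn(tokens, tokens, tokens)       # self-attention layer
        pool = lambda s, e: attended[0, s:e + 1].mean(dim=0)  # mean-pool a span
        m = pool(*mention_span)
        scores = [self.ffnn(torch.cat([m, pool(*c)])) for c in candidate_spans]
        return torch.stack(scores).squeeze(-1)                # one score per candidate
```

Concatenation followed by a linear projection keeps the fused representation the same width as the encoder output, so such a module could drop into a standard SpanBERT-style coreference pipeline without changing the downstream span-scoring layers.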

Acknowledgement

This study was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System).
