Named entity recognition using transfer learning and small human- and meta-pseudo-labeled datasets

  • Kyoungman Bae (Language Intelligence Research Section, Electronics and Telecommunications Research Institute);
  • Joon-Ho Lim (Language Intelligence Research Section, Electronics and Telecommunications Research Institute)
  • Received : 2023.08.24
  • Accepted : 2023.12.20
  • Published : 2024.02.20

Abstract

We introduce a high-performance named entity recognition (NER) model for written and spoken language. To overcome challenges related to labeled data scarcity and domain shifts, we use transfer learning, leveraging our previously developed KorBERT as the base model. We also adopt a meta-pseudo-label method that trains a teacher/student framework on labeled and unlabeled data. Our model introduces two modifications. First, the student model is updated with the average of the losses on the human- and pseudo-labeled data. Second, the influence of noisy pseudo-labeled data is mitigated by monitoring feedback scores and updating the teacher model only when the score falls below a threshold (0.0005). We achieve the target NER performance in the spoken language domain and improve performance in the written language domain by proposing a straightforward rollback method that reverts to the best model selected on the scarce human-labeled data. Further improvement is achieved by adjusting the label vector weights in the named entity dictionary.
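
For readers who want a concrete picture of the teacher/student step, the following is a minimal sketch, not the authors' implementation. It assumes PyTorch, generic `teacher` and `student` token-classification models that return per-token logits, and their optimizers `t_opt` and `s_opt`; the feedback score is approximated as the change in the student's loss on the human-labeled batch (a common first-order meta-pseudo-label approximation), and the name `FEEDBACK_THRESHOLD` is illustrative.

```python
import torch
import torch.nn.functional as F

# Illustrative constant; the abstract reports a threshold of 0.0005.
FEEDBACK_THRESHOLD = 0.0005

def train_step(teacher, student, t_opt, s_opt, labeled_batch, unlabeled_batch):
    """One teacher/student step with the two modifications described above:
    (1) the student loss averages the human- and pseudo-labeled losses,
    (2) the teacher is updated only when the feedback score is below a threshold."""
    x_l, y_l = labeled_batch   # human-labeled tokens and gold tag IDs
    x_u = unlabeled_batch      # unlabeled tokens

    with torch.no_grad():
        # Teacher assigns hard pseudo-labels to the unlabeled batch.
        pseudo_y = teacher(x_u).argmax(dim=-1)
        # Student loss on labeled data before the update (for the feedback score).
        loss_l_before = F.cross_entropy(student(x_l).transpose(1, 2), y_l)

    # Modification 1: average the human-labeled and pseudo-labeled losses.
    loss_l = F.cross_entropy(student(x_l).transpose(1, 2), y_l)
    loss_u = F.cross_entropy(student(x_u).transpose(1, 2), pseudo_y)
    student_loss = 0.5 * (loss_l + loss_u)

    s_opt.zero_grad()
    student_loss.backward()
    s_opt.step()

    # Feedback score: how much the student's labeled loss changed after the update.
    with torch.no_grad():
        loss_l_after = F.cross_entropy(student(x_l).transpose(1, 2), y_l)
    feedback = (loss_l_before - loss_l_after).item()

    # Modification 2: skip the teacher update when the feedback score exceeds
    # the threshold, so noisy pseudo-labels do not propagate into the teacher.
    if feedback < FEEDBACK_THRESHOLD:
        teacher_loss = feedback * F.cross_entropy(teacher(x_u).transpose(1, 2), pseudo_y)
        t_opt.zero_grad()
        teacher_loss.backward()
        t_opt.step()

    return student_loss.item(), feedback
```

In this reading, the 0.0005 threshold simply gates the teacher update; the exact definition of the feedback score above is an assumption and may differ from the authors' formulation.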

Keywords

Acknowledgement

Institute for Information and Communications Technology Promotion, Grant/Award Numbers: 2013-2-00131, 2022-0-00369
