Korean Spatial Information Extraction using Bi-LSTM-CRF Ensemble Model

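  • Tae Hong Min (Department of Computer Science, Chungbuk National University);
  • Hyeong Jin Shin (Department of Computer Science, Chungbuk National University);
  • Jae Sung Lee (Department of Software, Chungbuk National University)
  • Received: 2019.09.20
  • Reviewed: 2019.10.24
  • Published: 2019.11.28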

Abstract

Spatial information extraction identifies the static and dynamic spatial information in natural language text by explicitly marking spatial elements and the relations among them. This paper proposes a deep learning approach to Korean spatial information extraction based on a two-step bidirectional LSTM-CRF ensemble model. It also introduces an integrated model that combines spatial element extraction with spatial relation attribute extraction. Experiments on the Korean SpaceBank corpus show that the proposed deep learning approach outperforms the previous CRF model, and that the proposed ensemble model outperforms a single model.
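The abstract does not say how the ensemble combines its member models. As a rough illustration only, one common way to ensemble sequence taggers is per-token majority voting over their predicted tag sequences. The sketch below assumes that scheme. The function name `majority_vote`, the tag labels, and the three sample outputs are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag sequences from several taggers by majority vote.

    Assumption: every tagger labels the same tokens, so all sequences have
    equal length. Ties go to the earliest model in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction sequence")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction sequences must have the same length")

    voted = []
    for tags in zip(*predictions):  # tags = one token's label from each model
        counts = Counter(tags)
        best = max(counts.values())
        # Pick the first tag, in model order, that reaches the top count.
        voted.append(next(t for t in tags if counts[t] == best))
    return voted


# Hypothetical outputs of three independently trained taggers for one
# four-token sentence (labels are illustrative, not the paper's tag set).
model_outputs = [
    ["B-PLACE", "I-PLACE", "O", "B-SIGNAL"],
    ["B-PLACE", "O", "O", "B-SIGNAL"],
    ["B-PLACE", "I-PLACE", "O", "O"],
]
ensemble_tags = majority_vote(model_outputs)
print(ensemble_tags)
```

Voting token by token can produce tag sequences that no single model emitted, including ill-formed spans, so a real system might instead average model scores before CRF decoding. Which of these, if either, the paper uses is not stated in the abstract.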
