Three-Phase English Syntactic Analysis for Improving the Parsing Efficiency

  • Sung-Dong Kim (Department of Computer Engineering, Hansung University)
  • Received : 2015.10.22
  • Accepted : 2015.11.30
  • Published : 2016.01.31

Abstract

The performance of an English-Korean machine translation system depends heavily on its English parser. The parser described in this paper is part of a rule-based English-Korean MT system; it includes many syntactic rules and performs chart-based parsing. Because of the large number of rules, the parser generates too many structures, which slows parsing and requires a large amount of memory, reducing the practicality of translation. Long sentences containing commas are particularly hard for the rule-based parser to analyze and translate accurately because they cause high parsing complexity. In this paper, we propose a 3-phase parsing method with sentence segmentation to efficiently translate the long sentences that appear in everyday use. Each phase of the syntactic analysis applies its own independent set of syntactic rules in order to reduce parsing complexity. For this purpose, we classify the syntactic rules into 3 classes and design a 3-phase parsing algorithm. In particular, the rules in the 3rd class cover sentence structures formed with commas. We present a method for automatically acquiring 3rd-class rules from syntactic analysis of a corpus, with which we aim to continuously improve the coverage of the parser. The experimental results show that the proposed 3-phase parsing method is superior to the prior 2-phase parsing method, which uses only intra-sentence segmentation, in parsing speed and memory efficiency, while maintaining comparable translation quality.
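The phased idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the actual rules or say what the first two rule classes contain, so the three rule sets, the tag names, and the function names are hypothetical. A simple greedy reducer also stands in for the chart parser that the paper uses. The only point shown is that each phase sees a small, separate rule set, and that rules involving commas are applied last, after each comma-delimited segment has been analyzed on its own.

```python
from typing import List, Tuple

# A rule maps a right-hand side (sequence of symbols) to a left-hand side.
Rule = Tuple[Tuple[str, ...], str]

# Hypothetical rule classes (the real grammar is far larger).
PHASE1_RULES: List[Rule] = [      # assumed: base phrases inside a segment
    (("DET", "NOUN"), "NP"),
    (("NOUN",), "NP"),
    (("PREP", "NP"), "PP"),
]
PHASE2_RULES: List[Rule] = [      # assumed: clause structure inside a segment
    (("NP", "VERB", "NP"), "S"),
    (("NP", "VERB"), "S"),
]
PHASE3_RULES: List[Rule] = [      # structures formed with commas
    (("PP", ",", "S"), "S"),
    (("S", ",", "PP"), "S"),
]


def reduce_with(rules: List[Rule], symbols: List[str]) -> List[str]:
    """Greedily apply rules (in listed order, leftmost match) until none fits.

    This is a stand-in for chart parsing, not the paper's algorithm.
    """
    symbols = list(symbols)
    changed = True
    while changed:
        changed = False
        for rhs, lhs in rules:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if tuple(symbols[i:i + n]) == rhs:
                    symbols[i:i + n] = [lhs]
                    changed = True
                    break
            if changed:
                break  # restart from the first rule after every reduction
    return symbols


def split_at_commas(tags: List[str]) -> List[List[str]]:
    """Segment a tag sequence at commas (the commas themselves are dropped)."""
    segments: List[List[str]] = [[]]
    for tag in tags:
        if tag == ",":
            segments.append([])
        else:
            segments[-1].append(tag)
    return segments


def three_phase_parse(tags: List[str]) -> List[str]:
    """Run phases 1 and 2 per segment, then phase 3 across segments."""
    segments = split_at_commas(tags)
    segments = [reduce_with(PHASE1_RULES, seg) for seg in segments]
    segments = [reduce_with(PHASE2_RULES, seg) for seg in segments]
    joined: List[str] = []
    for index, seg in enumerate(segments):
        if index > 0:
            joined.append(",")  # restore the comma between segments
        joined.extend(seg)
    return reduce_with(PHASE3_RULES, joined)


# "In the morning, the parser reads the sentence" as hand-assigned tags.
TAGS = ["PREP", "DET", "NOUN", ",", "DET", "NOUN", "VERB", "DET", "NOUN"]
print(three_phase_parse(TAGS))
```

In this toy setup the comma rules never compete with the phrase-level or clause-level rules, which is one plausible reading of how separating the rule classes could limit the number of structures built at each step.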
