Analysis of Sentential Paraphrase Patterns and Errors through Predicate-Argument Tuple-based Approximate Alignment

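  • 최성필 (SW Research Lab, Korea Institute of Science and Technology Information);
  • 송사광 (SW Research Lab, Korea Institute of Science and Technology Information);
  • 맹성현 (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Received: 2012.02.07
  • Reviewed: 2012.03.02
  • Published: 2012.04.30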


This paper proposes a model for recognizing sentential paraphrases through deep approximate alignment between texts based on Predicate-Argument Tuples (PATs). From the PAT-based approximate alignment of two sentences, we define a range of alignment features that express the semantic relatedness of the two sentences, and thereby cast paraphrase recognition as a binary classification problem solved with supervised learning. Experiments confirmed the potential of the proposed model, and an error analysis of the system identified various paraphrase patterns that the method does not yet handle, which indicates directions for future performance improvement.
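
The general idea of aligning predicate-argument tuples and deriving features from the alignment can be illustrated with a minimal sketch. This is not the paper's implementation: the tuple layout (predicate, first argument, second argument), the token-overlap similarity, the greedy alignment strategy, the threshold, and the three feature names are all assumptions made here for illustration only.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical PAT layout: (predicate, first argument, second argument).
PAT = Tuple[str, str, str]
Link = Tuple[int, int, float]


def slot_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in for a real lexical measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def tuple_similarity(p: PAT, q: PAT) -> float:
    """Average similarity over the predicate slot and the two argument slots."""
    return sum(slot_similarity(x, y) for x, y in zip(p, q)) / 3


def approximate_alignment(src: List[PAT], tgt: List[PAT],
                          threshold: float = 0.3) -> List[Link]:
    """Greedy one-to-one alignment: accept the best-scoring pairs first."""
    scored = sorted(
        ((tuple_similarity(p, q), i, j)
         for (i, p), (j, q) in product(enumerate(src), enumerate(tgt))),
        reverse=True,
    )
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are too dissimilar to align
        if i in used_src or j in used_tgt:
            continue  # each tuple may be aligned at most once
        used_src.add(i)
        used_tgt.add(j)
        links.append((i, j, score))
    return links


def alignment_features(src: List[PAT], tgt: List[PAT],
                       links: List[Link]) -> Dict[str, float]:
    """Illustrative alignment features that a binary classifier could consume."""
    n = len(links)
    return {
        "src_coverage": n / len(src) if src else 0.0,
        "tgt_coverage": n / len(tgt) if tgt else 0.0,
        "mean_score": sum(s for _, _, s in links) / n if n else 0.0,
    }


# Toy tuples, written by hand rather than produced by a parser.
s1 = [("acquire", "Google", "YouTube")]
s2 = [("buy", "Google", "YouTube"), ("announce", "company", "deal")]
links = approximate_alignment(s1, s2)
features = alignment_features(s1, s2, links)
```

In this toy case the two arguments match while the predicates do not, so the single tuple of the first sentence is aligned with the first tuple of the second. A feature dictionary of this kind would then serve as the input vector for a supervised classifier. A real system would need a similarity measure that recognizes related predicates such as "acquire" and "buy", and could replace the greedy step with an optimal one-to-one assignment.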


