http://dx.doi.org/10.3745/KIPSTB.2012.19B.2.135

Analysis of Sentential Paraphrase Patterns and Errors through Predicate-Argument Tuple-based Approximate Alignment  

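Choi, Sung-Pil (Korea Institute of Science and Technology Information, SW Research Lab)
Song, Sa-Kwang (Korea Institute of Science and Technology Information, SW Research Lab)
Myaeng, Sung-Hyon (Korea Advanced Institute of Science and Technology, Department of Computer Science)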
Abstract
This paper proposes a model for recognizing sentential paraphrases through Predicate-Argument Tuple (PAT)-based approximate alignment between two texts. We cast paraphrase recognition as a binary classification problem by defining and applying a set of alignment features that express the semantic relatedness between two sentences. Experiments confirmed the potential of the approach, and error analysis revealed several paraphrase patterns that the system does not yet handle, which can guide methods for further performance improvement.
Keywords
Paraphrase Recognition; Predicate-Argument Structure; Textual Entailment; Text Mining; Machine Learning
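The abstract does not spell out how the alignment or its features are computed, so the following is only a minimal sketch of the general idea: align the predicate-argument tuples of two sentences approximately, then summarize the alignment as numeric features that a binary classifier could consume. The three-slot tuple shape, the token-overlap (Jaccard) similarity, the greedy one-to-one matching, the `min_sim` threshold, and the feature names are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple

# A predicate-argument tuple: (predicate, first argument, second argument).
# This three-slot shape is an assumption made for illustration.
PAT = Tuple[str, str, str]


def slot_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased tokens; a stand-in for a real lexical measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def tuple_similarity(p: PAT, q: PAT) -> float:
    """Average of the slot-wise similarities of two tuples."""
    return sum(slot_similarity(x, y) for x, y in zip(p, q)) / len(p)


def greedy_align(src: List[PAT], tgt: List[PAT],
                 min_sim: float = 0.3) -> List[Tuple[int, int, float]]:
    """Approximate one-to-one alignment: repeatedly link the best remaining pair."""
    scored = sorted(
        ((tuple_similarity(p, q), i, j)
         for i, p in enumerate(src) for j, q in enumerate(tgt)),
        key=lambda t: t[0],
        reverse=True,
    )
    used_src, used_tgt, links = set(), set(), []
    for sim, i, j in scored:
        if sim < min_sim:  # hypothetical cut-off below which pairs stay unaligned
            break
        if i in used_src or j in used_tgt:
            continue
        used_src.add(i)
        used_tgt.add(j)
        links.append((i, j, sim))
    return links


def alignment_features(src: List[PAT], tgt: List[PAT]) -> Dict[str, float]:
    """Summarize an alignment as features for a downstream binary classifier."""
    links = greedy_align(src, tgt)
    mean_sim = sum(s for _, _, s in links) / len(links) if links else 0.0
    return {
        "mean_sim": mean_sim,
        "src_coverage": len(links) / len(src) if src else 0.0,
        "tgt_coverage": len(links) / len(tgt) if tgt else 0.0,
    }
```

For a sentence pair, `alignment_features` would be called on the tuples extracted from each sentence by a parser, and the resulting feature dictionary would be passed to any standard binary classifier trained on labeled paraphrase pairs. An optimal assignment algorithm could replace the greedy step where exact one-to-one matching matters.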