A Study on the Construction of the Automatic Extracts and Summaries - On the Basis of Scientific Journal Articles -

자동 발췌문/요약 시스템 구축에 관한 연구 - 학술지 논문기사를 중심으로 -

  • 이태영 (Department of Library and Information Science, College of Humanities, Chonbuk National University)
  • Published : 2005.09.01

Abstract

This study applied several corpus-based approaches, the rhetorical roles of discourse structure, and the unification of similar sentences to build a method for automatically producing extracts and basic summaries (Ext/Sums). The effective thresholds of the corpus-based techniques were checked in advance. To make the Ext/Sums flexible, each sentence was assigned a rhetorical role such as objective, background, method, result, or conclusion, and a separate extraction engine was prepared for each role. Important sentences were extracted from the sample articles with a success rate of $90\%$. The selected sentences were then rearranged by unifying similar sentences with the cosine coefficient, deleting unnecessary modifier and inserted clauses, joining short sentences, and connecting sentences that can be linked. Building on this high extraction rate, further research is needed on a sentence-refinement system that draws on the rhetorical roles of sentences, the meaning and markers of nouns and verbs in clauses (such as predicate endings), cue words, and sentence location.
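The unification of similar sentences with the cosine coefficient can be illustrated with a minimal Python sketch. The abstract names only the coefficient itself; the raw word-count vectors, the regular-expression tokenizer, the `0.7` threshold, and the function names `term_vector`, `cosine_coefficient`, and `unify_similar` are all assumptions made for illustration, not the author's implementation. A system for Korean text would likely need morphological analysis in place of simple word splitting.

```python
import math
import re
from collections import Counter


def term_vector(sentence):
    """Lowercased word counts (assumed representation, not from the paper)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine_coefficient(a, b):
    """Dot product of two count vectors divided by the product of their norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def unify_similar(sentences, threshold=0.7):
    """Keep a sentence only if it is below the threshold against every kept one.

    The threshold value is hypothetical; the paper does not state one here.
    """
    kept = []
    for sentence in sentences:
        vector = term_vector(sentence)
        if all(cosine_coefficient(vector, term_vector(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    candidates = [
        "The method extracts important sentences.",
        "The method extracts the important sentences.",
        "Results show a high success rate.",
    ]
    for sentence in unify_similar(candidates):
        print(sentence)
```

In this sketch the second candidate is dropped because it is nearly identical to the first, while the third shares no terms with it and is kept. Keeping the earliest sentence of each similar group is one simple policy; merging the sentences into a single new one, as the abstract suggests, would need additional rewriting rules.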
