Automatic Document Summary Technique Using Fuzzy Theory

퍼지이론을 이용한 자동문서 요약 기술

  • 이상훈 (Department of Computer Science, Graduate School, Georgia State University) ;
  • 문승진 (Department of Computer Science, University of Suwon)
  • Received : 2014.06.13
  • Accepted : 2014.10.18
  • Published : 2014.12.31

Abstract

With the vast quantity of information available on the Internet, techniques for handling large volumes of documents have become increasingly necessary, but processing the information in those documents remains technically challenging and is still under study. Automatic document summarization has been regarded as one of the critical solutions: it processes documents so as to retain their important points and remove duplicated content. In this paper, we propose a document summarization technique based on fuzzy theory. The proposed technique resolves the ambiguity among the various features that determine the importance of a sentence, and the experimental results show that it produces better results than previous techniques.
