PMCN: Combining PDF-modified Similarity and Complex Network in Multi-document Summarization

  • Tu, Yi-Ning (Department of Statistics and Information Science, Fu Jen Catholic University, R.O.C.)
  • Hsu, Wei-Tse (National Taipei University)
  • Received : 2018.10.14
  • Accepted : 2019.09.01
  • Published : 2019.09.30

Abstract

This study combines the concept of degree centrality from complex networks with the Term Frequency $*$ Proportional Document Frequency ($TF^*PDF$) algorithm. The combined method, called PMCN (PDF-Modified similarity and Complex Network), constructs relationship networks among sentences in order to generate news summaries. PMCN extends to multi-document summarization the ideas of Bun and Ishizuka (2002), who first published the $TF^*PDF$ algorithm for detecting hot topics. In their algorithm, Bun and Ishizuka defined the publisher of a news item as its channel, and a term whose PDF weight is higher than the weights of other terms is considered hotter than those terms. This study, however, aims to produce summaries of news items rather than to detect hot topics. Because the $TF^*PDF$ algorithm summarizes daily news, PMCN replaces the concept of "channel" with "the date of the news event" and uses the resulting chronological ordering in a multi-document summarization algorithm. Its F-measure scores were 0.042 and 0.051 higher than those of LexRank on the well-known d30001t and d30003t tasks, respectively.
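The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The term weight follows the $TF^*PDF$ form commonly attributed to Bun and Ishizuka (2002), a normalized in-channel term frequency multiplied by the exponential of the in-channel document proportion and summed over channels, with the news date used as the channel as the abstract describes. The similarity function (`weighted_cosine`), the edge `threshold`, and all function names are assumptions made for illustration; the abstract does not specify the exact PDF-modified similarity.

```python
import math
from collections import Counter, defaultdict


def tf_pdf_weights(docs_by_date):
    """Term weights in the TF*PDF style, using the news date as the channel.

    docs_by_date maps a date label to a list of tokenized documents.
    """
    weights = defaultdict(float)
    for docs in docs_by_date.values():
        n_docs = len(docs)
        if n_docs == 0:
            continue
        # Frequency of each term across all documents of this date.
        freq = Counter(t for doc in docs for t in doc)
        # Number of documents of this date that contain each term.
        doc_freq = Counter(t for doc in docs for t in set(doc))
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


def weighted_cosine(a, b, weights):
    """Cosine similarity of two token lists with term-weighted counts (assumed form)."""
    ca, cb = Counter(a), Counter(b)
    va = {t: c * weights.get(t, 0.0) for t, c in ca.items()}
    vb = {t: c * weights.get(t, 0.0) for t, c in cb.items()}
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def degree_centrality(sentences, weights, threshold=0.1):
    """Normalized degree of each sentence in a thresholded similarity network."""
    n = len(sentences)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # The threshold is an illustrative assumption, not a value from the paper.
            if weighted_cosine(sentences[i], sentences[j], weights) > threshold:
                degree[i] += 1
                degree[j] += 1
    return [d / (n - 1) if n > 1 else 0.0 for d in degree]
```

Under these assumptions, a summary would be assembled by taking the sentences with the highest degree centrality and presenting them in date order.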

References

  1. Freeman, L. C. (1978). Centrality in social networks: Conceptual clarification. Social Networks, 1(3), 215-239. https://doi.org/10.1016/0378-8733(78)90021-7
  2. Luhn, H. P. (1960). Key word-in-context index for technical literature (KWIC index). American Documentation, 11(4), 288-295. https://doi.org/10.1002/asi.5090110403
  3. Allan, J., Carbonell, J. G., Doddington, G., Yamron, J., & Yang, Y. (2003). Topic detection and tracking pilot study final report. Retrieved from https://kilthub.cmu.edu/articles/Topic_Detection_and_Tracking_Pilot_Study_Final_Report/6610943
  4. Antiqueira, L., Oliveira Jr, O. N., da Fontoura Costa, L., & Nunes, M. D. G. V. (2009). A complex network approach to text summarization. Information Sciences, 179(5), 584-599. https://doi.org/10.1016/j.ins.2008.10.032
  5. Bun, K. K., & Ishizuka, M. (2002, December). Topic extraction from news archive using TF*PDF algorithm. In Proceedings of the Third International Conference on Web Information Systems Engineering (WISE 2002) (pp. 73-82). IEEE.
  6. Carbonell, J. G., Yang, Y., Lafferty, J., Brown, R. D., Pierce, T., & Liu, X. (1999). CMU Approach to TDT-2: Segmentation, Detection, and Tracking. Retrieved from https://kilthub.cmu.edu/articles/CMU_Approach_to_TDT-2_Segmentation_Detection_and_Tracking/6621371/files/12117779.pdf
  7. Daniel, N., Radev, D., & Allison, T. (2003, May). Sub-event based multi-document summarization. In Proceedings of the HLT-NAACL 03 on Text summarization workshop-Volume 5 (pp. 9-16). Association for Computational Linguistics.
  8. Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22, 457-479. https://doi.org/10.1613/jair.1523
  9. Hu, M., & Liu, B. (2004, August). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 168-177). ACM.
  10. Lin, C. Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out (pp. 74-81).
  11. Lovins, J. B. (1968). Development of a stemming algorithm. Mech. Translat. & Comp. Linguistics, 11(1-2), 22-31.
  12. Marujo, L., Ling, W., Ribeiro, R., Gershman, A., Carbonell, J., Matos, D. M., & Neto, H. P. (2016). Exploring events and distributed representations of text in multi-document summarization. Knowledge-Based Systems, 94, 33-42. https://doi.org/10.1016/j.knosys.2015.11.005
  13. Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing (pp. 404-411).
  14. Ouyang, Y., Li, W., Li, S., & Lu, Q. (2011). Applying regression models to query-focused multidocument summarization. Information Processing & Management, 47(2), 227-237. https://doi.org/10.1016/j.ipm.2010.03.005
  15. Popescu, A. M., & Etzioni, O. (2007). Extracting product features and opinions from reviews. In Natural language processing and text mining (pp. 9-28). Springer, London.
  16. Walker, C., Strassel, S., Medero, J., & Maeda, K. (2006). ACE 2005 Multilingual Training Corpus. In Linguistic Data Consortium, Philadelphia, 57.
  17. Wilson, T., Wiebe, J., & Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing.
  18. Yang, Y., Carbonell, J. G., Brown, R. D., Pierce, T., Archibald, B. T., & Liu, X. (1999). Learning approaches for detecting and tracking news events. IEEE Intelligent Systems, 14(4), 32-43.
  19. Yang, Y., Pierce, T., & Carbonell, J. (1998). A study of retrospective and on-line event detection. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 28-36.