http://dx.doi.org/10.6109/jkiice.2014.18.9.2155

Document Summarization using Semantic Feature and Hadoop  

Kim, Chul-Won (Department of Computer Engineering, Honam University)
Abstract
In this paper, we propose a new document summarization method that uses semantic features extracted by distributed parallel processing on Hadoop. The proposed method represents the inherent structure of documents well by using semantic features obtained through non-negative matrix factorization (NMF). In addition, it can summarize big-data documents using Hadoop. The experimental results demonstrate that the proposed method can summarize big-data documents that a single computer cannot.
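To make the idea of NMF-based semantic features concrete, the following is a minimal single-machine sketch, not the paper's implementation. It assumes a term-by-sentence count matrix, the standard multiplicative update rules of Lee and Seung for the Frobenius objective, and a hypothetical scoring rule (the sum of each sentence's semantic-feature weights). The function names, the whitespace tokenizer, and the parameters `r` and `top_k` are illustrative assumptions; the Hadoop distribution of the matrix products is not shown.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    # Plain-list matrix product of a (n x k) and b (k x m).
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def frobenius_error(a, w, h):
    # Squared Frobenius norm of A - WH.
    wh = matmul(w, h)
    return sum((x - y) ** 2 for ra, rw in zip(a, wh) for x, y in zip(ra, rw))


def nmf(a, r, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix a (m x n) into w (m x r) and h (r x n)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return w, h


def term_sentence_matrix(sentences):
    # Rows are terms, columns are sentences; entries are raw term counts.
    vocab = sorted({t for s in sentences for t in s.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    a = [[0.0] * len(sentences) for _ in vocab]
    for j, s in enumerate(sentences):
        for t in s.lower().split():
            a[index[t]][j] += 1.0
    return a


def summarize(sentences, r=2, top_k=2):
    # Hypothetical scoring rule: sum each sentence's semantic-feature weights.
    a = term_sentence_matrix(sentences)
    _, h = nmf(a, r)
    scores = [sum(h[i][j] for i in range(r)) for j in range(len(sentences))]
    best = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)
    # Return the selected sentences in their original order.
    return [sentences[j] for j in sorted(best[:top_k])]


if __name__ == "__main__":
    docs = [
        "hadoop stores large documents",
        "hadoop processes large documents in parallel",
        "matrix factorization extracts semantic features",
        "semantic features guide sentence selection",
    ]
    print(summarize(docs, r=2, top_k=2))
```

In this sketch the columns of `w` play the role of semantic features and each column of `h` gives one sentence's weights over them. The matrix products inside `nmf` dominate the cost, which is the part a distributed setting would need to parallelize.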
Keywords
Document summarization; semantic feature; Hadoop; distributed parallel processing