http://dx.doi.org/10.5391/JKIIS.2015.25.2.180

Latent Keyphrase Extraction Using LDA Model  

Cho, Taemin (Department of Electrical and Computer Engineering, Sungkyunkwan University)
Lee, Jee-Hyong (Department of Electrical and Computer Engineering, Sungkyunkwan University)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.25, no.2, 2015, pp. 180-185
Abstract
As the number of document resources continues to grow, automatically extracting keyphrases from documents has become a major research issue. However, most previous work extracts keyphrases only from words that appear in a document, and therefore overlooks latent keyphrases, which do not appear in the document. Although latent keyphrases are absent from the text, they can play an important role in text summarization and information retrieval because they reflect meaningful concepts or content of the document. Moreover, they account for more than one quarter of all keyphrases in real-world datasets, and they can be used for short texts such as SNS posts, which rarely contain explicit keyphrases. In this paper, we propose a new approach that selects candidate keyphrases from the keyphrases of neighbor documents, that is, documents similar to the given document, and evaluates the importance of each candidate using the individual words it contains. Experimental results show that latent keyphrases can be extracted at a reasonable level.
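The abstract does not give the similarity or scoring formulas, so the following is only a minimal sketch of the two-step idea under stated assumptions. It assumes LDA has already produced document-topic vectors (`theta`) and topic-word distributions (`phi`); it assumes cosine similarity between topic vectors as the neighbor measure; it assumes a word's importance is p(w | d) = Σ_z θ_dz · φ_zw; and it assumes a candidate's score is the mean of its words' scores. All function names and the toy data are hypothetical and are not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neighbor_documents(theta_target, theta_corpus, k):
    """Indices of the k corpus documents whose LDA topic vectors are most
    similar to the target's (assumption: cosine similarity as the measure)."""
    ranked = sorted(range(len(theta_corpus)),
                    key=lambda i: cosine(theta_target, theta_corpus[i]),
                    reverse=True)
    return ranked[:k]


def word_score(word, theta_target, phi):
    """Assumed word importance: p(w | d) = sum_z theta[d][z] * phi[z][w]."""
    return sum(t * phi_z.get(word, 0.0) for t, phi_z in zip(theta_target, phi))


def latent_keyphrases(theta_target, theta_corpus, corpus_keyphrases, phi,
                      k=2, top_n=3):
    """Hypothetical pipeline: collect the neighbors' keyphrases as candidates,
    then rank each candidate by the mean score of its individual words."""
    neighbors = neighbor_documents(theta_target, theta_corpus, k)
    candidates = {p for i in neighbors for p in corpus_keyphrases[i]}

    def phrase_score(phrase):
        words = phrase.split()
        if not words:
            return 0.0
        return sum(word_score(w, theta_target, phi) for w in words) / len(words)

    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda p: (-phrase_score(p), p))
    return ranked[:top_n]


# Toy data (invented for illustration): 2 topics, 3 corpus documents.
phi = [
    {"topic": 0.3, "model": 0.3, "keyphrase": 0.1, "extraction": 0.1, "neural": 0.2},
    {"image": 0.4, "segmentation": 0.3, "neural": 0.2, "network": 0.1},
]
theta_corpus = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
corpus_keyphrases = [
    ["topic model", "keyphrase extraction"],
    ["image segmentation"],
    ["neural network", "topic model"],
]
result = latent_keyphrases([0.8, 0.2], theta_corpus, corpus_keyphrases, phi)
print(result)  # ['topic model', 'neural network', 'keyphrase extraction']
```

In this toy run, documents 0 and 2 are the nearest neighbors, so "image segmentation" from document 1 never becomes a candidate. Because candidates come from neighbors' keyphrases rather than from the target's own words, a phrase can be proposed even if it never occurs in the target text, which is the latent case the abstract addresses.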
Keywords
Latent Keyphrase; Latent Dirichlet Allocation (LDA); Keyphrase Extraction; Neighbor Document