http://dx.doi.org/10.7465/jkdi.2013.24.5.999

Documents recommendation using large citation data  

Chae, Minwoo (Department of Statistics, Seoul National University)
Kang, Minsoo (Kwang Gae To Laboratory)
Kim, Yongdai (Department of Statistics, Seoul National University)
Publication Information
Journal of the Korean Data and Information Science Society / v.24, no.5, 2013, pp. 999-1011
Abstract
In this research, we propose a document recommendation method that finds documents relatively important to a specific document, based on citation information. The key idea is to tune the parameter of the Neumann kernel, which is intermediate between a measure of importance (HITS) and a measure of relatedness (co-citation). Our method selects the tuning parameter ${\gamma}$ of the Neumann kernel by minimizing the prediction error for future citations. We also discuss some computational issues that arise in analyzing large citation data. Finally, we present results from an analysis of patent data from the US Patent Office.
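To make the role of ${\gamma}$ concrete, here is a minimal sketch of a Neumann kernel on a toy citation graph. It assumes the definition commonly used in the kernel link-analysis literature, $N_{\gamma}(K) = K \sum_{n \ge 0} \gamma^{n} K^{n} = K (I - \gamma K)^{-1}$ with co-citation matrix $K = A^{\top} A$ and $0 \le \gamma < 1/\lambda_{\max}(K)$; the abstract does not state the formula, so this may differ from the paper's exact formulation. The matrix `A`, the function names, and the dense NumPy computation are illustrative assumptions only; data of the size discussed in the abstract would presumably call for sparse routines.

```python
import numpy as np


def cocitation_matrix(A):
    """Return K = A^T A, where A[i, j] = 1 if document i cites document j.

    K[j, k] counts the documents that cite both j and k.
    """
    A = np.asarray(A, dtype=float)
    return A.T @ A


def neumann_kernel(K, gamma):
    """Closed form K (I - gamma K)^{-1} of the series sum_{n>=0} gamma^n K^(n+1).

    The series converges only for 0 <= gamma < 1 / lambda_max(K).
    """
    K = np.asarray(K, dtype=float)
    lam_max = np.linalg.eigvalsh(K).max()  # K is symmetric, so eigvalsh applies
    if gamma < 0 or (lam_max > 0 and gamma >= 1.0 / lam_max):
        raise ValueError("gamma must satisfy 0 <= gamma < 1 / lambda_max(K)")
    identity = np.eye(K.shape[0])
    # K commutes with (I - gamma K)^{-1}, so solving (I - gamma K) X = K
    # yields X = K (I - gamma K)^{-1} without forming an explicit inverse.
    return np.linalg.solve(identity - gamma * K, K)


def recommend(N, target, top=3):
    """Rank the other documents by their kernel score against `target`."""
    scores = N[target].copy()
    scores[target] = -np.inf  # never recommend the target itself
    return np.argsort(-scores)[:top]


# Hypothetical toy data: 5 documents, row i lists the documents that i cites.
A = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])

K = cocitation_matrix(A)
lam_max = np.linalg.eigvalsh(K).max()

# gamma = 0 reproduces plain co-citation (relatedness); values close to
# 1 / lambda_max weight long citation paths and approach an importance ranking.
N_related = neumann_kernel(K, 0.0)
N_mixed = neumann_kernel(K, 0.5 / lam_max)

print(recommend(N_mixed, target=2))
```

In this sketch ${\gamma}$ is fixed by hand. The method described above instead chooses it by minimizing the prediction error for future citations, which would amount to evaluating candidate values of ${\gamma}$ against citations held out from a later period.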
Keywords
Big data; citation data analysis; Neumann kernel; recommendation; sparse matrix computation