[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7465/jkdi.2013.24.5.999

Documents recommendation using large citation data

Chae, Minwoo (Department of Statistics, Seoul National University)
Kang, Minsoo (Kwang Gae To Laboratory)
Kim, Yongdai (Department of Statistics, Seoul National University)

Publication Information

Journal of the Korean Data and Information Science Society / v.24, no.5, 2013, pp. 999-1011

Abstract

In this research, we propose a document recommendation method that uses citation information to find documents that are relatively important to a specific document. The key idea is to tune the parameter of the Neumann kernel, which is intermediate between a measure of importance (HITS) and a measure of relatedness (co-citation). Our method selects the tuning parameter ${\gamma}$ of the Neumann kernel by minimizing the prediction error for future citations. We also discuss some computational issues that arise in analyzing large citation data. Finally, we present results from analyzing patent data from the US Patent Office.
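To make the role of ${\gamma}$ concrete, here is a minimal sketch of the Neumann kernel in its standard form from the link-analysis literature, $K_\gamma = C(I - \gamma C)^{-1}$ with co-citation matrix $C = A^\top A$. The abstract does not give the paper's own formula, so this form, the convention that `A[i, j] = 1` means document i cites document j, the function name `neumann_kernel`, and the toy graph are all assumptions made for illustration. The sketch uses dense NumPy arrays and does not implement the paper's criterion for choosing ${\gamma}$ or its sparse-matrix computations.

```python
import numpy as np


def neumann_kernel(A, gamma):
    """Neumann kernel of the co-citation matrix C = A^T A (illustrative sketch).

    Assumed convention: A[i, j] = 1 means document i cites document j.
    The series sum_{k>=0} gamma^k C^(k+1) converges only for
    0 <= gamma < 1 / lambda_max(C), so other values are rejected.
    """
    A = np.asarray(A, dtype=float)
    C = A.T @ A                          # co-citation counts (symmetric, PSD)
    lam_max = np.linalg.eigvalsh(C)[-1]  # eigvalsh returns ascending eigenvalues
    if gamma < 0 or (lam_max > 0 and gamma >= 1.0 / lam_max):
        raise ValueError("gamma must lie in [0, 1 / lambda_max(C))")
    n = C.shape[0]
    # (I - gamma C)^{-1} C equals C (I - gamma C)^{-1} because the factors commute.
    return np.linalg.solve(np.eye(n) - gamma * C, C)


# Hypothetical toy citation graph with five documents.
A_toy = np.zeros((5, 5))
for citing, cited in [(2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 1), (4, 2)]:
    A_toy[citing, cited] = 1.0

C_toy = A_toy.T @ A_toy
lam_toy = np.linalg.eigvalsh(C_toy)[-1]

K_relatedness = neumann_kernel(A_toy, 0.0)            # equals the co-citation matrix
K_importance = neumann_kernel(A_toy, 0.9 / lam_toy)   # close to the upper limit

# Rank all documents by their kernel score relative to document 0.
print(np.argsort(-K_relatedness[0]))
print(np.argsort(-K_importance[0]))
```

At ${\gamma} = 0$ the kernel reduces to plain co-citation counts. As ${\gamma}$ approaches $1/\lambda_{\max}(C)$, the dominant eigenvector of $C$ increasingly determines every row, and that eigenvector is the HITS authority vector. This is why a single parameter interpolates between relatedness and importance.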

Keywords

Big data; citation data analysis; Neumann kernel; recommendation; sparse matrix computation
