The Impact of Name Ambiguity on Properties of Coauthorship Networks  

Kim, Jinseok (Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign)
Kim, Heejun (School of Information and Library Science, University of North Carolina at Chapel Hill)
Diesner, Jana (Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign)
Publication Information
Journal of Information Science Theory and Practice / v.2, no.2, 2014, pp. 6-15
Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and into any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We address this question empirically by comparing two commonly used initial-based disambiguation methods against a reasonable proxy for ground truth data. Our source is DBLP, a database covering major journals and conferences in computer science and information science. We find that initial-based disambiguation induces strong distortions in network metrics at both the graph and the node level: authors become embedded in ties for which there is no empirical support, which inflates their sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation appear more coherent and interconnected than the actual underlying networks, and individual authors appear more productive and more strongly embedded than they actually are.
bibliometrics; name ambiguity; initial-based disambiguation; coauthorship networks; collaboration networks
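The merging effect described in the abstract can be illustrated with a small sketch. The abstract does not name the two initial-based methods, so the code below assumes they are a "first initial" key (surname plus first given-name initial) and an "all initials" key (surname plus all given-name initials). The function names, the toy paper list, and the choice to treat full name strings as the ground-truth proxy are all hypothetical and are not taken from the study or from DBLP.

```python
from itertools import combinations

def first_initial_key(name):
    """Assumed method 1: surname + first initial, e.g. 'Wei Min Wang' -> 'wang w'."""
    parts = name.lower().split()
    return f"{parts[-1]} {parts[0][0]}"

def all_initials_key(name):
    """Assumed method 2: surname + all initials, e.g. 'Wei Min Wang' -> 'wang wm'."""
    parts = name.lower().split()
    initials = "".join(p[0] for p in parts[:-1])
    return f"{parts[-1]} {initials}"

def coauthor_graph(papers, key=lambda name: name):
    """Build an undirected coauthorship graph as {author_id: set(neighbor_ids)}."""
    graph = {}
    for authors in papers:
        ids = {key(a) for a in authors}
        for a in ids:
            graph.setdefault(a, set())
        for a, b in combinations(sorted(ids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def count_components(graph):
    """Count connected components with an iterative depth-first search."""
    seen, n = set(), 0
    for start in graph:
        if start in seen:
            continue
        n += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - seen)
    return n

# Hypothetical toy data: three distinct authors who share a surname and first initial.
papers = [
    ["Wei Wang", "Ann Lee"],
    ["Wen Wang", "Bob Cruz"],
    ["Wei Min Wang", "Dana Fox"],
]

truth = coauthor_graph(papers)                      # full names as ground-truth proxy
first = coauthor_graph(papers, first_initial_key)   # merges all three Wangs
full = coauthor_graph(papers, all_initials_key)     # merges only 'Wei' and 'Wen'

for label, g in [("full names", truth), ("first initial", first), ("all initials", full)]:
    max_degree = max(len(nbrs) for nbrs in g.values())
    print(label, len(g), count_components(g), max_degree)
```

In this toy case the first-initial key collapses three separate collaborations into one connected component and gives the merged "wang w" node three coauthors, which mirrors, at a very small scale, the inflated connectivity and embeddedness the abstract reports.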