Influence of Normalization and Types of Citation Fields on Citation Matching

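  • 구희관 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 정한민 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • 성원경 (Information Service Research Team, Korea Institute of Science and Technology Information)
  • Published: 2008.11.28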


This paper analyzes how the normalization and the types of citation fields affect citation matching. Citation matching is the process of grouping citation records that refer to the same paper. It compares the citation fields that make up each citation record and combines the results of those comparisons to decide whether two citation records refer to the same paper. Comparing citation fields during matching calls for studies on citation field normalization and citation field types, yet such studies have been relatively scarce compared with studies on citation matching methods themselves. In this research, we show that citation matching performance depends on both the normalization and the type of the citation fields. We also analyze matching performance when multiple normalized fields are combined. According to the experimental results, normalization improved performance across the citation fields overall, and performance patterns differed by citation field type.
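The matching pipeline described in the abstract (normalize each citation field, compare fields in a way that depends on their type, combine the field comparisons into a match decision, then group the matching records) can be illustrated with a small sketch. This is not the authors' method: the field schema `FIELDS`, the weights, the cut-off `THRESHOLD`, the normalization steps, and the choice of token-set Jaccard similarity for text fields and digit-sequence equality for numeric fields are all hypothetical, chosen only to make the steps concrete.

```python
import re
import unicodedata
from itertools import combinations

# Hypothetical field schema: field name -> (field type, weight in the combination).
FIELDS = {
    "title": ("text", 0.6),
    "venue": ("text", 0.2),
    "year": ("number", 0.2),
}
THRESHOLD = 0.7  # hypothetical cut-off for declaring two records a match


def normalize(text: str) -> str:
    """Strip accents, lowercase, turn punctuation into spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = unicodedata.normalize("NFC", text)  # recompose, e.g. Hangul syllables
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def compare_field(field_type: str, a: str, b: str) -> float:
    """Return a similarity in [0, 1] whose definition depends on the field type."""
    if field_type == "number":
        # Numeric fields: compare only the digit runs, ignoring surrounding text.
        return 1.0 if re.findall(r"\d+", a) == re.findall(r"\d+", b) else 0.0
    # Text fields: token-set Jaccard similarity of the normalized strings.
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def record_similarity(r1: dict, r2: dict) -> float:
    """Combine the per-field similarities as a weighted sum."""
    return sum(
        weight * compare_field(ftype, r1.get(name, ""), r2.get(name, ""))
        for name, (ftype, weight) in FIELDS.items()
    )


def cluster(records: list[dict]) -> list[list[int]]:
    """Group record indices whose pairwise similarity reaches THRESHOLD (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if record_similarity(records[i], records[j]) >= THRESHOLD:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Because the grouping step uses union-find, matches are closed transitively: two records can end up in the same cluster through a third record even when their own similarity falls below the threshold. A weighted sum is only one way to combine field comparisons; a learned classifier over the per-field similarities is a common alternative.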


