A Ranking Algorithm for Semantic Web Resources: A Class-oriented Approach

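  • 노상규 (Graduate School of Business / College of Business Administration, Seoul National University) ;
  • 박현정 (College of Business Administration, Seoul National University) ;
  • 박진수 (Graduate School of Business / College of Business Administration, Seoul National University)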
  • Published : 2007.12.31

Abstract

We frequently use search engines to find relevant information on the Web but still end up with too much information. To address this problem of information overload, ranking algorithms have been applied in various domains. As more information becomes available, ranking search results effectively and efficiently will become more critical. In this paper, we propose a ranking algorithm for Semantic Web resources, specifically RDF resources.
Traditionally, the importance of a Web page has been estimated from the number of keywords found in the page, a measure that is subject to manipulation. In contrast, link analysis methods such as Google's PageRank capitalize on the information inherent in the link structure of the Web graph. PageRank considers a page highly important if many other pages refer to it, and the degree of importance increases further if the referring pages are themselves important. Kleinberg's algorithm is another link-structure-based ranking algorithm for Web pages. Unlike PageRank, it uses two kinds of scores: the authority score and the hub score. A page with a high authority score is an authority on a given topic, and many pages refer to it. A page with a high hub score links to many authoritative pages.
As described above, link-structure-based ranking has played an essential role in the World Wide Web (WWW), and its effectiveness and efficiency are now widely recognized. Meanwhile, because the Resource Description Framework (RDF) data model forms the foundation of the Semantic Web, any information in the Semantic Web can be expressed as an RDF graph, which makes ranking algorithms for RDF knowledge bases highly important. Like the Web graph, an RDF graph consists of nodes and directed links. As a result, link-structure-based ranking seems highly applicable to ranking Semantic Web resources.
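The authority and hub scores described above can be illustrated with a minimal Python sketch of the mutual-reinforcement iteration. This is not code from the paper: the function names `hits` and `_unit`, the fixed iteration count, and the toy edge list are all assumptions made for illustration.

```python
import math


def _unit(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (authority, hub) dicts for a directed graph given as (source, target) pairs."""
    nodes = {node for edge in edges for node in edge}
    authority = {node: 1.0 for node in nodes}
    hub = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking to each page.
        new_authority = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_authority[dst] += hub[src]
        # Hub update: add up the fresh authority scores of the pages each page links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_authority[dst]
        authority = _unit(new_authority)
        hub = _unit(new_hub)
    return authority, hub


# Hypothetical toy graph: pages "a" and "b" both link to "c", and "c" links to "d".
authority, hub = hits([("a", "c"), ("b", "c"), ("c", "d")])
```

In this toy graph, "c" ends up with the largest authority score because two pages refer to it, while "a" and "b" receive the largest hub scores because they point to that authority.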
However, the information space of the Semantic Web is more complex than that of the WWW. For instance, the WWW can be regarded as one huge class, i.e., a collection of Web pages, which has only one property, a recursive 'refers to' property corresponding to hyperlinks. The Semantic Web, by contrast, encompasses various kinds of classes and properties, so ranking methods used in the WWW must be modified to reflect the complexity of its information space. Previous research has addressed the problem of ranking query results retrieved from RDF knowledge bases. Mukherjea and Bamba modified Kleinberg's algorithm to rank Semantic Web resources. They defined the objectivity score and the subjectivity score of a resource, which correspond to Kleinberg's authority score and hub score, respectively. They concentrated on the diversity of properties and introduced property weights to control the influence of one resource on another depending on the characteristics of the property linking the two. A node with a high objectivity score is the object of many RDF triples, and a node with a high subjectivity score is the subject of many RDF triples. They developed several Semantic Web systems to validate their technique and reported experimental results verifying its applicability to the Semantic Web. Despite these efforts, some limitations remained, which they reported in their paper. First, their algorithm is useful only when a Semantic Web system represents most of the knowledge pertaining to a certain domain. In other words, for their algorithm to work properly, the ratio of links to nodes should be high, or resources overall should be described in sufficient detail.
Second, the Tightly-Knit Community (TKC) effect, the phenomenon in which pages that are less important but densely connected receive higher scores than pages that are more important but sparsely connected, remains problematic. Third, a resource may have a high score not because it is actually important but simply because it is very common and therefore has many links pointing to it. In this paper, we examine these ranking problems from a novel perspective and propose a new algorithm that can solve the problems identified in the previous studies. Our proposed method is based on a class-oriented approach. In contrast to the predicate-oriented approach taken in previous research, under our approach a user determines the weight of a property by comparing its significance relative to the other properties when evaluating the importance of resources in a specific class. This approach stems from the idea that most queries are intended to find resources belonging to the same class in the Semantic Web, which consists of many heterogeneous classes in RDF Schema. It closely reflects the way people evaluate things in the real world and will prove superior to the predicate-oriented approach for the Semantic Web. Our proposed algorithm can resolve the TKC effect and can further shed light on the other limitations identified in previous research. In addition, we propose two ways to incorporate data-type properties, which have not previously been employed even when they bear on resource importance. We designed an experiment to show the effectiveness of our proposed algorithm and the validity of its ranking results, which had not been attempted in previous research. We also conducted a comprehensive mathematical analysis, which was overlooked in previous research. The mathematical analysis enabled us to simplify the calculation procedure.
Finally, we summarize our experimental results and discuss further research issues.
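The property-weighted objectivity and subjectivity scores that the abstract attributes to the predicate-oriented approach can be illustrated with a minimal Python sketch. It is neither the authors' algorithm nor Mukherjea and Bamba's exact formulation: the single weight per predicate, the default weight of 1.0, the function names `rank_rdf` and `_unit`, and the toy triples are all assumptions made for illustration.

```python
import math


def _unit(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def rank_rdf(triples, weights, iterations=50):
    """Return (objectivity, subjectivity) dicts for (subject, predicate, object) triples.

    `weights` maps a predicate to a non-negative weight; predicates that are
    not listed default to 1.0 (an assumption of this sketch).
    """
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    objectivity = {node: 1.0 for node in nodes}
    subjectivity = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Objectivity: weighted sum of the subjectivity of subjects pointing at the node.
        new_obj = {node: 0.0 for node in nodes}
        for s, p, o in triples:
            new_obj[o] += weights.get(p, 1.0) * subjectivity[s]
        # Subjectivity: weighted sum of the fresh objectivity of the objects the node points at.
        new_subj = {node: 0.0 for node in nodes}
        for s, p, o in triples:
            new_subj[s] += weights.get(p, 1.0) * new_obj[o]
        objectivity = _unit(new_obj)
        subjectivity = _unit(new_subj)
    return objectivity, subjectivity


# Hypothetical knowledge base: three papers typed as "Paper", two of them by "alice".
triples = [
    ("paper1", "author", "alice"),
    ("paper2", "author", "alice"),
    ("paper1", "type", "Paper"),
    ("paper2", "type", "Paper"),
    ("paper3", "type", "Paper"),
]
weighted_obj, _ = rank_rdf(triples, {"author": 1.0, "type": 0.1})
uniform_obj, _ = rank_rdf(triples, {})
```

With uniform weights, the very common node "Paper" outranks "alice" simply because more triples point to it, which mirrors the third limitation noted in the abstract. Down-weighting the hypothetical "type" predicate reverses that order. A class-oriented variant would presumably look weights up per class as well as per property, but the abstract does not give enough detail to sketch that faithfully.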

References

  1. 노상규 and 박진수, "인터넷 진화의 열쇠 온톨로지" [Ontology, the Key to the Evolution of the Internet], 가즈토이, 2007 (in Korean)
  2. Aleman-Meza, B., Halaschek-Wiener, C., Arpinar, I.B., and Sheth, A., "Context-Aware Semantic Association Ranking," Semantic Web and Database Workshop Proceedings, Berlin, September 7-8, 2003
  3. Aleman-Meza, B., Halaschek-Wiener, C., Arpinar, I.B., Ramakrishnan, C., and Sheth, A., "Ranking Complex Relationships on the Semantic Web," IEEE Internet Computing, Vol. 9, No. 3, 2005, pp. 37-44 https://doi.org/10.1109/MIC.2005.63
  4. Anyanwu, K., Maduko, A., and Sheth, A., "SemRank: Ranking Complex Relationship Search Results on the Semantic Web," International World Wide Web Conference Committee(IW3C2), Chiba, Japan, 2005
  5. Bamba, B. and Mukherjea, S., "Utilizing Resource Importance for Ranking Semantic Web Query Results," Proc. Second Toronto International Work. Semantic Web Databases (SWDB), 2004, pp. 185-198
  6. Berners-Lee, T., "Web for real people," 2005. Available at http://www.w3.org/2005/Talks/0511-keynote-tbl/#[17]
  7. Boyer, R.S. and Moore, J.S., "A Fast String Searching Algorithm," Comm. ACM, Vol. 20, No. 10, 1977, pp. 762-772 https://doi.org/10.1145/359842.359859
  8. Brickley, D. and Guha, R.V. eds., "RDF Vocabulary Description Language 1.0: RDF Schema," W3C Recommendation, 10 February 2004
  9. Brin, S. and Page, L., "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Special Issue, 7th International World Wide Web Conf., Computer Networks and ISDN Systems, Vol. 30, No. 1-7, 1998, pp. 107-117
  10. Brin, S., Motwani, R., Page, L., and Winograd, T., "What can you do with a Web in your Pocket," Bull. IEEE Computer Society Technical Comm. Data Engineering, 1998
  11. Burden, R.L. and Faires, J.D., "Numerical Analysis," seventh edition, BROOKS/COLE, 2001
  12. Ding, L., Finin, T., Joshi, A., Peng, Y., Pan, R., and Reddivari, P., "Search on the Semantic Web," IEEE Computer Society, Vol. 38, No. 10, 2005, pp. 62-69
  13. Ding, L., Pan, R., Finin, T., Joshi, A., Peng, Y., and Kolari, P., "Finding and Ranking Knowledge on the Semantic Web," Proc. 4th Galway IE International Semantic Web Conf., 2005, pp. 156-170
  14. Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Reddivari, P., Doshi, V.C., and Sachs, J., "Swoogle: A Semantic Web Search and Metadata Engine," Proc. 13th ACM Conf. Info. and Knowledge Management, 2004, pp. 652-659
  15. Ehrlich, L.W., "Rate of Convergence Proofs of the Method for Finding Roots of Polynomials(or Eigenvalues of Matrices) by the Power and Inverse Power Methods," National Technical Information Service AD707331, 1969
  16. Finin, T. and Ding, L., "Search Engines for Semantic Web Knowledge," Proceedings of XTech 2006: Building Web 2.0, Amsterdam, May 16-19, 2006
  17. Friedberg, S.H., Insel, A.J., and Spence, L.E., Linear Algebra(4th Edition), Prentice Hall, 2003
  18. Halaschek, C., Aleman-Meza, B., Arpinar, I.B., and Sheth, A., "Discovering and Ranking Semantic Associations over a Large RDF Metabase," Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004
  19. Haveliwala, T.H., "Efficient Computation of PageRank," Unpublished manuscript, Stanford University, 1999
  20. Karam, N., Benbernou, S., Debrauwer, L., and Schneider, M., "Semantic Ranking of Web Documents," Technical report RR-04-23, LIMOS, 2004
  21. Kleinberg, J., "Authoritative sources in a hyperlinked environment," Proc. 9th ACM-SIAM Symp. Discrete Algorithms, 1998, pp. 668-677. Extended version in J. ACM, Vol. 46, No. 5, 1999, pp. 604-632
  22. Klyne, G. and Carroll, J. eds., "Resource Description Framework (RDF): Concepts and Abstract Syntax," W3C Recommendation, 10 February 2004
  23. Knuth, D.E., Morris, J.H., and Pratt, V.R., "Fast Pattern Matching in Strings," SIAM Journal on Computing, Vol. 6, No. 2, 1977, pp. 323-350 https://doi.org/10.1137/0206024
  24. Maedche, A. and Staab, S., "Measuring Similarity between Ontologies," European Conference of Knowledge Acquisition and Management(EKAW2002), Lecture Notes in Computer Science, Madrid, Spain, Springer, 2002
  25. Maedche, A., Staab, S., Stojanovic, N., Studer, R., and Sure, Y., "SEAL-A Framework for Developing SEmantic Web PortALs," Lecture Notes in Computer Science, 2097, 2001
  26. Manola, F. and Miller, E. eds., "RDF Primer," W3C Recommendation, 2004
  27. Mukherjea, S. and Bamba, B., "BioPatent Miner: An Information Retrieval System for BioMedical Patents," Proc. 30th Toronto Conf. Very Large Databases (VLDB), 2004, pp. 1066-1077
  28. Mukherjea, S., Bamba, B., and Kankar, P., "Information Retrieval and Knowledge Discovery Utilizing a BioMedical Patent Semantic Web," IEEE Trans. Knowledge and Data Eng., Vol. 17, No. 8, 2005, pp. 1099-1110 https://doi.org/10.1109/TKDE.2005.130
  29. National Research Council, "Assessing Research-Doctorate Programs: A Methodology Study," 2003
  30. Page, L., Brin, S., Motwani, R., and Winograd, T., "The PageRank Citation Ranking: Bringing Order to the Web," Technical Report, Stanford University, 1998
  31. Perron, O. and Frobenius, F.G., "Perron-Frobenius Theorem," Available at http://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem, 2007
  32. Prud'hommeaux, E. and Seaborne, A. eds., "SPARQL Query Language for RDF," W3C Candidate Recommendation, 14 June 2007
  33. Ren, J. and Taylor, R.N., "Automatic and Versatile Publications Ranking for Research Institutions and Scholars," Communications of the ACM, Vol. 50, No. 6, June 2007
  34. Schneider, P., Hayes, P., and Horrocks, I. eds., "OWL Web Ontology Language Semantics and Abstract Syntax," W3C Recommendation, 10 February 2004
  35. Sheth, A., Aleman-Meza, B., Arpinar, I.B., Halaschek, C., and Ramakrishnan, C., "Semantic Association Identification and Knowledge Discovery for National Security Applications," Special Issue. Jour. Database Tech. Enhancing National Security, Vol. 16, No. 1, 2005, pp. 33-53