• Title/Abstract/Keywords: Hub-Authority algorithm

6 search results, processing time 0.023 s

Enhanced Threshold Algorithm for HITS on the World Wide Web

  • 김혜민;김민구
    • Korean Institute of Information Scientists and Engineers (KIISE): Conference Proceedings
    • /
    • Proceedings of the KIISE 2004 Fall Conference, Vol.31 No.2 (1)
    • /
    • pp.106-108
    • /
    • 2004
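  • HITS, a representative link-structure algorithm, uses link information to compute Authority and Hub ratings. However, HITS assigns abnormally high Authority and Hub ratings to pages that simply have many links, regardless of their importance, and several studies have tried to address this problem. To improve on the results of those studies, this paper applies the average of the Authority and Hub ratings together with a priority, rather than their simple sum. Experiments measuring accuracy show that the proposed algorithm outperforms the existing methods.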
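
For orientation, the baseline HITS iteration that this abstract builds on can be sketched as below. This is a minimal illustration of Kleinberg's standard update, not the thresholded or priority-based variant proposed in the paper; the function names `hits` and `unit` and the toy edge list are assumptions made for the example.

```python
from math import sqrt


def unit(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the fresh authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = unit(new_auth), unit(new_hub)
    return auth, hub


# Hypothetical toy graph: a and b both link to c; a also links to d.
edges = [("a", "c"), ("b", "c"), ("a", "d")]
authority, hub = hits(edges)
```

In this toy graph `c` ends up with the highest authority score and `a` with the highest hub score. Because every link counts equally, a page with many links gains score regardless of its importance, which is the weakness the abstract describes.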


An Extended Mutual Reinforcement Model for Finding Hubs and Authorities from Link Structures on the WWW

  • 황인수
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • Vol.30 No.2
    • /
    • pp.1-11
    • /
    • 2005
  • The network structure of a hyperlinked environment can be a rich source of information about the environment's contents and provides an effective means of understanding it. Recently, a number of algorithms have been proposed that analyze hypertext link structure to determine the best authorities for a given topic or query. In this paper, we review the mutual reinforcement algorithm for finding hubs and authorities on the World Wide Web, and suggest SHA, a new approach to link-structure analysis that uses the relationships among a set of relatively authoritative pages, a set of hub pages, and a set of super hub pages.

Item Selection By Estimated Profit Ranking Based on Association Rule

  • 황인수
    • Asia pacific journal of information systems
    • /
    • Vol.14 No.4
    • /
    • pp.87-97
    • /
    • 2004
  • One of the most fundamental problems in business is ranking items by profit based on historical transactions. The difficulty is that the profit of an item comes from its influence on the sales of other items as well as from its own sales, and that there is no well-developed algorithm for estimating the overall profit of selected items. In this paper, we develop a product network based on association rules and an algorithm for profit estimation and item selection using estimated profit ranking (EPR). In computer simulations, the suggested algorithm outperforms the individual approach and the hub-authority profit ranking algorithm.

A Ranking Algorithm for Semantic Web Resources: A Class-oriented Approach

  • 노상규;박현정;박진수
    • Asia pacific journal of information systems
    • /
    • Vol.17 No.4
    • /
    • pp.31-59
    • /
    • 2007
  • We frequently use search engines to find relevant information on the Web but still end up with too much information. To address this problem of information overload, ranking algorithms have been applied to various domains. As more information becomes available, ranking search results effectively and efficiently will become more critical. In this paper, we propose a ranking algorithm for Semantic Web resources, specifically RDF resources. Traditionally, the importance of a particular Web page is estimated from the number of keywords found in the page, which is subject to manipulation. In contrast, link analysis methods such as Google's PageRank capitalize on the information inherent in the link structure of the Web graph. PageRank considers a page highly important if many other pages refer to it. The degree of importance also increases if the referring pages are themselves important. Kleinberg's algorithm is another link-structure-based ranking algorithm for Web pages. Unlike PageRank, Kleinberg's algorithm uses two kinds of scores: the authority score and the hub score. If a page has a high authority score, it is an authority on a given topic and many pages refer to it. A page with a high hub score links to many authoritative pages. As mentioned above, link-structure-based ranking has played an essential role in the World Wide Web (WWW), and its effectiveness and efficiency are now widely recognized. Meanwhile, because the Resource Description Framework (RDF) data model forms the foundation of the Semantic Web, any information in the Semantic Web can be expressed as an RDF graph, which makes ranking algorithms for RDF knowledge bases highly important. The RDF graph consists of nodes and directed links, similar to the Web graph. As a result, link-structure-based ranking seems highly applicable to ranking Semantic Web resources.
However, the information space of the Semantic Web is more complex than that of the WWW. For instance, the WWW can be considered one huge class, i.e., a collection of Web pages, which has only a recursive property, i.e., a 'refers to' property corresponding to the hyperlinks. The Semantic Web, by contrast, encompasses various kinds of classes and properties, and consequently, ranking methods used in the WWW should be modified to reflect the complexity of its information space. Previous research addressed the ranking problem of query results retrieved from RDF knowledge bases. Mukherjea and Bamba modified Kleinberg's algorithm to rank Semantic Web resources. They defined the objectivity score and the subjectivity score of a resource, which correspond to Kleinberg's authority score and hub score, respectively. They concentrated on the diversity of properties and introduced property weights to control the influence of one resource on another depending on the characteristics of the property linking the two. A node with a high objectivity score is the object of many RDF triples, and a node with a high subjectivity score is the subject of many RDF triples. They developed several kinds of Semantic Web systems to validate their technique and showed experimental results verifying the applicability of their method to the Semantic Web. Despite their efforts, however, some limitations remained, which they reported in their paper. First, their algorithm is useful only when a Semantic Web system represents most of the knowledge pertaining to a certain domain. In other words, the ratio of links to nodes should be high, or resources overall should be described in some detail, for their algorithm to work properly.
Second, the Tightly-Knit Community (TKC) effect, the phenomenon in which pages that are less important yet densely connected score higher than pages that are more important but sparsely connected, remains problematic. Third, a resource may have a high score not because it is actually important, but simply because it is very common and consequently has many links pointing to it. In this paper, we examine such ranking problems from a novel perspective and propose a new algorithm that can solve the problems left open by the previous studies. Our proposed method is based on a class-oriented approach. In contrast to the predicate-oriented approach of the previous research, under our approach a user determines the weight of a property by comparing its significance relative to the other properties when evaluating the importance of resources in a specific class. This approach stems from the idea that most queries are meant to find resources belonging to the same class in the Semantic Web, which consists of many heterogeneous classes in RDF Schema. This approach closely reflects the way people evaluate things in the real world, and will turn out to be superior to the predicate-oriented approach for the Semantic Web. Our proposed algorithm can resolve the TKC effect and can also shed light on other limitations posed by the previous research. In addition, we propose two ways to incorporate data-type properties, which have not been employed even when they bear on resource importance. We designed an experiment to show the effectiveness of our proposed algorithm and the validity of its ranking results, which had not been attempted in previous research. We also conducted a comprehensive mathematical analysis, which was overlooked in previous research. The mathematical analysis enabled us to simplify the calculation procedure.
Finally, we summarize our experimental results and discuss further research issues.
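
The role of property weights described in this abstract can be illustrated with a small sketch. It is not Mukherjea and Bamba's exact formulation, nor the class-oriented algorithm proposed here; it is an assumed HITS-style iteration over RDF triples in which each predicate carries a weight. The names `rank_triples` and `unit`, the toy triples, and the weight values are all hypothetical.

```python
from math import sqrt


def unit(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def rank_triples(triples, weights, iterations=50):
    """Return (objectivity, subjectivity) scores for (subject, predicate, object) triples.

    `weights` maps a predicate to its influence; missing predicates default to 1.0.
    """
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    subj = {n: 1.0 for n in nodes}
    obj = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Objectivity: weighted sum of the subjectivity of resources pointing here.
        new_obj = {n: 0.0 for n in nodes}
        for s, p, o in triples:
            new_obj[o] += weights.get(p, 1.0) * subj[s]
        # Subjectivity: weighted sum of the objectivity of resources pointed to.
        new_subj = {n: 0.0 for n in nodes}
        for s, p, o in triples:
            new_subj[s] += weights.get(p, 1.0) * new_obj[o]
        obj, subj = unit(new_obj), unit(new_subj)
    return obj, subj


# Hypothetical data: every paper carries the common keyword "web".
triples = [
    ("paper1", "cites", "paper2"),
    ("paper3", "cites", "paper2"),
    ("paper1", "hasKeyword", "web"),
    ("paper2", "hasKeyword", "web"),
    ("paper3", "hasKeyword", "web"),
]
uniform_obj, _ = rank_triples(triples, weights={})
weighted_obj, _ = rank_triples(triples, weights={"cites": 1.0, "hasKeyword": 0.1})
```

With uniform weights the very common resource `web` receives the top objectivity score simply because many triples point to it. Once `hasKeyword` is down-weighted, `paper2` ranks first. How such weights should be chosen (per predicate or per class) is the question the abstract addresses.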

Evaluating Global Container Ports' Performance Considering the Port Calls' Attractiveness

  • 박병인
    • Journal of Korea Port Economic Association
    • /
    • Vol.38 No.3
    • /
    • pp.105-131
    • /
    • 2022
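  • Even after its 2019 revision, UNCTAD's Liner Shipping Connectivity Index (LSCI), which evaluates the performance of the global container port market, remains of limited use. In particular, because the LSCI evaluates performance based only on the distance of relationships, a performance index that also incorporates port-call attractiveness would be more efficient. This study applied a modified Huff model, the hub-authority algorithm and eigenvector centrality from social network analysis, and correlation analysis to 2007, 2017, and 2019 data from Ocean-Commerce of Japan. The results are as follows. First, port-call attractiveness and overall port performance did not always coincide. According to the attractiveness analysis, Busan remained within the top 10. In addition, the port-call attractiveness of other Korean ports improved gradually from a low level during the analysis period. Second, global container ports are generally specialized over the long term, route by route, as inbound or outbound ports, and they have grown while maintaining that specialization throughout the period. Korean ports, however, kept changing roles from one analysis period to the next. Finally, port throughput by period and the Extended Port Connectivity Index (EPCI) proposed in this study showed correlations between 0.77 and 0.85. Although Atlantic data were excluded from the EPCI analysis and vessel handling capacity was used in place of port throughput, the two were highly correlated. These results should help in evaluating and analyzing global ports. According to the study, Korean ports need a long-term strategy to improve performance while maintaining specialization. In particular, to maintain and develop a port's desirable role, it is advisable to make use of cooperation and partnerships with complementary ports and to attract the services of carriers calling at those ports. Although this study carried out a complex analysis using many data and methods over a long period, it would be more complete with research covering ports worldwide, long-term panel analysis, and scientific parameter estimation for the port-call attractiveness analysis.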

A Folksonomy Ranking Framework: A Semantic Graph-based Approach

  • 박현정;노상규
    • Asia pacific journal of information systems
    • /
    • Vol.21 No.2
    • /
    • pp.89-116
    • /
    • 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the link-based voting notion of PageRank and HITS to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can point in the opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that the Semantic Web contains many heterogeneous classes; thus, applying a different appraisal standard to each class is more reasonable. This is similar to the way humans evaluate, where different items are assigned specific weights, which are then combined into a weighted average. We can check for missing properties more easily with this approach than with predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Because the popularity of a topic on social media is temporary, recent data should carry more weight than old data.
We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to the time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground in the situation where the domain consists of more than two classes, or where other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in the calculation time and memory use between the two kinds of algorithms.
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed by combining the ranking factors, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
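
The idea that recent data should outweigh old data can be sketched as a time-decayed mutual reinforcement between user expertise and document quality. The abstract does not give the paper's actual time-weight formula, so the exponential half-life decay below is an assumption, as are the names `time_weight`, `rank_folksonomy`, and `unit`, the 30-day half-life, and the toy bookmarks.

```python
from math import sqrt


def unit(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def time_weight(age_days, half_life_days=30.0):
    """Assumed decay: a bookmark loses half its weight every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def rank_folksonomy(bookmarks, half_life_days=30.0, iterations=50):
    """Return (quality, expertise) scores from (user, document, age_days) tuples."""
    users = sorted({u for u, _, _ in bookmarks})
    docs = sorted({d for _, d, _ in bookmarks})
    expertise = {u: 1.0 for u in users}
    quality = {d: 1.0 for d in docs}
    for _ in range(iterations):
        # Document quality: time-weighted sum of the expertise of its bookmarkers.
        new_quality = {d: 0.0 for d in docs}
        for u, d, age in bookmarks:
            new_quality[d] += time_weight(age, half_life_days) * expertise[u]
        # User expertise: time-weighted sum of the quality of bookmarked documents.
        new_expertise = {u: 0.0 for u in users}
        for u, d, age in bookmarks:
            new_expertise[u] += time_weight(age, half_life_days) * new_quality[d]
        quality, expertise = unit(new_quality), unit(new_expertise)
    return quality, expertise


# Hypothetical data: three year-old bookmarks versus two bookmarks from this week.
bookmarks = [
    ("ann", "old_doc", 365), ("bob", "old_doc", 365), ("cho", "old_doc", 365),
    ("ann", "new_doc", 2), ("bob", "new_doc", 1),
]
decayed_quality, _ = rank_folksonomy(bookmarks)
flat_quality, _ = rank_folksonomy(bookmarks, half_life_days=float("inf"))
```

With an infinite half-life every bookmark has weight 1 and `old_doc` wins on raw count. With the 30-day half-life the recent bookmarks dominate and `new_doc` ranks first.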