• Title/Summary/Keyword: PageRank algorithm

Ranking Quality Evaluation of PageRank Variations (PageRank 변형 알고리즘들 간의 순위 품질 평가)

  • Pham, Minh-Duc;Heo, Jun-Seok;Lee, Jeong-Hoon;Whang, Kyu-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.5 / pp.14-28 / 2009
  • The PageRank algorithm is an important component for ranking Web pages in Google and other search engines. While many improvements to the original PageRank algorithm have been proposed, it is unclear which variations (and their combinations) provide the "best" ranked results. In this paper, we evaluate the ranking quality of the well-known variations of the original PageRank algorithm and their combinations. To do this, we first classify the variations into link-based approaches, which exploit the link structure of the Web, and knowledge-based approaches, which exploit the semantics of the Web. We then propose algorithms that combine the ranking algorithms of these two approaches, and we implement both the variations and their combinations. For our evaluation, we perform extensive experiments using a real data set of one million Web pages. Through the experiments, we identify the algorithms that provide the best ranked results among the variations and their combinations.

Revisiting PageRank Computation: Norm-leak and Solution (페이지랭크 알고리즘의 재검토 : 놈-누수 현상과 해결 방법)

  • Kim, Sung-Jin;Lee, Sang-Ho
    • Journal of KIISE:Computing Practices and Letters / v.11 no.3 / pp.268-274 / 2005
  • Since the introduction of the PageRank technique, it has been known to rank web pages effectively. In spite of its usefulness, we found a computational drawback, which we call norm-leak: in some cases PageRank values become smaller than they should be. We present an improved PageRank algorithm that computes the PageRank values of web pages correctly, together with an efficient implementation. Experimental results obtained with over 67 million real web pages are also presented.
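
The shrinking values described in this abstract can be illustrated with a minimal sketch. It assumes the leak comes from pages with no outgoing links (dangling pages), a well-known way for rank totals to fall below 1; the abstract itself does not state the cause. The remedy shown, spreading the dangling rank uniformly, is a common textbook fix and not necessarily the authors' algorithm. The graph, the function name `pagerank`, and the `fix_dangling` flag are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=50, fix_dangling=False):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is lost unless redistributed.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        base = (1.0 - damping) / n
        if fix_dangling:
            # Assumed remedy: spread the dangling rank evenly over all pages.
            base += damping * dangling / n
        new_rank = {p: base for p in pages}
        for p in pages:
            for q in out_links[p]:
                new_rank[q] += damping * rank[p] / len(out_links[p])
        rank = new_rank
    return rank


# Hypothetical 3-page graph: page "c" has no outgoing links.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
leaky = pagerank(graph)
fixed = pagerank(graph, fix_dangling=True)
leaky_total = sum(leaky.values())  # falls below 1 because rank leaks at "c"
fixed_total = sum(fixed.values())  # stays at 1 up to floating-point rounding
```

Comparing `leaky_total` with `fixed_total` shows the effect: without the redistribution step, every iteration discards the share of rank that reached the dangling page.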

Improved PageRank Algorithm Using Similarity Information of Documents (문서간의 유사도를 이용한 개선된 PageRank 알고리즘)

  • 이경희;김민구;박승규
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.169-171 / 2003
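  • Web search methods fall broadly into text-based and link-based techniques. This paper studies the PageRank algorithm, one of the link-based techniques. PageRank computes the importance of each page as a numerical value. However, the algorithm assigns a constant value to the probability of following a link from one page to another, so the value of every page is computed uniformly, which we judged to hurt retrieval effectiveness for individual pages. To address this, we measure the similarity between pages and assign a different damping factor (the probability of following a link) according to that similarity, thereby improving retrieval effectiveness. We implemented and demonstrated this through two kinds of experiments.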
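
The idea of a per-link damping factor can be sketched as follows. The abstract does not give the formula, so everything here is an assumption: the linear mapping from similarity to damping (between the hypothetical bounds `d_min` and `d_max`), the precomputed `similarity` scores in [0, 1], and the choice to spread the unfollowed rank uniformly so that the total stays at 1. It is an illustration of the concept, not the authors' method.

```python
def similarity_pagerank(out_links, similarity, d_min=0.5, d_max=0.95,
                        iterations=50):
    """PageRank variant where each link p -> q has its own damping factor.

    out_links:  dict {page: [pages it links to]}
    similarity: dict {(p, q): score in [0, 1]} for every link (assumed given)
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        followed = 0.0  # total rank that travelled along links this round
        for p in pages:
            for q in out_links[p]:
                # Assumed mapping: more similar pages get a higher damping value.
                d = d_min + (d_max - d_min) * similarity[(p, q)]
                share = d * rank[p] / len(out_links[p])
                new_rank[q] += share
                followed += share
        # Rank that was not passed along a link is spread uniformly.
        jump = (1.0 - followed) / n
        rank = {p: new_rank[p] + jump for p in pages}
    return rank


# Hypothetical graph: "a" links to "b" (very similar) and "c" (dissimilar).
links = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
sims = {("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "a"): 1.0, ("c", "a"): 0.0}
scores = similarity_pagerank(links, sims)
```

In this toy graph, "b" ends up ranked above "c" although both receive exactly one link from "a", because the link to "b" carries the higher damping value.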

Implementation Techniques to Apply the PageRank Algorithm (페이지랭크 알고리즘 적용을 위한 구현 기술)

  • Kim, Sung-Jin;Lee, Sang-Ho;Bang, Ji-Hwan
    • The KIPS Transactions:PartD / v.9D no.5 / pp.745-754 / 2002
  • The Google search site (http://www.google.com), introduced in 1998, was the first to implement the PageRank algorithm. PageRank is a ranking method based on the link structure of Web pages. Even though PageRank has been implemented and is used in various commercial search engines, its implementation details have not been well documented, primarily for business reasons. The implementation techniques introduced in [4,8] are not sufficient to produce PageRank values of Web pages. This paper explains the techniques of [4,8] and suggests a major data structure and four implementation techniques for applying the PageRank algorithm. By presenting a real system that produces PageRank values of Web pages, the paper helps readers understand how to apply the PageRank algorithm.

An Implementation of the Ranking Algorithm for Web Documents based on Link Analysis (링크 분석에 기반한 웹 문서 중요도 평가 알고리즘의 구현)

  • Lim, Sung-Chae
    • Proceedings of the Korean Society of Computer Information Conference / 2010.07a / pp.75-78 / 2010
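  • Unlike conventional information retrieval (IR) systems, Web search often estimates the intrinsic importance of each Web document from the hyperlink information between documents. Among link-analysis algorithms, PageRank is known to have been applied in Google's Web search service. When importance is computed with PageRank, the CPU resources required grow as the number of indexed Web documents increases, and once the number of documents reaches several hundred million pages the computation cannot be carried out on a single server. To resolve this problem, this paper presents a method for using multiple servers as a cluster for PageRank computation. The proposed method connects multiple servers over a high-speed LAN and performs the iterative matrix computation in parallel, which shortens the computation time. A multithreaded program was written to implement this server cluster, and the matrix data used in the PageRank computation can be represented with only a small amount of memory.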
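
A compact matrix representation and a partitioned iteration of the kind this abstract describes might look like the sketch below. It assumes a compressed sparse row (CSR) layout, with two flat integer arrays instead of a dense n-by-n matrix, and it splits the source pages into slices that separate workers could process. The abstract does not describe the actual data structure or the cluster protocol, so the layout, the function names, and the sequential loop standing in for parallel servers are all assumptions.

```python
from array import array


def build_csr(n, edges):
    """Pack (src, dst) links into two flat integer arrays (CSR layout)."""
    offsets = array("l", [0] * (n + 1))
    for src, _ in edges:
        offsets[src + 1] += 1
    for i in range(n):
        offsets[i + 1] += offsets[i]
    targets = array("l", [0] * len(edges))
    cursor = offsets[:n]  # next free slot for each source page
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def partial_step(offsets, targets, rank, lo, hi, damping):
    """Rank contributed by source pages lo..hi-1: one worker's share."""
    out = [0.0] * len(rank)
    for src in range(lo, hi):
        degree = offsets[src + 1] - offsets[src]
        for k in range(offsets[src], offsets[src + 1]):
            out[targets[k]] += damping * rank[src] / degree
    return out


def iterate(n, offsets, targets, parts=2, damping=0.85, iterations=30):
    rank = [1.0 / n] * n
    bounds = [n * i // parts for i in range(parts + 1)]
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for lo, hi in zip(bounds, bounds[1:]):
            # Each slice could run on a separate server; here they run in turn.
            partial = partial_step(offsets, targets, rank, lo, hi, damping)
            for i, value in enumerate(partial):
                new_rank[i] += value
        rank = new_rank
    return rank


# Hypothetical 4-page graph in which every page has at least one out-link.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
offsets, targets = build_csr(4, edges)
ranks = iterate(4, offsets, targets)
```

The CSR arrays need one integer per link plus one per page, so memory grows with the number of links and not with the square of the number of pages. Partitioning by source page keeps each worker's reads local to its slice of `offsets` and `targets`; only the partial rank vectors need to be combined after each iteration.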

An Unplugged Activity to Understand the PageRank Algorithm (PageRank 알고리즘을 이해하기 위한 언플러그드 활동)

  • Park, Youngki
    • Journal of The Korean Association of Information Education / v.22 no.4 / pp.409-417 / 2018
  • There are unplugged computer science activities for elementary school students to learn the concept of the Internet. However, these activities are not enough to teach the concept of the Web because they focus on teaching how the Internet works. Since the Web is the core technology of the Third Industrial Revolution, it needs to be understood as basic common knowledge. In this paper, we develop an unplugged activity for understanding the PageRank algorithm, which is closely related to the Web. The experimental results show that our unplugged activity behaves similarly to the PageRank algorithm.

Patent citation network analysis (특허 인용 네트워크 분석)

  • Lee, Minjung;Kim, Yongdai;Jang, Woncheol
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.613-625 / 2016
  • The development of technology has changed the world drastically. Patent data analysis helps to understand modern technology trends and to predict future technology. In this paper, we analyze the patent citation network using USPTO data from 1985 to 2012 to identify technology trends. We use network centrality measures, including the PageRank algorithm, to find core technologies, and we use statistical network models to identify groups of technologies with similar properties.

A research on cyber target importance ranking using PageRank algorithm (PageRank 알고리즘을 활용한 사이버표적 중요성 순위 선정 방안 연구)

  • Kim, Kook-jin;Oh, Seung-hwan;Lee, Dong-hwan;Oh, Haeng-rok;Lee, Jung-sik;Shin, Dong-kyoo
    • Journal of Internet Computing and Services / v.22 no.6 / pp.115-127 / 2021
  • With the development of science and technology around the world, cyberspace is now recognized as a battlefield domain alongside land, sea, air, and space. Accordingly, it is necessary to design and establish elements such as definitions, systems, procedures, and plans not only for physical operations on land, at sea, in the air, and in space but also for cyber operations in cyberspace. In this research, we select the importance of a cyber target as a factor to consider when prioritizing the list of cyber targets chosen through intermediate target development, in the target development and prioritization stage of the targeting process for cyber operations. We propose a method that calculates an importance score for each cyber target and uses it as part of the cyber target prioritization score. To this end, we set a cyber target importance category within the prioritization process and derive the concept of cyber target importance and its reference items. For score calculation, we propose a PageRank-based TIR (Target Importance Rank) algorithm that combines parameters, such as those of the Event Prioritization Framework, for each derived reference item. Finally, we construct a network topology and scenario data based on the Stuxnet case, derive cyber target importance scores with the proposed algorithm, and prioritize the cyber targets to verify the algorithm.

Journal PageRank Calculation in the Korean Science Citation Database (국내 인용 데이터베이스에서 저널 페이지랭크 측정 방안)

  • Lee, Jae-Yun
    • Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.4 / pp.361-379 / 2011
  • This paper aims to propose the most appropriate method for calculating the journal PageRank in a domestic citation database. Korean journals show relatively high journal self-citation ratios and have many outgoing citations to external journals which are not included in the domestic citation database. Because the PageRank algorithm requires recursive calculation to converge, those two characteristics of domestic citation databases must be accounted for in order to measure the citation impact of Korean journals. Therefore, two PageRank calculation methods and four formulas for self-citation adjustment have been examined and tested for KSCD journals. The results of the correlation analysis and regression analysis show that the SCImago Journal Rank formula with the cr2 type self-citation adjustment method seems to be a more appropriate way to measure the relative impact of domestic journals in the Korean Science Citation Database.

C-rank: A Contribution-Based Approach for Web Page Ranking (C-rank: 웹 페이지 랭킹을 위한 기여도 기반 접근법)

  • Lee, Sang-Chul;Kim, Dong-Jin;Son, Ho-Yong;Kim, Sang-Wook;Lee, Jae-Bum
    • Journal of KIISE:Computing Practices and Letters / v.16 no.1 / pp.100-104 / 2010
  • In the past decade, various search engines have been developed to retrieve the web pages that web surfers want to find on the World Wide Web. One of the most important functions of a search engine is to evaluate and rank web pages for a given user query. Prior algorithms that use hyperlink information, such as PageRank, suffer from the problem of 'topic drift'. To solve this problem, relevance propagation models have been proposed. However, these models suffer from serious performance degradation and thus cannot be employed in real search engines. In this paper, we propose a new ranking algorithm that alleviates the topic drift problem while remaining efficient. Through a variety of experiments, we verify the superiority of the proposed algorithm over prior ones.