• Title/Summary/Keyword: The PageRank Algorithm

Search Result 41, Processing Time 0.022 seconds

Implementation Techniques to Apply the PageRank Algorithm (페이지랭크 알고리즘 적용을 위한 구현 기술)

  • Kim, Sung-Jin;Lee, Sang-Ho;Bang, Ji-Hwan
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.745-754
    • /
    • 2002
  • The Google search site (http://www.google.com), which was introduced in 1998, implemented the PageRank algorithm for the first time. PageRank is a ranking method based on the link structure of the Web pages. Even though PageRank has been implemented and being used in various commercial search engines, implementation details did not get documented well, primarily due to business reasons. Implementation techniques introduced in [4,8] are not sufficient to produce PageRank values of Web pages. This paper explains the techniques[4,8], and suggests major data structure and four implementation techniques in order to apply the PageRank algorithm. The paper helps understand the methods of applying PageRank algorithm by means of showing a real system that produces PageRank values of Web pages.

Ranking Quality Evaluation of PageRank Variations (PageRank 변형 알고리즘들 간의 순위 품질 평가)

  • Pham, Minh-Duc;Heo, Jun-Seok;Lee, Jeong-Hoon;Whang, Kyu-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.5
    • /
    • pp.14-28
    • /
    • 2009
  • The PageRank algorithm is an important component for ranking Web pages in Google and other search engines. While many improvements for the original PageRank algorithm have been proposed, it is unclear which variations (and their combinations) provide the "best" ranked results. In this paper, we evaluate the ranking quality of the well-known variations of the original PageRank algorithm and their combinations. In order to do this, we first classify the variations into link-based approaches, which exploit the link structure of the Web, and knowledge-based approaches, which exploit the semantics of the Web. We then propose algorithms that combine the ranking algorithms in these two approaches and implement both the variations and their combinations. For our evaluation, we perform extensive experiments using a real data set of one million Web pages. Through the experiments, we find the algorithms that provide the best ranked results from either the variations or their combinations.

An Unplugged Activity to Understand the PageRank Algorithm (PageRank 알고리즘을 이해하기 위한 언플러그드 활동)

  • Park, Youngki
    • Journal of The Korean Association of Information Education
    • /
    • v.22 no.4
    • /
    • pp.409-417
    • /
    • 2018
  • There are unplugged computer science activities for elementary school students to learn the concept of the Internet. However, these activities are not enough to teach the concept of the Web because they focus on teaching how the Internet works. Since the Web is the core technology of the Third Industrial Revolution, it needs to be understood as a basic common sense. In this paper, we developed an unplugged activity to understand the PageRank algorithm which is closely related to the web. The experimental results show that our unplugged activities behave similarly to the PageRank algorithm.

Revisiting PageRank Computation: Norm-leak and Solution (페이지랭크 알고리즘의 재검토 : 놈-누수 현상과 해결 방법)

  • Kim, Sung-Jin;Lee, Sang-Ho
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.3
    • /
    • pp.268-274
    • /
    • 2005
  • Since introduction of the PageRank technique, it is known that it ranks web pages effectively In spite of its usefulness, we found a computational drawback, which we call norm-leak, that PageRank values become smaller than they should be in some cases. We present an improved PageRank algorithm that computes the PageRank values of the web pages correctly as well as its efficient implementation. Experimental results, in which over 67 million real web pages are used, are also presented.

Improved PageRank Algorithm Using Similarity Information of Documents (문서간의 유사도를 이용한 개선된 PageRank 알고리즘)

  • 이경희;김민구;박승규
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.169-171
    • /
    • 2003
  • 웹에서의 검색 방법에는 크게 Text-Based 기법과 Link-Based 기법이 있다. 본 논문은 그 중에서 Link-Based 기법의 하나인 PageRank 알고리즘에 대해 연구 하고자 한다. 이 PageRank 알고리즘은 각 페이지의 중요성을 수치로 계산하는 방법이다. 하지만 이 알고리즘에서는 페이지에서 페이지로 링크를 따라갈 확률의 값을 일정하게 주어서 모든 페이지의 값을 획일적으로 계산하였기 때문에 각 페이지의 검색 효율성에 문제가 있다고 판단하여, 이를 해결하고자 본 논문은 페이지사이의 유사도를 측정하여 유사도에 따라 링크를 따라가는 확률 값인 Damping factor값을 다르게 부여하여 검색의 효율성을 높였다. 이를 위하여 두 가지 방법의 실험을 통하여 구현, 증명하였다.

  • PDF

Detecting Intentionally Biased Web Pages In terms of Hypertext Information (하이퍼텍스트 정보 관점에서 의도적으로 왜곡된 웹 페이지의 검출에 관한 연구)

  • Lee Woo Key
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.1 s.33
    • /
    • pp.59-66
    • /
    • 2005
  • The organization of the web is progressively more being used to improve search and analysis of information on the web as a large collection of heterogeneous documents. Most people begin at a Web search engine to find information. but the user's pertinent search results are often greatly diluted by irrelevant data or sometimes appear on target but still mislead the user in an unwanted direction. One of the intentional, sometimes vicious manipulations of Web databases is a intentionally biased web page like Google bombing that is based on the PageRank algorithm. one of many Web structuring techniques. In this thesis, we regard the World Wide Web as a directed labeled graph that Web pages represent nodes and link edges. In the Present work, we define the label of an edge as having a link context and a similarity measure between link context and target page. With this similarity, we can modify the transition matrix of the PageRank algorithm. By suggesting a motivating example, it is explained how our proposed algorithm can filter the Web intentionally biased web Pages effective about $60\%% rather than the conventional PageRank.

  • PDF

Patent citation network analysis (특허 인용 네트워크 분석)

  • Lee, Minjung;Kim, Yongdai;Jang, Woncheol
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.4
    • /
    • pp.613-625
    • /
    • 2016
  • The development of technology has changed the world drastically. Patent data analysis helps to understand modern technology trends and predict prospective future technology. In this paper, we analyze the patent citation network using the USPTO data between 1985 and 2012 to identify technology trends. We use network centrality measures that include a PageRank algorithm to find core technologies and identify groups of technology with similar properties with statistical network models.

PageRank Algorithm Using Link Context (링크내역을 이용한 페이지점수법 알고리즘)

  • Lee, Woo-Key;Shin, Kwang-Sup;Kang, Suk-Ho
    • Journal of KIISE:Databases
    • /
    • v.33 no.7
    • /
    • pp.708-714
    • /
    • 2006
  • The World Wide Web has become an entrenched global medium for storing and searching information. Most people begin at a Web search engine to find information, but the user's pertinent search results are often greatly diluted by irrelevant data or sometimes appear on target but still mislead the user in an unwanted direction. One of the intentional, sometimes vicious manipulations of Web databases is Web spamming as Google bombing that is based on the PageRank algorithm, one of the most famous Web structuring techniques. In this paper, we regard the Web as a directed labeled graph that Web pages represent nodes and the corresponding hyperlinks edges. In the present work, we define the label of an edge as having a link context and a similarity measure between link context and the target page. With this similarity, we can modify the transition matrix of the PageRank algorithm. A motivating example is investigated in terms of the Singular Value Decomposition with which our algorithm can outperform to filter the Web spamming pages effectively.

An Implementation of the Ranking Algorithm for Web Documents based on Link Analysis (링크 분석에 기반한 웹 문서 중요도 평가 알고리즘의 구현)

  • Lim, Sung-Chae
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2010.07a
    • /
    • pp.75-78
    • /
    • 2010
  • 웹 검색에는 기존의 정보검색(Information Retrieval) 시스템에서와 다르게 문서 간 하이퍼링크 정보를 바탕으로 각 웹 문서의 고유 중요도를 추정하는 방식이 자주 이용된다. 링크 분석에 기반한 알고리즘 중 PageRank 알고리즘은 구글의 웹 검색 서비스에 적용된 것으로 알려져 있다. 이런 PageRank 알고리즘에 따라 중요도를 계산하는 경우 색인된 웹 문서수가 증가함에 따라 계산에 필요한 CPU 자원의 사용도 함께 증가하며, 문서 수가 수 억 페이지에 달하면 하나의 서버에서는 계산을 수행할 수 없다는 문제가 있다. 본 논문에서는 이런 문제점을 해소하기 위해 여러 대의 서버를 PageRank 계산 용 클러스터로 사용할 수 있는 방법을 제시한다. 제시된 방법은 고속의 LAN을 이용하여 여러 대의 서버를 연결하고 반복적인 행렬 계산을 병렬로 수행할 수 있어 계산 시간을 단축시킬 수 있다. 이런 서버 클러스터 구현을 위해 멀티 쓰레딩 프로그램이 작성되었으며, PageRank 계산에 사용되는 행렬 데이터를 적은 양의 메모리만으로 표현 가능하도록 하였다.

  • PDF

A research on cyber target importance ranking using PageRank algorithm (PageRank 알고리즘을 활용한 사이버표적 중요성 순위 선정 방안 연구)

  • Kim, Kook-jin;Oh, Seung-hwan;Lee, Dong-hwan;Oh, Haeng-rok;Lee, Jung-sik;Shin, Dong-kyoo
    • Journal of Internet Computing and Services
    • /
    • v.22 no.6
    • /
    • pp.115-127
    • /
    • 2021
  • With the development of science and technology around the world, the realm of cyberspace, following land, sea, air, and space, is also recognized as a battlefield area. Accordingly, it is necessary to design and establish various elements such as definitions, systems, procedures, and plans for not only physical operations in land, sea, air, and space but also cyber operations in cyberspace. In this research, the importance of cyber targets that can be considered when prioritizing the list of cyber targets selected through intermediate target development in the target development and prioritization stage of targeting processing of cyber operations was selected as a factor to be considered. We propose a method to calculate the score for the cyber target and use it as a part of the cyber target prioritization score. Accordingly, in the cyber target prioritization process, the cyber target importance category is set, and the cyber target importance concept and reference item are derived. We propose a TIR (Target Importance Rank) algorithm that synthesizes parameters such as Event Prioritization Framework based on PageRank algorithm for score calculation and synthesis for each derived standard item. And, by constructing the Stuxnet case-based network topology and scenario data, a cyber target importance score is derived with the proposed algorithm, and the cyber target is prioritized to verify the proposed algorithm.