• Title/Abstract/Keywords: Page Rank Algorithm

41 results (processing time: 0.03 s)

Malware Containment Using Weight based on Incremental PageRank in Dynamic Social Networks

  • Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 1 / pp.421-433 / 2015
  • Social network services have grown rapidly in recent years, driven by the Internet environment, advances in web technology, and the prevalence of smartphones. Because social networks let users convey information and news, they strongly influence both the spread of information and the public opinion formed through social interaction among users. On the other hand, these networks are also ideal environments for rampant malware, which spreads rapidly because relationships among users are built on trust. In this paper, an effective patch strategy is proposed to deal with malicious worms on social networks. A graph is formed to analyze the structure of a social network, and subgroups are formed in the graph for the distributed patch strategy. The weighted directions and activities between the nodes are taken into account to select reliable key nodes from the generated subgroups, and the Incremental PageRanking algorithm, which reflects dynamic social network features (addition/deletion of users and links), is used to derive the highly influential key nodes. With the patch based on the derived key nodes, the proposed method can prevent worms from spreading over social networks.
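The abstract does not spell out how the Incremental PageRanking algorithm works, so the following is only a minimal sketch of one generic way to handle a changing graph: rerun power iteration after users or links are added or deleted, starting from the previous scores rather than from scratch. The function name `pagerank`, the toy graph, and the warm-start strategy are all assumptions for illustration, not the authors' method.

```python
def pagerank(out_links, init=None, damping=0.85, tol=1e-12, max_iter=500):
    """Power-iteration PageRank; `init` optionally warm-starts from older scores.

    `out_links` maps every node to the list of nodes it links to
    (assumption: each link target also appears as a key).
    """
    nodes = list(out_links)
    n = len(nodes)
    if init is None:
        rank = {v: 1.0 / n for v in nodes}
    else:
        # Reuse old scores; unseen nodes get a uniform share, then renormalize.
        rank = {v: init.get(v, 1.0 / n) for v in nodes}
        total = sum(rank.values())
        rank = {v: r / total for v, r in rank.items()}
    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank, iteration

# Hypothetical social graph before the change.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
old_scores, _ = pagerank(graph)

# Dynamic update: user "e" joins, and "d" drops its link to "c" for one to "a".
updated = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"], "e": ["a"]}
warm, warm_iters = pagerank(updated, init=old_scores)
cold, cold_iters = pagerank(updated)
key_node = max(warm, key=warm.get)
```

Both runs converge to the same scores; the warm start only changes the starting point, so any saving in iterations depends on how much the graph changed.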

PageRank 특징을 활용한 RDP기반 내부전파경로 탐지 및 SHAP를 이용한 설명가능한 시스템 (RDP-based Lateral Movement Detection using PageRank and Interpretable System using SHAP)

  • 윤지영;김동욱;신건윤;김상수;한명묵
    • 인터넷정보학회논문지 / Vol. 22, No. 4 / pp.1-11 / 2021
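  • As the Internet has developed, diverse and complex cyberattacks have emerged. Detection systems of various kinds have been deployed outside the network to defend against these attacks, but systems and studies that detect attackers inside the network are markedly scarce, and failure to detect attackers who have already gained entry has caused serious problems. To address this, research has begun to appear on lateral movement detection systems that track and detect an attacker's movements. In particular, approaches that extract features from the Remote Desktop Protocol (RDP) for detection have proved simple yet highly effective. Nevertheless, previous studies did not consider the influence of, or the relationships among, the logged-on nodes themselves, and the proposed features performed poorly with some models. They also could not explain why a given decision was made, which ultimately undermines the reliability and robustness of the model. To solve these problems, this study proposes RDP-based lateral movement detection using PageRank features and an explainable system using SHAP. The PageRank algorithm and several statistical techniques are used to generate features usable across multiple models, and SHAP is used to explain the model's predictions. This study generated features that outperform those of previous work in most models and demonstrated this effectively using SHAP.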

데이터 리터러시 연구 분야의 주경로와 지적구조 분석 (Analyzing the Main Paths and Intellectual Structure of the Data Literacy Research Domain)

  • 이재윤
    • 정보관리학회지 / Vol. 40, No. 4 / pp.403-428 / 2023
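  • This study set out to identify the development paths, intellectual structure, and emerging promising topics of research in the data literacy field. Among the data literacy papers retrieved from Web of Science for this purpose, those in education and in library and information science accounted for nearly 60% of the total. First, in the citation network analysis, the PageRank algorithm was used to identify core papers on various topics with high citation influence. To trace the development path of data literacy research, conventional main path analysis was applied, but it had the limitation of including only research papers from education. As a new technique to overcome this, PageRank main path analysis was developed, which made it possible to identify a development path that includes core papers from both education and library and information science. To analyze the intellectual structure of data literacy research, keyword bibliographic coupling analysis was performed. Applying the parallel nearest neighbor clustering algorithm to identify the detailed structure and clusters of the resulting keyword bibliographic coupling network revealed two large clusters and seven subclusters belonging to them. To derive emerging promising topics, the growth index and mean publication year of each keyword and cluster were measured. The analysis showed that, against the backdrop of the pandemic and the rise of AI chatbots, critical data literacy for social justice is rapidly emerging in the context of higher education. In addition, the PageRank main path analysis technique newly developed in this study as a means of tracing research development paths was effective for discovering two or more research streams developing in parallel in different domains.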

개인별 유전자 네트워크 구축 및 페이지랭크를 이용한 환자 특이적 암 유발 유전자 탐색 방법 (Cancer Patient Specific Driver Gene Identification by Personalized Gene Network and PageRank)

  • 정희원;박지우;안재균
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 10, No. 12 / pp.547-554 / 2021
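  • Cancer driver genes are not common to all cancer patients, and identifying such patient-specific cancer driver genes is very important for personalized cancer treatment and anticancer drug development. Bioinformatics studies have sought patient-specific cancer driver genes, but there is still room for improvement in accuracy. This paper proposes a method called NPD (Network based Patient-specific Driver gene identification) for identifying patient-specific cancer driver genes. NPD consists of three steps: constructing a patient-specific gene network, applying a modified PageRank algorithm to it to score the genes, and identifying cancer driver genes through a win-rate calculation that uses genetic mutation data. Applying NPD to six cancer datasets from the TCGA database confirmed that NPD achieves higher overall F1 scores than existing patient-specific cancer driver gene identification methods.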

기술의 진보와 혁신, 그리고 사회변화: 특허빅데이터를 이용한 정량적 분석 (Innovation of technology and social changes - quantitative analysis based on patent big data)

  • 김용대;정상조;장원철;이종수
    • 응용통계연구 / Vol. 29, No. 6 / pp.1025-1039 / 2016
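  • This paper introduces various methods for analyzing patent big data to identify the relationship between technological innovation and social change. Specifically, more than four million patents registered with the United States Patent and Trademark Office from 1985 to 2015 were analyzed. First, the history of patent law is reviewed, along with the effect of its development on patenting activity. Second, based on the number of registered patents by country and technology group, cluster analysis is used to group countries with similar technological innovation patterns, and the innovation characteristics of each cluster are examined. Third, a method is described for building a patent network from the citation information between patents and detecting key patents with the page-rank algorithm. Finally, canonical correlation analysis is used to identify the relationship between technological innovation and social change.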
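The third step above (a citation network scored with page-rank) can be illustrated with a minimal sketch. The patent identifiers and citation pairs below are hypothetical toy data, and the damping factor 0.85 is a conventional default rather than a value taken from the paper.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over (citing, cited) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {v: [] for v in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Patents that cite nothing spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical toy data: (citing patent, cited patent).
citations = [("P2", "P1"), ("P3", "P1"), ("P4", "P1"), ("P4", "P2"), ("P5", "P4")]
scores = pagerank(citations)
top_patent = max(scores, key=scores.get)  # "P1": cited by three of the other four
```

A patent scores highly when it is cited by many patents, and more so when those citing patents are themselves highly scored.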

Finding Top-k Answers in Node Proximity Search Using Distribution State Transition Graph

  • Park, Jaehui;Lee, Sang-Goo
    • ETRI Journal / Vol. 38, No. 4 / pp.714-723 / 2016
  • Considerable attention has been given to processing graph data in recent years. An efficient method for computing the node proximity is one of the most challenging problems for many applications such as recommendation systems and social networks. Regarding large-scale, mutable datasets and user queries, top-k query processing has gained significant interest. This paper presents a novel method to find top-k answers in a node proximity search based on the well-known measure, Personalized PageRank (PPR). First, we introduce a distribution state transition graph (DSTG) to depict iterative steps for solving the PPR equation. Second, we propose a weight distribution model of a DSTG to capture the states of intermediate PPR scores and their distribution. Using a DSTG, we can selectively follow and compare multiple random paths with different lengths to find the most promising nodes. Moreover, we prove that the results of our method are equivalent to the PPR results. Comparative performance studies using two real datasets clearly show that our method is practical and accurate.
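For orientation, the baseline that such methods are compared against can be sketched as plain power iteration on the PPR equation followed by a top-k selection. This is not the DSTG method from the paper; the restart probability `alpha`, the helper `top_k`, and the toy graph are assumptions for illustration.

```python
import heapq

def personalized_pagerank(out_links, source, alpha=0.15, tol=1e-12, max_iter=500):
    """PPR by power iteration: the walk restarts at `source` with probability alpha.

    `out_links` maps every node to its list of successors
    (assumption: each successor also appears as a key).
    """
    nodes = list(out_links)
    rank = {v: 0.0 for v in nodes}
    rank[source] = 1.0
    for _ in range(max_iter):
        new_rank = {v: 0.0 for v in nodes}
        new_rank[source] = alpha
        for src, targets in out_links.items():
            walk_mass = (1.0 - alpha) * rank[src]
            if targets:
                for dst in targets:
                    new_rank[dst] += walk_mass / len(targets)
            else:
                # A walk stuck at a dangling node returns to the source.
                new_rank[source] += walk_mass
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

def top_k(scores, k, exclude=()):
    """Return the k highest-scoring nodes, best first."""
    candidates = ((score, node) for node, score in scores.items() if node not in exclude)
    return [node for score, node in heapq.nlargest(k, candidates)]

# Hypothetical toy graph; "x" and "y" cannot be reached from "u".
graph = {"u": ["v", "w"], "v": ["w"], "w": ["u"], "x": ["w"], "y": []}
ppr = personalized_pagerank(graph, source="u")
nearest = top_k(ppr, k=2, exclude={"u"})  # ["w", "v"]
```

Full power iteration touches every edge on every pass, which is the cost that top-k methods try to avoid by exploring only the most promising nodes.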

하둡 맵리듀스와 페이지 랭크를 이용한 서울시 대중 교통 인구 이동 분석 (Analysis of the population flow of public transportation in Seoul using Hadoop MapReduce and PageRank algorithm)

  • 백민석;오상윤
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2022년도 추계학술발표대회 / pp.354-356 / 2022
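  • Parallel processing techniques have been widely used to process large-scale graph data such as social network and web data. This study proposes an effective technique for processing large-scale, graph-structured transportation data using Hadoop MapReduce. The proposed approach uses a parallel graph algorithm based on the Weighted PageRank algorithm so that the flow of a city's floating population can be taken into account as weights; the algorithm was applied to Hadoop MapReduce, and the results were analyzed to classify areas such as residential and workplace districts. Based on the analysis results, a technique built on PageRank and Hadoop MapReduce is presented for measuring the influence of each city in inter-regional floating-population graph data.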
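As a rough single-process illustration of the idea, one PageRank iteration can be split into a map step that emits rank contributions and a reduce step that sums them. This sketch treats edge weights as passenger flows and splits each region's rank in proportion to outgoing flow, which is one simple reading of "weighted" and not necessarily the formulation used in the paper. The region names and flow counts are hypothetical, no Hadoop API is involved, and every region is assumed to have at least one outgoing flow.

```python
from collections import defaultdict

def map_phase(rank, flows):
    """Emit (destination, contribution) pairs, splitting rank by outgoing flow share."""
    for src, targets in flows.items():
        total = sum(targets.values())
        for dst, weight in targets.items():
            yield dst, rank[src] * weight / total

def reduce_phase(pairs, nodes, damping):
    """Sum the contributions per destination and apply the damping factor."""
    sums = defaultdict(float)
    for dst, contribution in pairs:
        sums[dst] += contribution
    n = len(nodes)
    return {v: (1.0 - damping) / n + damping * sums[v] for v in nodes}

def weighted_pagerank(flows, damping=0.85, iterations=100):
    """Repeat map and reduce for a fixed number of rounds."""
    nodes = sorted(flows)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        rank = reduce_phase(map_phase(rank, flows), nodes, damping)
    return rank

# Hypothetical passenger counts: flows[origin][destination].
flows = {
    "residential": {"business": 900, "campus": 100},
    "business": {"residential": 700, "campus": 300},
    "campus": {"residential": 500, "business": 500},
}
scores = weighted_pagerank(flows)
busiest = max(scores, key=scores.get)  # "business"
```

In a real MapReduce job each round would be a separate job whose output feeds the next, with the map and reduce functions running across many workers.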

시맨틱 웹 자원의 랭킹을 위한 알고리즘: 클래스중심 접근방법 (A Ranking Algorithm for Semantic Web Resources: A Class-oriented Approach)

  • 노상규;박현정;박진수
    • Asia pacific journal of information systems / Vol. 17, No. 4 / pp.31-59 / 2007
  • We frequently use search engines to find relevant information on the Web but still end up with too much information. To solve this problem of information overload, ranking algorithms have been applied to various domains. As more information becomes available in the future, ranking search results effectively and efficiently will become more critical. In this paper, we propose a ranking algorithm for Semantic Web resources, specifically RDF resources. Traditionally, the importance of a particular Web page is estimated based on the number of keywords found in the page, which is subject to manipulation. In contrast, link analysis methods such as Google's PageRank capitalize on the information inherent in the link structure of the Web graph. PageRank considers a page highly important if it is referred to by many other pages. The degree of importance also increases if the importance of the referring pages is high. Kleinberg's algorithm is another link-structure based ranking algorithm for Web pages. Unlike PageRank, Kleinberg's algorithm utilizes two kinds of scores: the authority score and the hub score. If a page has a high authority score, it is an authority on a given topic and many pages refer to it. A page with a high hub score links to many authoritative pages. As mentioned above, the link-structure based ranking method has played an essential role in the World Wide Web (WWW), and many people now recognize its effectiveness and efficiency. On the other hand, because the Resource Description Framework (RDF) data model forms the foundation of the Semantic Web, any information in the Semantic Web can be expressed as an RDF graph, which makes ranking algorithms for RDF knowledge bases highly important. The RDF graph consists of nodes and directional links, similar to the Web graph. As a result, the link-structure based ranking method seems highly applicable to ranking Semantic Web resources.
However, the information space of the Semantic Web is more complex than that of WWW. For instance, WWW can be considered as one huge class, i.e., a collection of Web pages, which has only a recursive property, i.e., a 'refers to' property corresponding to the hyperlinks. However, the Semantic Web encompasses various kinds of classes and properties, and consequently, ranking methods used in WWW should be modified to reflect the complexity of the information space in the Semantic Web. Previous research addressed the ranking problem of query results retrieved from RDF knowledge bases. Mukherjea and Bamba modified Kleinberg's algorithm in order to apply their algorithm to rank the Semantic Web resources. They defined the objectivity score and the subjectivity score of a resource, which correspond to the authority score and the hub score of Kleinberg's, respectively. They concentrated on the diversity of properties and introduced property weights to control the influence of a resource on another resource depending on the characteristic of the property linking the two resources. A node with a high objectivity score becomes the object of many RDF triples, and a node with a high subjectivity score becomes the subject of many RDF triples. They developed several kinds of Semantic Web systems in order to validate their technique and showed some experimental results verifying the applicability of their method to the Semantic Web. Despite their efforts, however, there remained some limitations which they reported in their paper. First, their algorithm is useful only when a Semantic Web system represents most of the knowledge pertaining to a certain domain. In other words, the ratio of links to nodes should be high, or overall resources should be described in detail, to a certain degree for their algorithm to properly work. 
Second, a Tightly-Knit Community (TKC) effect, the phenomenon in which pages that are less important yet densely connected receive higher scores than pages that are more important but sparsely connected, remains problematic. Third, a resource may have a high score not because it is actually important, but simply because it is very common and consequently has many links pointing to it. In this paper, we examine such ranking problems from a novel perspective and propose a new algorithm that can solve the problems identified in the previous studies. Our proposed method is based on a class-oriented approach. In contrast to the predicate-oriented approach adopted by previous research, under our approach a user determines the weight of a property by comparing its significance relative to the other properties when evaluating the importance of resources in a specific class. This approach stems from the idea that most queries are meant to find resources belonging to the same class in the Semantic Web, which consists of many heterogeneous classes in RDF Schema. This approach closely reflects the way people evaluate things in the real world, and will turn out to be superior to the predicate-oriented approach for the Semantic Web. Our proposed algorithm can resolve the TKC effect and can further shed light on other limitations posed by the previous research. In addition, we propose two ways to incorporate data-type properties, which have not been employed even when they have some significance for resource importance. We designed an experiment to show the effectiveness of our proposed algorithm and the validity of the ranking results, which had never been attempted in previous research. We also conducted a comprehensive mathematical analysis, which was overlooked in previous research. The mathematical analysis enabled us to simplify the calculation procedure.
Finally, we summarize our experimental results and discuss further research issues.
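The authority and hub scores of Kleinberg's algorithm described in this abstract can be sketched as two mutually reinforcing updates. The code below is a plain illustration of that baseline on a hypothetical toy graph; it is not the class-oriented algorithm proposed in the paper, and the fixed iteration count is an assumption.

```python
import math

def hits(edges, iterations=50):
    """Kleinberg-style hub and authority scores over (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that point to it.
        auth = {v: 0.0 for v in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub: sum of the authority scores of the pages it points to.
        hub = {v: 0.0 for v in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both vectors to unit length so the scores stay bounded.
        auth_norm = math.sqrt(sum(x * x for x in auth.values()))
        hub_norm = math.sqrt(sum(x * x for x in hub.values()))
        auth = {v: x / auth_norm for v, x in auth.items()}
        hub = {v: x / hub_norm for v, x in hub.items()}
    return hub, auth

# Hypothetical toy graph: three hub-like pages linking to two authority-like pages.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(links)
best_authority = max(auth, key=auth.get)  # "a1": linked from all three hubs
```

Because scores reinforce each other inside a densely linked group, a small tightly connected cluster can dominate the result, which is the TKC effect the abstract refers to.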

폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근 (A Folksonomy Ranking Framework: A Semantic Graph-based Approach)

  • 박현정;노상규
    • Asia pacific journal of information systems / Vol. 21, No. 2 / pp.89-116 / 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for their future use or sharing purposes. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS based on the links to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
The concept of mutual interactions, proposed for ranking the Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to the evaluation method of humans, where different items are assigned specific weights, which are then summed up to determine the weighted average. We can check for missing properties more easily with this approach than with other predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Considering the property of social media that the popularity of a topic is temporary, recent data should have more weight than old data.
We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to the time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground in the situation where the domain consists of more than two classes, or where other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in the calculation time and memory use between the two kinds of algorithms.
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed with the ranking factors combined, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.

계량정보학분야의 협력연구 네트워크 및 문헌네트워크 분석 : 국가, 기관, 문헌단위 분석 (Collaboration Networks and Document Networks in Informetrics Research from 2001 to 2011: Finding Influential Nations, Institutions, Documents)

  • 이재윤;최상희
    • 정보관리학회지 / Vol. 30, No. 1 / pp.179-191 / 2013
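  • Since informetricians began analyzing trends in scientific research through scholarly papers, bibliometrics, scientometrics, informetrics, webometrics, and citation analysis have grown into major areas of information science. To analyze recent trends in informetrics, this study performed network analysis based on informetrics research publications to identify the nations, institutions, and papers that have contributed to the development of research in the field. Data were collected from the SCI database, covering papers published from 2001 to 2011. Pathfinder network analysis and the PNNC technique were used as analysis methods, and PageRank and h-index based indicators were used to measure collaboration and research impact. The nations playing major roles in the collaboration network were found to be the United States and the United Kingdom. Among institutions, the University of Amsterdam and the Catholic University of Leuven in Europe, and Indiana University and a U.S. Navy research and development office (해군연구개발국) in the United States, were found to contribute. At the level of individual papers, analysis with PageRank and the single paper h-index showed Hirsch's h-index paper and Ingwersen's Web impact factor paper to be the most influential.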
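For readers unfamiliar with the h-index based indicators mentioned above, the following is a minimal sketch. It assumes the usual definition of the h-index (the largest h such that h items each have at least h citations) and takes the single paper h-index to be the h-index of the set of papers citing the target paper; the abstract does not define either, and the citation counts are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that at least h items have h or more citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

def single_paper_h_index(citations_of_citing_papers):
    """h-index computed over the papers that cite one target paper (assumed definition)."""
    return h_index(citations_of_citing_papers)

# Hypothetical data: citation counts of the six papers that cite a target paper.
citing_counts = [25, 12, 7, 4, 4, 1]
paper_h = single_paper_h_index(citing_counts)  # 4
```

Unlike a raw citation count, this indicator rewards a paper for being cited by papers that are themselves well cited, which is the same intuition PageRank applies to the whole citation network.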