• Title/Summary/Keyword: The PageRank Algorithm

The Effective Blog Search Algorithm based on the Structural Features in the Blogspace (블로그의 구조적 특성을 고려한 효율적인 블로그 검색 알고리즘)

  • Kim, Jung-Hoon;Yoon, Tae-Bok;Lee, Jee-Hyong
    • Journal of KIISE: Software and Applications / v.36 no.7 / pp.580-589 / 2009
  • Today, most web pages are created in the blogspace or are evolving into it. A blog entry (blog page) has features that traditional web pages lack, such as trackback links, bloggers' authority, tags, and comments. Traditional ranking algorithms are therefore not suited to evaluating blog entries, because they do not consider these blog-specific features. In this paper, a new algorithm called "Blog-Rank" is proposed. It ranks blog entries by computing bloggers' reputation scores, trackback scores, and comment scores from the features of the blog entries, and it is applied to searching for information related to users' queries in the blogspace. Experiments show that it finds much more relevant information than traditional ranking algorithms.
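A minimal sketch of the score-combination idea described above, assuming a simple linear mix; the actual Blog-Rank weights and score definitions are not given in the abstract, so the weights and field names below are illustrative only.

```python
# Illustrative sketch only: the paper's exact formula is not published in the abstract,
# so the weights and score definitions below are assumptions for demonstration.

def blog_rank(entries, w_reputation=0.5, w_trackback=0.3, w_comment=0.2):
    """Combine per-entry scores into a single rank value (assumed linear mix)."""
    ranked = []
    for entry in entries:
        score = (w_reputation * entry["reputation"]   # blogger's reputation score
                 + w_trackback * entry["trackback"]   # score from trackback links
                 + w_comment * entry["comment"])      # score from reader comments
        ranked.append((entry["id"], score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

entries = [
    {"id": "post-1", "reputation": 0.9, "trackback": 0.2, "comment": 0.5},
    {"id": "post-2", "reputation": 0.4, "trackback": 0.8, "comment": 0.7},
]
print(blog_rank(entries))
```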

User Reputation Evaluation Using Co-occurrence Feature and Collective Intelligence (동시출현 자질과 집단 지성을 이용한 지식검색 문서 사용자 명성 평가)

  • Lee, Hyun-Woo;Han, Yo-Sub;Kim, Lae-Hyun;Cha, Jeong-Won
    • Korean Journal of Cognitive Science / v.19 no.4 / pp.459-476 / 2008
  • The need for users to find answers to their questions is growing rapidly in services built on collective intelligence. Previous research has shown that non-text information such as view counts, referrer numbers, and the number of answers is useful for evaluating answers, and there has also been much work on evaluating answers with various kinds of word dictionaries. In this work, we propose a new method for effectively evaluating answers to a question, using user reputation estimated from users' social activity. We use a modified PageRank algorithm to estimate user reputation, and we also use the similarity between the question and the answer. Experiments on the Naver GisikiN corpus show that the proposed method meaningfully improves the answer selection rate.
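The abstract describes a modified PageRank over user activity plus a question-answer similarity. As a rough sketch, the snippet below uses standard PageRank on a toy "who answered whom" graph and Jaccard word overlap as the similarity; the paper's actual modification and weighting are not specified here, so both are assumptions.

```python
# Sketch under assumptions: standard PageRank over an "answered" graph as a stand-in
# for the paper's modified PageRank, and Jaccard word overlap as the Q/A similarity.

def pagerank(graph, damping=0.85, iterations=50):
    """graph: {node: [nodes it links to]}; returns an approximate PageRank score per node."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank

def jaccard(question, answer):
    q, a = set(question.lower().split()), set(answer.lower().split())
    return len(q & a) / len(q | a) if q | a else 0.0

# Askers "link to" the users who answered them, so reputation flows toward helpful users.
interactions = {"asker1": ["expert", "novice"], "asker2": ["expert"]}
reputation = pagerank(interactions)
score = 0.7 * reputation["expert"] + 0.3 * jaccard("how does pagerank work",
                                                   "pagerank works by spreading rank over links")
print(reputation, score)
```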

A Query Randomizing Technique for breaking 'Filter Bubble'

  • Joo, Sangdon;Seo, Sukyung;Yoon, Youngmi
    • Journal of the Korea Society of Computer and Information / v.22 no.12 / pp.117-123 / 2017
  • A personalized search algorithm is a search system that analyzes the user's IP address, cookies, log data, and search history to recommend the desired information. As a result, users become isolated inside the information frame recommended by the algorithm, the so-called 'filter bubble' phenomenon. Most personalized data can be deleted or changed by the user, but data stored on the service provider's server is difficult to access. This study suggests a way to neutralize personalization by repeatedly sending random query words, which confuses the data accumulated on the server with words unrelated to the user. Using a personalized account as the experimental group, we analyzed the rank changes of URLs while issuing 500 random query words, each sent once. To verify the effect, we set up a new account as a control group, searched the same set of queries with both accounts, stored the URL data, and scored the rank variation, weighting URLs on upper result pages more heavily than lower-ranked URLs. At the beginning of the experiment, the difference between the two accounts' scores was insignificant; as the experiment continued and the number of random query words accumulated on the server grew, the results showed a meaningful difference.
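The rank-variation scoring can be sketched as follows. The exact weighting used in the paper is not given in the abstract, so 1/position is assumed here as the weight for a URL's rank.

```python
# Sketch with assumed weights: the paper weights top-ranked URLs more heavily but does not
# spell out the exact scheme in the abstract, so 1/position is used as a simple stand-in.

def rank_weight(position):
    return 1.0 / position  # position 1 counts most, deeper results count less

def divergence_score(results_a, results_b):
    """Compare two ranked URL lists for the same query and score how far they diverge."""
    positions_b = {url: i + 1 for i, url in enumerate(results_b)}
    score = 0.0
    for i, url in enumerate(results_a):
        pos_a = i + 1
        pos_b = positions_b.get(url)
        if pos_b is None:
            score += rank_weight(pos_a)          # URL missing from the other account's results
        else:
            score += abs(rank_weight(pos_a) - rank_weight(pos_b))
    return score

personalized = ["a.com", "b.com", "c.com", "d.com"]
control      = ["b.com", "a.com", "c.com", "e.com"]
print(divergence_score(personalized, control))   # grows as the two accounts drift apart
```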

Global Technical Knowledge Flow Analysis in Intelligent Information Technology : Focusing on South Korea (지능정보기술 분야에서의 글로벌 기술 지식 경쟁력 분석 : 한국을 중심으로)

  • Kwak, Gihyun;Yoon, Jungsub
    • The Journal of the Korea Contents Association / v.21 no.1 / pp.24-38 / 2021
  • This study aims to measure Korea's global competitiveness in intelligent information technology, the core technology of the 4th Industrial Revolution. For the analysis, we collect the patents in each field, together with the prior patents they cite, filed at the U.S. Patent and Trademark Office (USPTO) between 2010 and 2018, from PATSTAT Online. A global knowledge-transfer network is built by aggregating citing and cited relationships at the national level. First, in-degree centrality is used to evaluate technology acceptance, i.e., the process of absorbing existing technological knowledge to create new knowledge in each field. Second, out-degree centrality is used to evaluate the impact of existing technological knowledge on the creation of new knowledge. Third, we apply the PageRank algorithm to investigate the importance of the relationships between countries both qualitatively and quantitatively. As a result, all of the indicators confirm that the AI sector is currently the least competitive.
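A toy illustration of the three indicators (in-degree centrality, out-degree centrality, PageRank) on a country-level knowledge-flow graph, using networkx. The edge direction, from the cited country (knowledge source) to the citing country (knowledge absorber), is an assumption chosen so that in-degree reads as acceptance and out-degree as impact, as in the abstract.

```python
# Sketch of the three indicators on a toy knowledge-flow graph (networkx assumed installed).
# Edge direction cited -> citing is an assumption made for illustration.
import networkx as nx

G = nx.DiGraph()
# (knowledge source, knowledge absorber) pairs aggregated from patent citations
G.add_edges_from([("US", "KR"), ("JP", "KR"), ("JP", "US"), ("US", "JP"), ("US", "CN")])

acceptance = nx.in_degree_centrality(G)    # how much each country absorbs existing knowledge
impact     = nx.out_degree_centrality(G)   # how much each country's knowledge feeds new patents
importance = nx.pagerank(G, alpha=0.85)    # relative importance of each country in the flow

for country in G.nodes:
    print(country, round(acceptance[country], 2), round(impact[country], 2),
          round(importance[country], 3))
```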

A Research for Web Documents Genre Classification using STW (STW를 이용한 웹 문서 장르 분류에 관한 연구)

  • Ko, Byeong-Kyu;Oh, Kun-Seok;Kim, Pan-Koo
    • Journal of Information Technology and Architecture / v.9 no.4 / pp.413-422 / 2012
  • Many researchers have studied how to analyze human natural language so that machines can understand its meaning, using text-based, PageRank-based, and other approaches. In particular, URL and HTML tag information in web documents has again attracted attention as a way to analyze huge numbers of web documents automatically. In this paper, we propose an STW (Semantic Term Weight) approach based on the syntactic and linguistic structure of web documents in order to classify their genres. For the evaluation, we trained an SVM classifier on more than 1,000 documents from the 20-Genre-collection corpus and then tested it on the KI-04 corpus. We measured accuracy in two experiments, one using STW and one without it. As a result, the proposed STW-based approach achieved accuracy approximately 10.2% higher than the approach without STW.
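A hedged sketch of a genre-classification pipeline in the spirit of the abstract: since the STW weighting itself is not defined here, plain TF-IDF weights and a linear SVM (scikit-learn) stand in for it, and the training documents and genre labels are invented for illustration.

```python
# Illustrative pipeline only: plain TF-IDF term weights stand in for the paper's STW
# weighting, and the tiny training set below is made up for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train_docs   = ["breaking news report from the capital",
                "download driver setup instructions",
                "shop our discount offers today",
                "latest match results and league table"]
train_genres = ["news", "help", "shop", "sport"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_docs, train_genres)

print(classifier.predict(["step by step setup instructions for the driver"]))
```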

Efficient Internet Information Extraction Using Hyperlink Structure and Fitness of Hypertext Document (웹의 연결구조와 웹문서의 적합도를 이용한 효율적인 인터넷 정보추출)

  • Hwang, Insoo
    • Journal of Information Technology Applications and Management / v.11 no.4 / pp.49-60 / 2004
  • While the World-Wide Web offers an incredibly rich base of information, organized as hypertext, it does not provide a uniform and efficient way to retrieve specific information. An efficient web crawler is therefore needed to gather useful information in an acceptable amount of time. In this paper, we study the order in which a web crawler should visit URLs so as to obtain the more important web pages quickly, and we develop an internet agent for efficient web crawling that uses the hyperlink structure and the fitness of hypertext documents. Experiments on a website show that the proposed agent outperforms web crawlers based on the BackLink and PageRank algorithms.
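A best-first crawling sketch of the general idea: keep the frontier in a priority queue ordered by an estimated importance and always expand the most promising URL first. The scoring function and link graph below are placeholders, not the paper's fitness measure, and no real HTTP requests are made.

```python
# Best-first crawling sketch: the frontier is a priority queue ordered by an importance
# estimate. The scoring function and toy link graph are placeholders, not the paper's
# fitness measure.
import heapq

link_graph = {                      # toy site: page -> outgoing links
    "/": ["/news", "/about", "/news/item1"],
    "/news": ["/news/item1", "/news/item2"],
    "/about": [],
    "/news/item1": ["/news"],
    "/news/item2": [],
}

def estimated_importance(url):
    """Placeholder fitness: prefer shallow URLs (fewer path segments)."""
    return 1.0 / (1 + url.strip("/").count("/"))

def crawl(seed, limit=10):
    frontier = [(-estimated_importance(seed), seed)]   # max-heap via negated score
    visited, order = set(), []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in link_graph.get(url, []):
            if link not in visited:
                heapq.heappush(frontier, (-estimated_importance(link), link))
    return order

print(crawl("/"))
```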

Using PageRank Algorithm to Improve Coupling Metrics (페이지랭크 알고리즘을 이용한 결합도 척도의 개선)

  • Park, Cheol-Hyun;Ryu, Sung-Tae;Lee, Eun-Seok
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1405-1408 / 2011
  • Software quality measurement is an essential element of software engineering. Coupling, one of the software quality metrics, indicates how strongly modules are connected to each other and is used for various purposes such as estimating fault-proneness, modularity, reusability, and change-proneness. Existing coupling metrics are determined by the number of method calls, and because they do not consider the weight of each method they cannot measure coupling accurately. This paper proposes a way to measure method weights with the PageRank algorithm and to improve coupling metrics using those weights. To validate the approach, three coupling metrics were measured on four open-source projects with both the existing and the improved methods. The three improved coupling metrics showed a noticeably higher correlation with change-proneness, a common maintainability indicator, than the existing metrics, so the improved metrics can measure software quality more accurately.
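A minimal sketch of the weighting idea, assuming networkx is available: compute PageRank over a method-call graph and sum callee weights instead of raw call counts when measuring coupling between two classes. The combination rule and the call graph are illustrative, not the paper's exact metric.

```python
# Sketch of the idea only (networkx assumed): weight each method by its PageRank in the
# method-call graph, then sum callee weights instead of raw call counts when measuring
# coupling between two classes.
import networkx as nx

calls = [  # (caller method, callee method), class encoded as "Class.method"
    ("A.run", "B.load"), ("A.run", "B.parse"), ("A.stop", "B.load"),
    ("C.main", "B.load"), ("C.main", "A.run"),
]
G = nx.DiGraph(calls)
weight = nx.pagerank(G, alpha=0.85)          # heavily-used methods get larger weights

def weighted_coupling(src_class, dst_class):
    """Assumed weighted variant: sum PageRank weights of dst-class methods called from src."""
    return sum(weight[callee] for caller, callee in calls
               if caller.startswith(src_class + ".") and callee.startswith(dst_class + "."))

print(weighted_coupling("A", "B"), weighted_coupling("C", "B"))
```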

A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs (대용량 그래프 압축과 마이닝을 위한 그래프 정점 재배치 분산 알고리즘)

  • Park, Namyong;Park, Chiwan;Kang, U
    • Journal of KIISE / v.43 no.10 / pp.1131-1143 / 2016
  • How can we effectively compress big graphs composed of billions of edges? By concentrating the non-zeros of the adjacency matrix through vertex rearrangement, we can compress big graphs more efficiently and also boost the performance of several graph mining algorithms such as PageRank. SlashBurn is a state-of-the-art vertex rearrangement method that handles real-world graphs effectively by exploiting the power-law characteristic of real-world networks. However, the original SlashBurn algorithm slows down noticeably on large-scale graphs, and it cannot be used at all when a graph is too large to fit on a single machine, since it is designed to run on one machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. Distributed SlashBurn processes big graphs much faster than the original algorithm and scales well by performing the large-scale vertex rearrangement in a distributed fashion. In experiments on real-world big graphs, the proposed distributed SlashBurn ran more than 45 times faster than the single-machine version and processed graphs 16 times larger than the original method could handle.
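For context, a simplified single-machine sketch of the SlashBurn iteration (networkx assumed): repeatedly move the k highest-degree hubs to the front of the vertex ordering, push the nodes of small disconnected components to the back, and continue on the giant connected component. The distributed formulation in the paper is not reproduced here.

```python
# Simplified single-machine sketch of the SlashBurn iteration; the paper's distributed
# version parallelizes these steps and is not shown here.
import networkx as nx

def slashburn_order(G, k=1):
    G = G.copy()
    front, back = [], []
    while G.number_of_nodes() > 0:
        hubs = sorted(G.degree, key=lambda d: d[1], reverse=True)[:k]
        hub_nodes = [n for n, _ in hubs]
        front.extend(hub_nodes)                  # hubs get the lowest vertex IDs
        G.remove_nodes_from(hub_nodes)
        if G.number_of_nodes() == 0:
            break
        components = sorted(nx.connected_components(G), key=len, reverse=True)
        for comp in components[1:]:              # small components go to the back
            back = list(comp) + back
            G.remove_nodes_from(comp)
    return front + back                          # new vertex order: hubs first, spokes last

G = nx.barabasi_albert_graph(30, 2, seed=0)      # toy graph with a power-law-ish degree tail
print(slashburn_order(G, k=2))
```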

A Study on the Design and Implementation of System for Predicting Attack Target Based on Attack Graph (공격 그래프 기반의 공격 대상 예측 시스템 설계 및 구현에 대한 연구)

  • Kauh, Janghyuk;Lee, Dongho
    • Journal of Korea Society of Digital Industry and Information Management / v.16 no.1 / pp.79-92 / 2020
  • As the number of systems grows and networks become larger, automated attack prediction systems are urgently needed to respond to cyber attacks. In this study, we developed four types of information-gathering sensors for collecting asset and vulnerability information, along with technology to automatically generate attack graphs and predict attack targets. To improve performance, attack graph generation is divided into a reachability calculation process and a vulnerability assignment process, and the graph is kept up to date by recomputing whenever asset or vulnerability information changes. To improve the accuracy of attack target prediction, both the degree of asset risk and the degree of asset reference are reflected: asset risk draws on the CVSS (Common Vulnerability Scoring System), and asset reference draws on Google's PageRank algorithm. The results of attack target prediction are displayed on a web screen and in the CyCOP (Cyber Common Operation Picture) to support both analysts and decision makers.
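A small sketch of how CVSS-based risk and PageRank-based reference scores might be combined to rank candidate targets; the asset names, CVSS values, and the weighted-sum rule are assumptions, not the paper's actual model.

```python
# Sketch with assumed numbers and an assumed combination rule (networkx assumed): each asset
# gets a CVSS-derived risk score, PageRank over the asset-reference graph gives a reference
# score, and a simple weighted sum ranks candidate attack targets.
import networkx as nx

references = [("web", "app"), ("app", "db"), ("mail", "app"), ("app", "file"), ("file", "db")]
G = nx.DiGraph(references)

cvss_risk = {"web": 7.5, "app": 9.8, "db": 8.1, "mail": 5.3, "file": 6.4}  # example CVSS scores
reference_score = nx.pagerank(G, alpha=0.85)

def target_score(asset, w_risk=0.6, w_ref=0.4):
    return w_risk * (cvss_risk[asset] / 10.0) + w_ref * reference_score[asset]

ranking = sorted(G.nodes, key=target_score, reverse=True)
print([(asset, round(target_score(asset), 3)) for asset in ranking])
```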

Industrial Technology Leak Detection System on the Dark Web (다크웹 환경에서 산업기술 유출 탐지 시스템)

  • Kong, Young Jae;Chang, Hang Bae
    • Smart Media Journal / v.11 no.10 / pp.46-53 / 2022
  • Today, owing to the 4th Industrial Revolution and extensive R&D funding, domestic companies have come to possess world-class industrial technologies, which have grown into important assets. The national government designates companies' critical industrial technologies as "national core technologies" in order to protect them. In particular, technology leaks in the shipbuilding, display, and semiconductor industries can cause a significant loss of competitiveness not only at the company level but also at the national level. Every year there are more insider leaks, ransomware attacks, and attempts to steal industrial technology through industrial spies, and the stolen technology is then traded covertly on the dark web. In this paper, we propose a system for detecting industrial technology leaks in the dark web environment. The proposed model first builds a database by crawling the dark web using information collected from the OSINT environment. Keywords related to industrial technology leakage are then extracted with the KeyBERT model, and signs of leakage on the dark web are expressed as quantitative indicators. Finally, starting from the dark web sites identified as leakage sites, the possibility of secondary leakage is detected with the PageRank algorithm. Using the proposed method, 27,317 unique dark web domains were collected and 15,028 nuclear-energy-related keywords were extracted from 100 nuclear power patents, and 12 dark web sites were identified by detecting secondary leaks from the sites with the highest nuclear-leak scores.
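The secondary-leak step can be sketched with personalized PageRank seeded on the already-identified leak sites (networkx assumed); the site names and the use of personalization are assumptions for illustration.

```python
# Sketch only: the abstract says secondary leaks are detected with PageRank starting from the
# sites with the highest leak scores. Here that is approximated with personalized PageRank;
# the toy hyperlink graph and the personalization choice are assumptions.
import networkx as nx

hyperlinks = [("leak-market", "mirror-1"), ("leak-market", "forum-x"),
              ("forum-x", "paste-site"), ("mirror-1", "paste-site"), ("blog-y", "forum-x")]
G = nx.DiGraph(hyperlinks)

flagged = {"leak-market": 1.0}   # sites already identified as primary leak sites
personalization = {node: flagged.get(node, 0.0) for node in G.nodes}

secondary = nx.pagerank(G, alpha=0.85, personalization=personalization)
print(sorted(secondary.items(), key=lambda kv: kv[1], reverse=True))  # likely secondary-leak sites
```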