• Title/Summary/Keyword: Web Search Rank

Implementation Techniques to Apply the PageRank Algorithm (페이지랭크 알고리즘 적용을 위한 구현 기술)

  • Kim, Sung-Jin;Lee, Sang-Ho;Bang, Ji-Hwan
    • The KIPS Transactions:PartD / v.9D no.5 / pp.745-754 / 2002
  • The Google search site (http://www.google.com), introduced in 1998, was the first to implement the PageRank algorithm. PageRank is a ranking method based on the link structure of Web pages. Although PageRank has been implemented and is used in various commercial search engines, its implementation details have not been well documented, primarily for business reasons. The implementation techniques introduced in [4,8] are not sufficient to produce PageRank values of Web pages. This paper explains the techniques in [4,8] and proposes a major data structure and four implementation techniques for applying the PageRank algorithm. The paper helps readers understand how to apply the PageRank algorithm by presenting a real system that produces PageRank values of Web pages.
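
The paper's specific data structure and four implementation techniques are not reproduced in the abstract; as a point of reference, the following is a minimal sketch of the standard PageRank power iteration over an adjacency-list graph. The damping factor d = 0.85 and the convergence tolerance are conventional choices, not values taken from the paper.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=100):
    """Minimal PageRank power iteration.

    links: dict mapping each page to the list of pages it links to;
    every link target is assumed to appear as a key as well.
    Dangling pages (no outlinks) distribute their rank uniformly.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new_rank = {p: (1.0 - d) / n for p in pages}
        dangling = sum(rank[p] for p in pages if not links[p])
        for p in pages:
            out = links[p]
            if out:
                share = d * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
        # redistribute the rank of dangling pages uniformly
        for p in pages:
            new_rank[p] += d * dangling / n
        if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            rank = new_rank
            break
        rank = new_rank
    return rank

# toy example: three pages with a small link structure
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```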

'Hot Search Keyword' Rank-Change Prediction (인기 검색어의 순위 변화 예측)

  • Kim, Dohyeong;Kang, Byeong Ho;Lee, Sungyoung
    • Journal of KIISE / v.44 no.8 / pp.782-790 / 2017
  • The 'Hot Search Keywords' service provides a list of the most popular search terms on web services such as Naver and Daum, and the rank changes of a specific keyword reflect changes in users' interest. This paper introduces a temporal modelling framework for predicting the rank change of hot search keywords using past rank data and machine learning. Past rank data show that more than 70% of hot search keywords tend to disappear and reappear later. We handled missing rank values using deletion, dummy variables, mean substitution, and expectation maximization. It is also crucial to determine the optimal window size over the past rank data, so we propose an optimal window size selection approach based on the minimum amount of time for which a topic, in the same or a differing context, disappeared. The experiments were conducted with four machine-learning techniques on the Naver, Daum, and Nate 'Hot Search Keywords' datasets, collected over two years.
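
The paper's feature design and window-size selection rule are its own contribution and are not reproduced here. The fragment below only sketches one of the missing-value treatments named in the abstract (mean substitution) together with a fixed-size sliding window over past ranks; the window length in the example is a hypothetical choice, not the paper's selected value.

```python
def mean_substitute(ranks):
    """Replace missing rank observations (None) with the mean of the observed ranks."""
    observed = [r for r in ranks if r is not None]
    if not observed:
        return [0.0] * len(ranks)
    mean = sum(observed) / len(observed)
    return [mean if r is None else r for r in ranks]

def window_features(ranks, window=4):
    """Turn a rank time series into (features, target) pairs:
    the previous `window` ranks are used to predict the next rank."""
    filled = mean_substitute(ranks)
    return [(filled[i - window:i], filled[i]) for i in range(window, len(filled))]

# toy series: hourly ranks of one keyword, with two chart drop-offs (None)
series = [3, 1, None, 2, 4, None, 5, 7, 6, 8]
for features, target in window_features(series):
    print(features, "->", target)
```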

A Study of Personalized Information Retrieval (개인화 정보 검색에 대한 연구)

  • Kim, Tae-Hwan;Jeon, Ho-Chul;Choi, Joong-Min
    • Korean HCI Society Conference Proceedings / 2008.02a / pp.683-687 / 2008
  • Many search algorithms have been implemented for the World Wide Web. One of the most successful is Google's PageRank approach, which counts the inlinks of each document and ranks documents by their inlink counts. However, it is still difficult for a user to find the results he or she needs, because this method favors documents that are valuable to the public rather than to the individual. To solve this problem, this paper proposes a personalized search engine that mixes public worth with personal worth.
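
The abstract does not give the paper's exact formula for mixing public and personal worth, so the following is only a hedged illustration of one common way to combine a global (PageRank-style) score with a per-user score; the weight alpha is a hypothetical tuning parameter.

```python
def personalized_score(public_score, personal_score, alpha=0.5):
    """Linear mixture of a global link-based score and a user-specific score.
    alpha = 1.0 reproduces the purely public (PageRank-like) ranking."""
    return alpha * public_score + (1.0 - alpha) * personal_score

# re-rank candidate documents by the mixed score (toy values: public, personal)
docs = {"d1": (0.9, 0.1), "d2": (0.4, 0.8), "d3": (0.6, 0.5)}
ranking = sorted(docs, key=lambda d: personalized_score(*docs[d]), reverse=True)
print(ranking)
```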

Effective Web Crawling Orderings from Graph Search Techniques (그래프 탐색 기법을 이용한 효율적인 웹 크롤링 방법들)

  • Kim, Jin-Il;Kwon, Yoo-Jin;Kim, Jin-Wook;Kim, Sung-Ryul;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.37 no.1 / pp.27-34 / 2010
  • Web crawlers are fundamental programs that iteratively download web pages by following links, starting from a small set of initial URLs. Several web crawling orderings have previously been proposed to crawl popular web pages in preference to other pages, but some graph search techniques whose characteristics and efficient implementations have been studied in the graph theory community have not yet been applied to web crawling orderings. In this paper we consider various graph search techniques, including lexicographic breadth-first search, lexicographic depth-first search, and maximum cardinality search, as well as the well-known breadth-first and depth-first searches, and then choose effective web crawling orderings that have linear time complexity and crawl popular pages early. In particular, for maximum cardinality search and lexicographic breadth-first search, whose implementations are non-trivial, we propose linear-time web crawling orderings by applying the partition refinement method. Experimental results show that maximum cardinality search has desirable properties in both time complexity and the quality of crawled pages.
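
The paper obtains linear-time orderings via partition refinement; those implementations are not reproduced here. The sketch below is only a naive, quadratic illustration of the maximum cardinality search idea applied to crawling: always download next the discovered page that is linked from the largest number of already-crawled pages (breadth-first search would instead use a plain FIFO queue). The toy link dictionary stands in for fetching and parsing a downloaded page's outgoing links.

```python
def mcs_crawl_order(links, seeds):
    """Naive maximum-cardinality-search crawl ordering (O(V^2), illustration only).

    links: dict page -> list of outgoing links, consulted only after a page is
    "downloaded"; a real crawler would fetch and parse the page instead.
    """
    crawled = []
    frontier = {s: 0 for s in seeds}   # page -> number of crawled pages linking to it
    while frontier:
        # pick the frontier page referenced by the most already-crawled pages
        page = max(frontier, key=frontier.get)
        del frontier[page]
        crawled.append(page)
        for target in links.get(page, []):
            if target not in crawled:
                frontier[target] = frontier.get(target, 0) + 1
    return crawled

toy_web = {"home": ["a", "b"], "a": ["b", "c"], "b": ["c"], "c": []}
print(mcs_crawl_order(toy_web, seeds=["home"]))
```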

Design of Advanced HITS Algorithm by Suitability for Importance-Evaluation of Web-Documents (웹 문서 중요도 평가를 위한 적합도 향상 HITS 알고리즘 설계)

  • 김분희;한상용;김영찬
    • The Journal of Society for e-Business Studies / v.8 no.2 / pp.23-31 / 2003
  • Link-based search engines generate rankings from the link information of related web documents. HITS (Hyperlink-Induced Topic Search), a representative link-based ranking algorithm, evaluates the importance of related pages from their link information and presents the result as a ranking. The problem with the HITS algorithm is that it considers only the link frequency within documents and depends entirely on the set of web documents given as input. In this paper, we design a search agent based on an improved HITS algorithm that raises the suitability between the query and the search results within the document set obtained from a link-based web search engine, thereby compensating for the locality of the search results.
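
The paper's suitability-enhanced variant is not specified in the abstract; for reference, this is a minimal sketch of the standard HITS hub/authority iteration on a small link graph, with L2 normalization in each round.

```python
import math

def hits(links, iterations=50):
    """Standard HITS: authority(p) = sum of hub scores of pages linking to p,
    hub(p) = sum of authority scores of the pages p links to."""
    pages = set(links) | {q for out in links.values() for q in out}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, [])) for p in pages}
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {p: a / norm for p, a in auth.items()}
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {p: h / norm for p, h in hub.items()}
    return hub, auth

hub, auth = hits({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(auth, key=auth.get, reverse=True))  # pages ranked by authority score
```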

Improving the Performance of Web Search using Query Types (질의유형에 기반한 웹 검색의 성능 향상)

  • Kang, In-Ho;An, Dong-Un
    • The KIPS Transactions:PartB / v.11B no.5 / pp.537-544 / 2004
  • The Web is rich with diverse sources of information. Because web document collections are massive and heterogeneous, users look for various types of target pages, and each type of information has its own designated queries; if a user query is not one of them, good result documents cannot be expected. Different strategies are therefore needed for a search engine to exploit the strengths of each type of information: if we know the properties of an information type, we can refine the candidate pages and rank them more precisely. Various experiments are conducted to reveal the properties of each type of information, and we present an appropriate combining formula that exploits these properties. In addition, for the service finding task, we propose Service Link Information, which exploits the existence of mechanisms for user interaction.
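
The paper's combining formula is not given in the abstract, so the following is only a hedged sketch of the general idea: weight the evidence sources (e.g. page content, link/anchor text, and URL information) differently depending on the detected query type. The query-type names and weight values here are illustrative assumptions, not the paper's settings.

```python
# Hypothetical per-query-type weights for (content, link/anchor, URL) evidence.
TYPE_WEIGHTS = {
    "topic_relevance": (0.7, 0.2, 0.1),   # informational queries lean on content
    "homepage_finding": (0.2, 0.4, 0.4),  # navigational queries lean on links and URLs
    "service_finding":  (0.3, 0.5, 0.2),
}

def combined_score(query_type, content, link, url):
    """Linear combination of evidence scores with query-type-dependent weights."""
    wc, wl, wu = TYPE_WEIGHTS[query_type]
    return wc * content + wl * link + wu * url

print(combined_score("homepage_finding", content=0.3, link=0.9, url=0.8))
```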

Improving the Quality of Web Spam Filtering by Using Seed Refinement (시드 정제 기술을 이용한 웹 스팸 필터링의 품질 향상)

  • Qureshi, Muhammad Atif;Yun, Tae-Seob;Lee, Jeong-Hoon;Whang, Kyu-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.6 / pp.123-139 / 2011
  • Web spam has a significant influence on the ranking quality of web search results because it promotes unimportant web pages; web search engines therefore need to filter web spam. Web spam filtering identifies spam pages, i.e., web pages contributing to web spam. TrustRank, Anti-TrustRank, Spam Mass, and Link Farm Spam are well-known web spam filtering algorithms in the research literature. The output of these algorithms depends on the input seed, so refining the input seed may improve the quality of web spam filtering. In this paper, we propose seed refinement techniques for the four well-known spam filtering algorithms and modify the algorithms by applying these techniques to the original ones; we call the results modified spam filtering algorithms. In addition, we propose a strategy to achieve better quality for web spam filtering, in which the modified algorithms may support one another when placed in an appropriate succession. In the experiments we show the effect of seed refinement: we first show that our modified algorithms outperform the respective original algorithms in terms of the quality of web spam filtering, and then show that the best succession significantly outperforms the best original and the best modified algorithms, by up to 1.38 times in recall within typical parameter ranges, while preserving precision.
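
The seed refinement techniques and the succession strategy are the paper's contribution and are not reproduced here. As background, the sketch below shows the standard TrustRank formulation the paper builds on: a PageRank-style iteration whose teleportation is biased toward a trusted seed set, so that trust propagates outward along links (the damping factor 0.85 is the conventional choice).

```python
def trustrank(links, trusted_seeds, d=0.85, iterations=50):
    """Standard TrustRank: PageRank biased toward a trusted seed set.

    links: dict page -> list of outgoing links; every link target is assumed
    to appear as a key. Pages reachable from trusted seeds inherit trust.
    """
    pages = list(links)
    n_seeds = len(trusted_seeds)
    # teleportation vector: uniform over trusted seeds, zero elsewhere
    teleport = {p: (1.0 / n_seeds if p in trusted_seeds else 0.0) for p in pages}
    trust = dict(teleport)
    for _ in range(iterations):
        new_trust = {p: (1.0 - d) * teleport[p] for p in pages}
        for p in pages:
            out = links[p]
            if out:
                share = d * trust[p] / len(out)
                for q in out:
                    new_trust[q] += share
        trust = new_trust
    return trust

toy_web = {"good": ["also_good"], "also_good": ["good"], "spam": ["good"], "spam2": ["spam"]}
print(trustrank(toy_web, trusted_seeds={"good"}))  # spam pages receive no trust
```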

e-Cohesive Keyword based Arc Ranking Measure for Web Navigation (연관 웹 페이지 검색을 위한 e-아크 랭킹 메저)

  • Lee, Woo-Key;Lee, Byoung-Su
    • Journal of KIISE:Databases / v.36 no.1 / pp.22-29 / 2009
  • The World Wide Web has emerged as the largest medium, allowing even a single user to market products and publish information, while giving users abundant access to information of every kind. As a result, the Web holds a large amount of related information distributed over multiple web pages. Current search engines look for all of the entered keywords within a single web page and rank the resulting set of pages as the answer to the user query, but this approach fails to retrieve pairs of web pages that together contain more relevant information for the user's search. We introduce a new search paradigm that gives different weights to the query keywords according to their order of appearance, and we propose a new arc weight measure that assigns more relevance to a pair of web pages in which the keywords appear alternately, so that pairs of pages containing related but distributed information can be presented to the user. Our measure proved effective for similarity search: in our experiments, the e-arc ranking measure outperformed the conventional ones.
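
The concrete e-arc ranking measure is defined in the paper, not in the abstract; the fragment below only sketches the general paradigm described there, namely weighting query keywords by their order of appearance and scoring a linked pair of pages by how well the two pages jointly cover the weighted keywords. The decay factor and scoring form are illustrative assumptions, not the paper's measure.

```python
def keyword_weights(query_terms, decay=0.8):
    """Weight query keywords by their order of appearance (earlier terms weigh more)."""
    return {term: decay ** i for i, term in enumerate(query_terms)}

def arc_score(page_u_terms, page_v_terms, query_terms):
    """Score a linked pair of pages (u -> v) by the weighted query terms
    covered by the union of the two pages."""
    weights = keyword_weights(query_terms)
    covered = (set(page_u_terms) | set(page_v_terms)) & set(query_terms)
    return sum(weights[t] for t in covered)

query = ["hotel", "jeju", "discount"]
print(arc_score({"hotel", "jeju"}, {"discount", "coupon"}, query))  # pair covers all terms
print(arc_score({"hotel"}, {"hotel", "news"}, query))               # single-topic pair
```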

New Re-ranking Technique based on Concept-Network Profiles for Personalized Web Search (웹 검색 개인화를 위한 개념네트워크 프로파일 기반 순위 재조정 기법)

  • Kim, Han-Joon;Noh, Joon-Ho;Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.2 / pp.69-76 / 2012
  • This paper proposes a novel way of personalizing web search by re-ranking the search results with user profiles structured as concept networks. Personalized search systems need to be based on user profiles that capture the user's search patterns, and they actively use these profiles to expand initial queries or to re-rank the search results. The proposed method is a re-ranking-based personalized search method integrated with a query expansion facility: it identifies documents that occur in common among the result sets of the different expanded queries and re-ranks the search results by their degree of co-occurrence. We show that the proposed method outperforms conventional ones in empirical web search with a number of actual users who have diverse information needs and query intents.
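
The concept-network profile construction is the paper's contribution and is not reproduced here; the sketch below only illustrates the re-ranking step described in the abstract: run the expanded queries, count in how many result lists each document occurs, and re-rank by that count, breaking ties by the best rank the document achieved. The `search` callable is a placeholder for whatever retrieval backend is assumed.

```python
from collections import defaultdict

def rerank_by_cooccurrence(expanded_queries, search):
    """Re-rank documents by how many expanded-query result lists contain them.

    search(query) is assumed to return an ordered list of document ids.
    """
    occurrences = defaultdict(int)   # doc -> number of result lists containing it
    best_rank = {}                   # doc -> best (lowest) rank seen in any list
    for query in expanded_queries:
        for rank, doc in enumerate(search(query)):
            occurrences[doc] += 1
            best_rank[doc] = min(best_rank.get(doc, rank), rank)
    return sorted(occurrences, key=lambda d: (-occurrences[d], best_rank[d]))

# toy backend: canned result lists for three expanded queries
canned = {
    "java tutorial": ["d1", "d2", "d3"],
    "java programming guide": ["d2", "d1", "d4"],
    "learn java": ["d2", "d5", "d1"],
}
print(rerank_by_cooccurrence(canned.keys(), search=lambda q: canned[q]))
```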

A Reranking Method Using Query Expansion and PageRank Check (페이지 랭크지수와 질의 확장을 이용한 재랭킹 방법)

  • Kim, Tae-Hwan;Jeon, Ho-Chul;Choi, Joong-Min
    • The KIPS Transactions:PartB / v.18B no.4 / pp.231-240 / 2011
  • Many search algorithms have been implemented for the World Wide Web. One of the most successful is Google's PageRank approach, which counts the inlinks of each document and ranks documents by their inlink counts. However, it is difficult to find the results a particular user needs, because this method favors documents that are valuable to the public rather than to the individual. To solve this problem, we use WordNet to analyze the user's query history, and this paper proposes a personalized search engine based on the user's query history and PageRank Check. We compared the performance of the proposed approaches against the top 30 Google search results. On average, the R-precision of the proposed approaches is about 60%, roughly 14% higher than the baseline.
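
The WordNet-based analysis of the query history is the paper's contribution and is not sketched here; the fragment below only illustrates the final re-ranking step in a hedged form: the top results are re-ordered by a mixture of a link-based (PageRank-style) score and the overlap between a document's terms and the terms accumulated from the user's query history. The 0.5/0.5 mixture and the overlap measure are illustrative assumptions.

```python
def history_overlap(doc_terms, history_terms):
    """Fraction of the user's accumulated query-history terms that a document covers."""
    if not history_terms:
        return 0.0
    return len(set(doc_terms) & set(history_terms)) / len(set(history_terms))

def rerank(results, history_terms, w_link=0.5, w_history=0.5):
    """results: list of (doc_id, pagerank_score, doc_terms) for the top-k hits."""
    def score(item):
        _, pagerank_score, doc_terms = item
        return w_link * pagerank_score + w_history * history_overlap(doc_terms, history_terms)
    return [doc_id for doc_id, _, _ in sorted(results, key=score, reverse=True)]

history = {"python", "web", "crawler"}
top_hits = [
    ("d1", 0.9, {"python", "snake", "zoo"}),
    ("d2", 0.5, {"python", "web", "crawler", "tutorial"}),
    ("d3", 0.7, {"java", "web"}),
]
print(rerank(top_hits, history))  # history-relevant d2 rises above the high-PageRank d1
```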