• Title/Summary/Keyword: search page

Search Result 180, Processing Time 0.029 seconds

Spamming page filtering algorithm using Web structure management management (Web Structure Management기법을 이용한 Spamming page filtering algorithm)

  • 신광섭;이우기;강석호
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.238-240
    • /
    • 2004
  • 정보 통신 기술의 발달로 엄청난 양의 정보가 World Wide Web을 통해 저장되고 공유된다. 특히, 사용자가 WWW을 이용하여 필요한 정보를 얻고자할 때, 가장 많이 사용되는 것이 Web search engine이다. 그러나 Web search engine의 algorithm 자체의 부정확성과 악의적으로 작성된 Web page로 인해 search engine 결과가 사용자의 요구와 일치하지 못하는 문제가 발생한다. 본 논문에서는 여러 Web search algorithm 중에서 Web structure management 기법을 중심으로 문제점을 분석하고 이를 해결할 수 있는 수정된 algorithm을 제시한다. 마지막으로 제시된 algorithm이 spamming page를 filtering하는 과정을 예시하여 논증한다.

  • PDF

Detecting Intentionally Biased Web Pages In terms of Hypertext Information (하이퍼텍스트 정보 관점에서 의도적으로 왜곡된 웹 페이지의 검출에 관한 연구)

  • Lee Woo Key
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.1 s.33
    • /
    • pp.59-66
    • /
    • 2005
  • The organization of the web is progressively more being used to improve search and analysis of information on the web as a large collection of heterogeneous documents. Most people begin at a Web search engine to find information. but the user's pertinent search results are often greatly diluted by irrelevant data or sometimes appear on target but still mislead the user in an unwanted direction. One of the intentional, sometimes vicious manipulations of Web databases is a intentionally biased web page like Google bombing that is based on the PageRank algorithm. one of many Web structuring techniques. In this thesis, we regard the World Wide Web as a directed labeled graph that Web pages represent nodes and link edges. In the Present work, we define the label of an edge as having a link context and a similarity measure between link context and target page. With this similarity, we can modify the transition matrix of the PageRank algorithm. By suggesting a motivating example, it is explained how our proposed algorithm can filter the Web intentionally biased web Pages effective about $60\%% rather than the conventional PageRank.

  • PDF

Design of Semantic Search System for the Search of Duplicated Geospatial Projects (공간정보사업의 중복사업 검색을 위한 의미기반검색 시스템의 설계)

  • Park, Sangun;Lim, Jay Ick;Kang, Juyoung
    • Journal of Information Technology Services
    • /
    • v.12 no.3
    • /
    • pp.389-404
    • /
    • 2013
  • Geospatial information, which is one of social overhead capital, is predicted as a core growing industry for the future. The production of geospatial information requires a huge budget, so it is very important objective of the policy for geospatial information to prevent the duplication of geospatial projects. In this paper, we proposed a semantic search system which extracts possible duplication of geospatial projects by using ontology for geospatial project administration. In order to achieve our goal, we suggested how to construct and utilize geospatial project ontology, and designed the architecture and process of the semantic search. Moreover, we showed how the suggested semantic search works with a duplicated projects search scenario. The suggested system enables a nonprofessional can easily search for duplicated projects, therefore we expect that our research contributes to effective and efficient duplication review process for geospatial projects.

Layout Analysis for Calculation of Web Page Similarity as Image

  • Mitsuhashi, Noriaki;Yamaguchi, Toru;Takama, Yasufumi
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.142-145
    • /
    • 2003
  • When we search information on the Web using search engines, they only analyze the text information collected from the source files of Web pages. However, there is a limit to analyze the layout of a Web page only from its source file, although Web page design is the most important factor for a user to estimate a page. In particular it often happens on the Web that the pages of similar design ofter similar information. We propose a method to analyze layout for comparing the design of pages by treating the displayed page as image.

  • PDF

GAN-based Automated Generation of Web Page Metadata for Search Engine Optimization (검색엔진 최적화를 위한 GAN 기반 웹사이트 메타데이터 자동 생성)

  • An, Sojung;Lee, O-jun;Lee, Jung-Hyeon;Jung, Jason J.;Yong, Hwan-Sung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.79-82
    • /
    • 2019
  • This study aims to design and implement automated SEO tools that has applied the artificial intelligence techniques for search engine optimization (SEO; Search Engine Optimization). Traditional Search Engine Optimization (SEO) on-page optimization show limitations that rely only on knowledge of webpage administrators. Thereby, this paper proposes the metadata generation system. It introduces three approaches for recommending metadata; i) Downloading the metadata which is the top of webpage ii) Generating terms which is high relevance by using bi-directional Long Short Term Memory (LSTM) based on attention; iii) Learning through the Generative Adversarial Network (GAN) to enhance overall performance. It is expected to be useful as an optimizing tool that can be evaluated and improve the online marketing processes.

  • PDF

A firmware base address search technique based on MIPS architecture using $gp register address value and page granularity

  • Seok-Joo, Mun;Young-Ho, Sohn
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.2
    • /
    • pp.1-7
    • /
    • 2023
  • In this paper, we propose a base address candidate selection method using the $gp register and page granularity as a way to build a static analysis environment for firmware based on MIPS architecture. As a way to shorten the base address search time, which is a disadvantage of the base address candidate selection method through inductive reasoning in existing studies, this study proposes a method to perform page-level search based on the $gp register in the existing base address candidate selection method as a reference point for search. Then, based on the proposed method, a base address search tool is implemented and a static analysis environment is constructed to prove the validity of the target tool. The results show that the proposed method is faster than the existing candidate selection method through inductive reasoning.

Customized Web Search Rank Provision (개인화된 웹 검색 순위 생성)

  • Kang, Youngki;Bae, Joonsoo
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.39 no.2
    • /
    • pp.119-128
    • /
    • 2013
  • Most internet users utilize internet portal search engines, such as Naver, Daum and Google nowadays. But since the results of internet portal search engines are based on universal criteria (e.g. search frequency by region or country), they do not consider personal interests. Namely, current search engines do not provide exact search results for homonym or polysemy because they try to serve universal users. In order to solve this problem, this research determines keyword importance and weight value for each individual search characteristics by collecting and analyzing customized keyword at external database. The customized keyword weight values are integrated with search engine results (e.g. PageRank), and the search ranks are rearranged. Using 50 web pages of Goolge search results for experiment and 6 web pages for customized keyword collection, the new customized search results are proved to be 90% match. Our personalization approach is not the way that users enter preference directly, but the way that system automatically collects and analyzes personal information and then reflects them for customized search results.

An Improved Approach to Ranking Web Documents

  • Gupta, Pooja;Singh, Sandeep K.;Yadav, Divakar;Sharma, A.K.
    • Journal of Information Processing Systems
    • /
    • v.9 no.2
    • /
    • pp.217-236
    • /
    • 2013
  • Ranking thousands of web documents so that they are matched in response to a user query is really a challenging task. For this purpose, search engines use different ranking mechanisms on apparently related resultant web documents to decide the order in which documents should be displayed. Existing ranking mechanisms decide on the order of a web page based on the amount and popularity of the links pointed to and emerging from it. Sometime search engines result in placing less relevant documents in the top positions in response to a user query. There is a strong need to improve the ranking strategy. In this paper, a novel ranking mechanism is being proposed to rank the web documents that consider both the HTML structure of a page and the contextual senses of keywords that are present within it and its back-links. The approach has been tested on data sets of URLs and on their back-links in relation to different topics. The experimental result shows that the overall search results, in response to user queries, are improved. The ordering of the links that have been obtained is compared with the ordering that has been done by using the page rank score. The results obtained thereafter shows that the proposed mechanism contextually puts more related web pages in the top order, as compared to the page rank score.

Ranking Quality Evaluation of PageRank Variations (PageRank 변형 알고리즘들 간의 순위 품질 평가)

  • Pham, Minh-Duc;Heo, Jun-Seok;Lee, Jeong-Hoon;Whang, Kyu-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.5
    • /
    • pp.14-28
    • /
    • 2009
  • The PageRank algorithm is an important component for ranking Web pages in Google and other search engines. While many improvements for the original PageRank algorithm have been proposed, it is unclear which variations (and their combinations) provide the "best" ranked results. In this paper, we evaluate the ranking quality of the well-known variations of the original PageRank algorithm and their combinations. In order to do this, we first classify the variations into link-based approaches, which exploit the link structure of the Web, and knowledge-based approaches, which exploit the semantics of the Web. We then propose algorithms that combine the ranking algorithms in these two approaches and implement both the variations and their combinations. For our evaluation, we perform extensive experiments using a real data set of one million Web pages. Through the experiments, we find the algorithms that provide the best ranked results from either the variations or their combinations.

A Reranking Method Using Query Expansion and PageRank Check (페이지 랭크지수와 질의 확장을 이용한 재랭킹 방법)

  • Kim, Tae-Hwan;Jeon, Ho-Chul;Choi, Joong-Min
    • The KIPS Transactions:PartB
    • /
    • v.18B no.4
    • /
    • pp.231-240
    • /
    • 2011
  • Many search algorithms have been implemented by many researchers on the world wide web. One of the best algorithms is Google using PageRank technology. PageRank approach computes the number of inlink of each documents then ranks documents in the order of inlink members. But it is difficult to find the results that user needs, because this method find documents not valueable for a person but valueable for the public. To solve this problem, We use the WordNet for analysis of the user's query history. This paper proposes a personalized search engine using the user's query history and PageRank Check. We compared the performance of the proposed approaches with google search results in the top 30. As a result, the average of the r-precision for the proposed approaches is about 60% and it is better as about 14%.