
Design and Implementation of a High Performance Web Crawler (고성능 웹크롤러의 설계 및 구현)

  • 권성호;이영탁;김영준;이용두
    • Journal of Korea Society of Industrial Information Systems / v.8 no.4 / pp.64-72 / 2003
  • A web crawler is an important Internet software technology used in a variety of Internet applications, including search engines. As the Internet continues to grow, high-performance web crawlers are in urgent demand. In this paper, we study how to support dynamic scheduling for a multiprocess-based web crawler. For high performance, web crawlers are usually implemented with multiple processes. In such systems, crawl scheduling, which allocates web pages to each process for downloading, is one of the key issues. We identify the issues in crawl scheduling that are important and challenging. To address them, we propose a dynamic crawl scheduling framework and, building on it, a system architecture for a web crawler with dynamic crawl scheduling support. We also analyze the behavior of the web crawler and, based on the results, suggest directions for the design of high-performance web crawlers.
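
The abstract does not give the scheduling algorithm itself. As a minimal sketch, assuming each page's download time is known in advance (a real crawler only observes it after the fact), the simulation below contrasts static scheduling, where pages are pre-partitioned round-robin across processes, with dynamic scheduling, where each page goes to whichever process becomes idle first. The names `static_schedule` and `dynamic_schedule` are hypothetical and not taken from the paper.

```python
import heapq
from typing import Dict, List, Tuple


def static_schedule(load_times: List[float], n_procs: int) -> float:
    """Pre-assigns pages round-robin; returns the makespan (when the last process finishes)."""
    busy = [0.0] * n_procs
    for page, t in enumerate(load_times):
        busy[page % n_procs] += t  # the assignment ignores how busy each process already is
    return max(busy)


def dynamic_schedule(
    load_times: List[float], n_procs: int
) -> Tuple[float, Dict[int, List[int]]]:
    """Hands each page to the process that becomes idle first; returns makespan and assignment."""
    # Min-heap of (time at which the process becomes idle, process id).
    idle_at = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(idle_at)
    assignment: Dict[int, List[int]] = {p: [] for p in range(n_procs)}
    for page, t in enumerate(load_times):
        free_time, proc = heapq.heappop(idle_at)  # earliest idle process
        assignment[proc].append(page)
        heapq.heappush(idle_at, (free_time + t, proc))
    makespan = max(time for time, _ in idle_at)
    return makespan, assignment
```

With load times `[5, 1, 1, 1, 5, 1, 1, 1]` on two processes, round-robin gives one process both slow pages (makespan 12), while dynamic assignment finishes at 8, because a process that draws a slow page simply receives fewer of the remaining ones.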


The Development of Automatic Collection Method to Collect Information Resources for Web Archiving: With Focus on Disaster Safety Information (웹 아카이빙을 위한 정보자원의 자동수집방법 개발 - 재난안전정보를 중심으로 -)

  • Lee, Su Jin;Han, Hui Lyeong;Sim, Min Jeong;Won, Dong Hyun;Kim, Yong
    • Journal of Korean Society of Archives and Records Management / v.17 no.4 / pp.1-26 / 2017
  • This study aims to provide an efficient way to share and use disaster information scattered across individual institutions, and to develop an automated collection algorithm that uses a web crawler to gather disaster information from the deep web. To achieve these goals, the study analyzes the logical structure of the deep web and develops algorithms to collect the information. The proposed automatic algorithm is expected to support disaster management by enabling the sharing and utilization of disaster safety information.
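
Deep-web pages are typically reachable only through search forms or paginated result lists rather than static links. The abstract does not describe the collection algorithm in detail; as a minimal sketch, assuming a result listing addressed by a query string with a hypothetical `page` parameter, the code below walks the listing page by page and stops at the first page that contributes no new record links. The `fetch` callable is injected so the sketch runs offline (in practice it would perform the HTTP request), and `collect_deep_web` and `is_record` are illustrative names, not from the paper.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Set, Tuple
from urllib.parse import urlencode, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_deep_web(
    base_url: str,
    params: Dict[str, str],
    fetch: Callable[[str], str],
    is_record: Callable[[str], bool] = lambda url: True,
    max_pages: int = 100,
) -> List[str]:
    """Walks paginated result pages behind a query; returns record URLs in discovery order."""
    seen: Set[str] = set()
    records: List[str] = []
    for page in range(1, max_pages + 1):
        # Hypothetical: the listing is paged via a "page" query parameter.
        list_url = base_url + "?" + urlencode({**params, "page": page})
        parser = LinkExtractor()
        parser.feed(fetch(list_url))
        added = 0
        for href in parser.links:
            url = urljoin(list_url, href)  # resolve relative links against the list page
            if is_record(url) and url not in seen:
                seen.add(url)
                records.append(url)
                added += 1
        if added == 0:
            break  # no new records on this page: assume the listing has ended
    return records
```

The `is_record` predicate is where site-specific knowledge of the deep web's logical structure would go, for example keeping detail-page links while discarding navigation and pager links.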