• Title/Summary/Keyword: Heritrix

A Study on Web Archiving Tools (웹 아카이빙 도구에 관한 연구)

  • Lee, Sung-Sook
    • Proceedings of the Korean Society for Information Management Conference / 2005.08a / pp.185-193 / 2005
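  • To provide baseline data for promoting web archiving, this study surveyed the tools used in web archiving projects. Among software dedicated to web archiving, it reviewed the characteristics and main functions of the harvesting tools NEDLIB Harvester and Heritrix and the access tools Wayback Machine and NWA Toolset.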

Comparative Analysis of Web Archiving Tools (웹아카이빙 도구 비교분석 연구)

  • Kim, Heejung
    • Proceedings of the Korean Society for Information Management Conference / 2011.08a / pp.95-98 / 2011
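  • Techniques and strategies for the long-term preservation of digital resources have been developed amid sustained interest. In particular, as dependence on web resources grows, web archiving becomes more important. This study examined the web archiving tools, and their characteristics, that correspond to each of the four stages of the web archiving chain proposed by the IIPC. Nine web archiving tools were covered: Heritrix, DeepArc, Web Curator Tool, NetarchiveSuite, BnFArcTools, Wayback, NutchWAX, WERA, and Xinq.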

Comparison of Web Crawler Performance for Web Record Management (원격수집 방식의 웹기록물 관리를 위한 웹수집기 성능 비교 연구)

  • Chang, Jinho; Kwon, Hyuksang; Lee, Kyumo; Choi, Dong Joon
    • The Korean Journal of Archival Studies / no.74 / pp.155-186 / 2022
  • As of 2022, 17,000 Internet sites of public institutions are registered on the 'Government 24' website (www.gov.kr) of the Ministry of the Interior and Safety. Transferring a website directly from the records-producing institution to the records-management institution that manages websites as records takes substantial human and material resources and time. In addition, it is practically difficult for records-management institutions to migrate and operate the various software and application technologies required to run each website. To overcome these practical limitations, websites are collected automatically from a remote location using web crawler software, both domestically and abroad. This study compared the performance of the web crawlers needed to collect and manage public Internet websites remotely as records. The most suitable web crawlers were selected through a step-by-step review of several web crawlers from previous studies and other literature. In the evaluation, the crawlers were run against several public agency websites to compare their actual performance. The study provides specific, empirical performance comparison information for organizations that need to choose a web crawler.