Development of Web Crawler for Archiving Web Resources

Kim, Kwang-Young; Lee, Won-Goo; Lee, Min-Ho; Yoon, Hwa-Mook; Shin, Sung-Ho

doi:10.5392/JKCA.2011.11.9.009

The Journal of the Korea Contents Association (한국콘텐츠학회논문지)

Vol. 11, No. 9 / pp. 9-16 / 2011 / pISSN 1598-4877 / eISSN 2508-6723

The Korea Contents Association (한국콘텐츠학회)

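Kim, Kwang-Young (Information Technology Research Lab, Korea Institute of Science and Technology Information);
Lee, Won-Goo (Information Technology Research Lab, Korea Institute of Science and Technology Information);
Lee, Min-Ho (Information Technology Research Lab, Korea Institute of Science and Technology Information);
Yoon, Hwa-Mook (Information Technology Research Lab, Korea Institute of Science and Technology Information);
Shin, Sung-Ho (Information Technology Research Lab, Korea Institute of Science and Technology Information)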

Submitted: 2011.06.29
Reviewed: 2011.08.03
Published: 2011.09.28

Abstract

There is still no established approach to collecting, preserving, and using web resources, so they disappear once their period of service ends. Such resources are updated or removed, periodically or aperiodically, regardless of their importance. A web archiving system that collects and preserves web resources is therefore needed, and collecting those resources periodically requires a crawler dedicated to web archiving. In this study, we analyze the strengths and weaknesses of existing web crawlers used for archival collection of web resources and, building on that analysis, design and develop a collection tool suited to gathering web information resources.
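To make the idea of periodic collection concrete, the following is a minimal illustrative sketch, not the system described in the paper. It assumes a hypothetical in-memory `Archive` that receives page bodies already fetched by some crawler and stores a new version only when the content hash differs from the most recent one. The names `Snapshot`, `Archive`, and `record` are invented for this example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Snapshot:
    """One archived version of a page (hypothetical structure)."""
    url: str
    fetched_at: datetime
    sha256: str
    body: bytes


@dataclass
class Archive:
    # Maps each URL to its stored versions, oldest first.
    versions: dict = field(default_factory=dict)

    def record(self, url: str, body: bytes, fetched_at: datetime) -> bool:
        """Store body as a new version if it differs from the latest one.

        Returns True when a new version was stored, False when the page
        is unchanged since the previous crawl.
        """
        digest = hashlib.sha256(body).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1].sha256 == digest:
            return False
        history.append(Snapshot(url, fetched_at, digest, body))
        return True


# Usage: three crawl passes over the same (placeholder) URL.
archive = Archive()
now = datetime.now(timezone.utc)
archive.record("http://example.org/", b"<html>v1</html>", now)
archive.record("http://example.org/", b"<html>v1</html>", now)  # unchanged, skipped
archive.record("http://example.org/", b"<html>v2</html>", now)  # updated, stored
```

Keeping every distinct version rather than overwriting is what separates an archiving crawler from a search-index crawler in this sketch. Fetching, scheduling, and persistent storage are deliberately left out.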
