http://dx.doi.org/10.5391/JKIIS.2007.17.6.849

A Method of Efficient Web Crawling Using URL Pattern Scripts  

Chang, Moon-Soo (Department of Software, Seokyeong University)
Jung, June-Young (Department of Software, Seokyeong University)
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.17, no.6, 2007, pp. 849-854
Abstract
Collecting only the target documents from the innumerable documents on the Web is difficult. One solution is to select target documents from Web sites that serve many documents in the target domain. In this paper, we propose an intelligent crawling method that collects the needed documents based on a URL pattern script defined in XML. The proposed method applies efficiently to sites that serve structured information backed by a database. Using this crawling method, we collected 50 thousand Web documents.
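The idea of driving a crawler from an XML-defined URL pattern script can be illustrated with a minimal sketch. The script schema below (the `site` and `pattern` elements, the `base` and `role` attributes, and the `follow`/`collect` roles) is a hypothetical one invented for illustration; the abstract does not describe the paper's actual script format, and the helper names `load_patterns` and `classify` are likewise assumptions.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical pattern script. Element and attribute names are assumptions
# for illustration, not the schema used in the paper.
PATTERN_SCRIPT = r"""
<site base="http://example.com">
  <pattern role="follow">^/list\?page=\d+$</pattern>
  <pattern role="collect">^/item/\d+\.html$</pattern>
</site>
"""


def load_patterns(script):
    """Parse the XML script into (host, [(role, compiled_regex), ...])."""
    root = ET.fromstring(script)
    host = urlsplit(root.get("base")).netloc
    rules = [
        (p.get("role"), re.compile(p.text.strip()))
        for p in root.findall("pattern")
    ]
    return host, rules


def classify(url, host, rules):
    """Return the role of the first matching pattern, or None to skip the URL."""
    parts = urlsplit(url)
    if parts.netloc != host:
        return None  # URL belongs to a different site
    # Match against path plus query string, since list pages often use queries.
    target = parts.path + ("?" + parts.query if parts.query else "")
    for role, regex in rules:
        if regex.search(target):
            return role
    return None


host, rules = load_patterns(PATTERN_SCRIPT)
for url in [
    "http://example.com/list?page=3",
    "http://example.com/item/42.html",
    "http://example.com/about.html",
]:
    print(url, "->", classify(url, host, rules))
```

In this sketch a crawler would enqueue links classified as `follow`, store pages classified as `collect`, and discard everything else, which keeps the crawl confined to the structured part of the site.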
Keywords
Web Crawling; URL; Pattern Script; URL Filtering;
Citations & Related Records
Times Cited By KSCI: 1