Asynchronous Web Crawling Algorithm

Won, Dong-Hyun; Park, Hyuk-Gyu; Kang, Yun-Jeong; Lee, Min-Hye

Proceedings of the Korean Institute of Information and Communication Sciences Conference (한국정보통신학회:학술대회논문집)

2022.10a / pp. 364-366 / 2022

The Korea Institute of Information and Communication Engineering (한국정보통신학회)

Asynchronous Web Page Crawling Algorithm through Link Analysis (링크 분석을 통한 비동기 웹 페이지 크롤링 알고리즘)

Won, Dong-Hyun (Wonkwang University) ;
Park, Hyuk-Gyu (Wonkwang University) ;
Kang, Yun-Jeong (Wonkwang University) ;
Lee, Min-Hye (Wonkwang University)

원동현 (원광대학교) ;
박혁규 (원광대학교) ;
강윤정 (원광대학교) ;
이민혜 (원광대학교)

Published : 2022.10.03

Abstract

The web uses asynchronous techniques to deliver, on a single page, information that is processed at different speeds. An asynchronous page can respond to other events before a task completes, but a typical crawler collects only the content present at the moment it visits a page, so it has difficulty collecting information that is delivered asynchronously. In addition, an asynchronous web page often keeps the same web address even when its content changes, which makes it difficult to crawl. In this paper, we propose a web crawling algorithm that accounts for asynchronous page transitions by analyzing web links. Using the proposed algorithm, we were able to collect entries from the TTA dictionary of information and communication terms, which provides its information asynchronously.
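The abstract does not spell out the steps of the proposed algorithm, so the following is only a minimal sketch of the general idea it describes, under stated assumptions. It assumes that links can be split into ordinary URL links and script-driven links (an `onclick` handler with an empty, `#`, or `javascript:` href), and that a page state can be identified by a hash of its content rather than by its address. The callbacks `load` and `trigger`, the action names such as `showTerm(1)`, and the in-memory demo site are all hypothetical; in practice they would likely be backed by a headless browser. The sketch also assumes each action can be applied to a freshly loaded page, which may not hold for every site.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Splits <a> elements into URL links and script-driven (in-page) actions."""

    def __init__(self):
        super().__init__()
        self.url_links = []      # href values that point to another address
        self.async_actions = []  # onclick values that change content in place

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        onclick = attr.get("onclick")
        no_real_target = href in ("", "#") or href.startswith("javascript:")
        if onclick and no_real_target:
            self.async_actions.append(onclick)
        elif href and not no_real_target:
            self.url_links.append(href)


def fingerprint(html):
    """Identify a page state by its content, since the URL may not change."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def crawl(start_url, load, trigger, max_states=100):
    """Breadth-first crawl over page states instead of URLs.

    load(url) returns the HTML at url; trigger(url, action) returns the HTML
    after the in-page action has run. Both are caller-supplied (hypothetical).
    """
    seen, collected = set(), []
    queue = deque([(start_url, None)])
    while queue and len(collected) < max_states:
        url, action = queue.popleft()
        html = load(url) if action is None else trigger(url, action)
        digest = fingerprint(html)
        if digest in seen:  # same content already collected, skip it
            continue
        seen.add(digest)
        collected.append((url, action, html))
        parser = LinkCollector()
        parser.feed(html)
        queue.extend((href, None) for href in parser.url_links)
        queue.extend((url, act) for act in parser.async_actions)
    return collected


# Hypothetical in-memory site standing in for a real asynchronous page.
PAGES = {
    "/dict": '<a href="#" onclick="showTerm(1)">1</a>'
             '<a href="#" onclick="showTerm(2)">2</a>'
             '<a href="/about">about</a>',
    "/about": '<p>About</p><a href="/dict">back</a>',
}
TERMS = {
    "showTerm(1)": "<h1>Term 1</h1><a href='#' onclick='showTerm(2)'>next</a>",
    "showTerm(2)": "<h1>Term 2</h1>",
}

states = crawl("/dict", PAGES.__getitem__, lambda url, action: TERMS[action])
```

In this demo three distinct states are collected under the single address `/dict`, which a crawler that deduplicates by URL alone would treat as one page.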
