http://dx.doi.org/10.15207/JKCS.2022.13.04.001

Design and Implementation of Event-driven Real-time Web Crawler to Maintain Reliability  

Ahn, Yong-Hak (Division of Convergence AI, Hoseo University)
Publication Information
Journal of the Korea Convergence Society / v.13, no.4, 2022, pp. 1-6
Abstract
Real-time systems that use web crawling data must provide users with data that matches the remote data. To do this, a web crawler repeatedly sends HTTP (HyperText Transfer Protocol) requests to the remote server to check whether the remote data has changed. This repeated polling places network load on both the crawling server and the remote server and causes problems such as excessive traffic. To solve this problem, this paper proposes a real-time web crawling technique driven by user events that reduces network overload while keeping the data on the crawling server consistent with the data at multiple remote locations. The proposed method crawls in response to events that request unit data and list data. The results show that the proposed method can reduce the network traffic overhead of existing web crawlers and secure data reliability. Future research should examine the convergence of event-based crawling and time-based crawling.
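The event-driven idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class `EventDrivenCrawler`, its method names, and the in-memory `remote` dictionary that stands in for the remote server are all hypothetical, and the sketch assumes that each user event triggers exactly one request for the unit or list being asked for, with no periodic polling in between.

```python
from typing import Callable, Dict, List


class EventDrivenCrawler:
    """Hypothetical sketch: crawl only when a user event asks for data."""

    def __init__(self, fetch_unit: Callable[[str], str],
                 fetch_list: Callable[[], List[str]]) -> None:
        self._fetch_unit = fetch_unit    # stands in for an HTTP GET of one item
        self._fetch_list = fetch_list    # stands in for an HTTP GET of the item list
        self.store: Dict[str, str] = {}  # local copy held by the crawling server
        self.requests_sent = 0           # counts simulated remote requests

    def on_unit_request(self, item_id: str) -> str:
        """A user asked for one item: refresh only that item, then serve it."""
        self.requests_sent += 1
        self.store[item_id] = self._fetch_unit(item_id)
        return self.store[item_id]

    def on_list_request(self) -> List[str]:
        """A user asked for the list: refresh the known ids, then serve them."""
        self.requests_sent += 1
        ids = self._fetch_list()
        # Drop local items that no longer exist on the remote side.
        for stale in set(self.store) - set(ids):
            del self.store[stale]
        return ids


# Simulated remote data (an assumption; a real crawler would issue HTTP requests).
remote = {"a": "v1", "b": "v1"}
crawler = EventDrivenCrawler(lambda item_id: remote[item_id],
                             lambda: sorted(remote))

first = crawler.on_unit_request("a")
crawler.on_unit_request("b")
remote["a"] = "v2"                      # remote changes while no user is asking
second = crawler.on_unit_request("a")   # the next user event picks up the change
del remote["b"]                         # remote item disappears
ids = crawler.on_list_request()         # the list event removes the stale local copy
```

In this sketch the number of remote requests equals the number of user events, whereas a time-based crawler would send a request at every polling interval whether or not anyone needs the data.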
Keywords
Convergence; Real-time Web Crawler; Event-driven; Maintain Reliability of Data; Reduce HTTP Request;
Citations & Related Records
Times Cited By KSCI: 6