Acknowledgement
This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center support program (IITP-2022-2016-0-00318) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation). This research was also supported by BB21plus, funded by Busan Metropolitan City and the Busan Institute for Talent & Lifelong Education (BIT). Finally, we thank Ms. Bomin Kim (20180020@office.deu.ac.kr) for proofreading the grammar of this paper.