Fig. 1. Web Crawling Blocking Problems of Three Types of Web Crawlers
Fig. 2. The Structure of Distributed Web Crawler Using AWS
Fig. 3. The Process of Distributed Web Crawler Using AWS
Fig. 4. The Speed of Each Crawler
Table 1. Rank of Keyword Frequency based on ‘Gas Safety’
Table 2. Rank of Keyword Frequency based on ‘Gas Accident’