http://dx.doi.org/10.5626/JCSE.2014.8.4.181

A New Approach to Web Data Mining Based on Cloud Computing  

Zhu, Wenzheng (School of Computer Science, Konkuk University)
Lee, Changhoon (School of Computer Science, Konkuk University)
Publication Information
Journal of Computing Science and Engineering / v.8, no.4, 2014, pp. 181-186
Abstract
Web data mining aims to discover useful knowledge from various Web resources. Companies, organizations, and individuals alike increasingly gather information through Web data mining and use that information in their own interest. In computer science, cloud computing is often used as a synonym for distributed computing over a network. It relies on the sharing of resources to achieve coherence and economies of scale, much like a utility delivered over a network, and it provides the ability to run a program or application on many connected computers at the same time. In this paper, we propose a new system framework, built on the Hadoop platform, for collecting useful information from Web resources. The framework is based on the Map/Reduce programming model of cloud computing, and we propose a new data mining algorithm for use within it. Finally, we demonstrate the feasibility of this approach through a simulation experiment.
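The abstract names the Map/Reduce programming model but does not spell out the proposed mining algorithm. As a hedged illustration of the model only, and not the authors' algorithm, the Python sketch below simulates in a single process the three stages a Hadoop job runs: map, shuffle (group by key), and reduce. It counts term frequencies across a few crawled pages. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample URLs are hypothetical. A real Hadoop job would distribute these stages across cluster nodes instead of running them in one loop.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input record: (url, page_text) standing in for a crawled Web resource.
Record = Tuple[str, str]


def map_phase(record: Record) -> Iterator[Tuple[str, int]]:
    """Map: emit (term, 1) for every lowercase alphanumeric token on one page."""
    _url, text = record
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values for one key (here, sum the term counts)."""
    return key, sum(values)


def run_job(records: Iterable[Record]) -> Dict[str, int]:
    """Chain the three stages sequentially; Hadoop would run them in parallel on many nodes."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    pages = [
        ("http://example.com/a", "Cloud computing and Hadoop"),
        ("http://example.com/b", "Web data mining on Hadoop"),
    ]
    counts = run_job(pages)
    print(counts["hadoop"])  # prints 2
```

Term counting is used here only because it is the simplest task that fits the model. Any mining step that can be written as per-record emission of key/value pairs followed by per-key aggregation has the same shape and could, in principle, be placed in such a framework.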
Keywords
Web data mining; Cloud computing; Hadoop; Map/Reduce programming model