Deep Web and MapReduce

  • Tao, Yufei (Division of Web Science and Technology, Korea Advanced Institute of Science and Technology)
  • Received: 2013.06.07
  • Reviewed: 2013.07.01
  • Published: 2013.09.30

Abstract

This invited paper presents results on Web science and technology obtained during work with the Korea Advanced Institute of Science and Technology. The first part discusses algorithms for exploring the deep Web, that is, the collection of Web pages that conventional Web crawlers cannot reach. The second part discusses sorting algorithms on the MapReduce system, which has become a dominant paradigm for massively parallel computing.
