Optimization Model on the World Wide Web Organization with respect to Content Centric Measures

Content-Based Structural Optimization of the World Wide Web

  • Published : 2005.03.01

Abstract

The structure of a Web site can keep search robots and crawling agents from getting lost in the vast forest of Web pages. We formalize a view of the World Wide Web and generalize it as a hierarchy of Web objects: the Web is a set of Web sites, and a Web site is a directed graph of Web nodes and Web edges. Our approach yields the optimal hierarchical structure, the one that maximizes the tf-idf (term frequency and inverse document frequency) weight. Because tf-idf is one of the most widely accepted content-centric measures in the information retrieval community, it can be used to embody the semantics of a search query. The experimental results show that the optimization model is an effective alternative to conventional heuristic approaches in the dynamically changing Web environment.
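To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the paper's optimization model. It computes a standard tf-idf weight per page and then extracts a tree from a directed link graph by greedily attaching the highest-scoring reachable page. The greedy step is a deliberately simple stand-in for the optimization the abstract describes. The function names (`tf_idf`, `page_score`, `greedy_hierarchy`), the toy pages, and the query are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: dict page -> list of tokens. Returns dict page -> {term: weight}.

    Uses tf = count / page length and idf = ln(N / df), one common variant.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per page
    weights = {}
    for page, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights[page] = {
            term: (c / total) * math.log(n / df[term])
            for term, c in counts.items()
        }
    return weights


def page_score(weights, page, query):
    """Sum of the page's tf-idf weights over the query terms."""
    return sum(weights[page].get(term, 0.0) for term in query)


def greedy_hierarchy(root, links, score):
    """Grow a tree from root over the directed link graph.

    links: dict page -> list of pages it links to.
    score: dict page -> float (higher is better).
    Returns dict child -> parent. Greedy heuristic, assumed for illustration:
    it always yields a cycle-free hierarchy but is not guaranteed optimal.
    """
    parent = {}
    order = [root]      # insertion order keeps tie-breaking deterministic
    in_tree = {root}
    while True:
        best = None
        for u in order:
            for v in links.get(u, []):
                if v not in in_tree and (best is None or score[v] > best[0]):
                    best = (score[v], u, v)
        if best is None:
            return parent
        _, u, v = best
        parent[v] = u
        order.append(v)
        in_tree.add(v)


# Toy Web site: three pages and their hyperlinks (hypothetical data).
docs = {
    "home": ["web", "site", "structure"],
    "a": ["tf", "idf", "weight", "weight"],
    "b": ["web", "crawler"],
}
links = {"home": ["a", "b"], "a": ["b"], "b": ["home"]}

weights = tf_idf(docs)
score = {p: page_score(weights, p, ["crawler"]) for p in docs}
tree = greedy_hierarchy("home", links, score)
```

Here the back link from `b` to `home` is ignored because `home` is already in the tree, so the cyclic link graph is reduced to a hierarchy rooted at `home`.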
