Optimized Structures with Hop Constraints for Web Information Retrieval

(Original Korean title, translated: Optimized Web Information Retrieval Considering Hop Constraints)

  • Published: 2008.12.31

Abstract

The explosive growth of the Web is creating significant demand for structural analysis of web objects. The more web objects are available, the harder it becomes for clients (i.e., ordinary web users and web robots) and servers (i.e., web search engines) to retrieve what they actually want. We focus on the structure of web objects and introduce optimization models for more convenient and effective information retrieval. To this end, we represent web objects and hyperlinks as a directed graph, from which optimal structures are derived in the form of rooted directed spanning trees and Top-k trees. Computational experiments were carried out on synthetic data as well as on real web site domains, with Lagrangian relaxation approaches applied to the Top-k trees and the hop constraints. In the experiments, our methods outperformed the conventional approaches, and complex web graphs were successfully converted into optimally structured ones within a reasonable amount of computation time.
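
To make the graph formulation concrete, the following is a minimal Python sketch of a hop-constrained rooted tree built from a weighted directed graph of pages and hyperlinks. It is a simple greedy heuristic and not the Lagrangian relaxation method summarized in the abstract. The function name `hop_limited_tree`, the arc weights, and the page names are all hypothetical, chosen only for illustration.

```python
from collections import deque

def hop_limited_tree(arcs, root, max_hops):
    """Return {page: parent} for a rooted tree in which every page lies at
    most max_hops arcs from the root (greedy heuristic, not optimal)."""
    # Adjacency list: source page -> list of (target page, weight).
    out = {}
    for (u, v), w in arcs.items():
        out.setdefault(u, []).append((v, w))

    # Breadth-first search assigns each page its minimum hop count.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if depth[u] == max_hops:
            continue  # do not expand past the hop limit
        for v, _ in out.get(u, []):
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)

    # Among arcs that go exactly one level deeper, keep the heaviest per page.
    parent, best = {}, {}
    for (u, v), w in arcs.items():
        if u in depth and v in depth and depth[v] == depth[u] + 1:
            if v not in best or w > best[v]:
                parent[v], best[v] = u, w
    return parent

# Hypothetical site: arc weights stand in for some link-importance score.
arcs = {
    ("home", "a"): 1.0, ("home", "b"): 2.0,
    ("a", "c"): 5.0, ("b", "c"): 3.0,
    ("c", "d"): 1.0, ("c", "home"): 4.0,
}
tree = hop_limited_tree(arcs, "home", 2)
print(tree)  # {'a': 'home', 'b': 'home', 'c': 'a'}
```

With a limit of two hops, page `d` is left out because it is three hops from `home`; raising the limit to three attaches it under `c`. Because each page keeps its breadth-first depth, the hop constraint holds by construction, at the cost of ignoring heavier but deeper arcs that an exact optimization model could exploit.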
