Optimization Model on the World Wide Web Organization with respect to Content Centric Measures

Lee Wookey; Kim Seung; Kim Hando; Kang Sukho

Journal of the Korean Operations Research and Management Science Society, Vol. 30, No. 1, pp. 187-198, 2005
pISSN 1225-1119 / eISSN 2733-4759

The Korean Operations Research and Management Science Society

Korean title (translated): Content-Based Structural Optimization of the World Wide Web

Lee Wookey (School of Computer Engineering, Sungkyul University)
Kim Seung (Department of Industrial Engineering, Seoul National University)
Kim Hando (KEDCOM Co. Ltd.)
Kang Sukho (Department of Industrial Engineering, Seoul National University)

Published: 2005.03.01


Abstract

The structure of a Web site can keep search robots and crawling agents from getting lost in the huge forest of Web pages. We formalize a view of the World Wide Web as a hierarchy of Web objects: the Web as a set of Web sites, and each Web site as a directed graph of Web nodes and Web edges. Our approach yields the optimal hierarchical structure that maximizes the tf-idf (term frequency-inverse document frequency) weight, one of the most widely accepted content-centric measures in the information retrieval community, so that the measure can capture the semantics of a search query. The experimental results indicate that the optimization model is an effective replacement for conventional heuristic approaches in the dynamically changing Web environment.
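The abstract does not give the model's formulation, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: tf-idf weights for page content, and a Web site as a directed graph from which a hierarchy (a spanning tree rooted at the home page) is chosen to maximize total weight. It assumes, hypothetically, that a link's weight is the cosine similarity of the tf-idf vectors of the two pages it connects, and it swaps in brute-force enumeration of rooted spanning trees in place of the authors' optimization model. All function names, page names, and data are illustrative.

```python
import math
from collections import Counter
from itertools import product


def tfidf_vectors(docs):
    """Map each page name to a sparse {term: tf-idf} dict.

    Assumed weighting: raw term count times idf = log(N / df).
    """
    n = len(docs)
    tokens = {name: text.lower().split() for name, text in docs.items()}
    df = Counter()
    for terms in tokens.values():
        df.update(set(terms))  # document frequency: count each term once per page
    return {
        name: {t: c * math.log(n / df[t]) for t, c in Counter(terms).items()}
        for name, terms in tokens.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reaches_root(node, parent, root):
    """Follow parent pointers; False if a cycle is hit before the root."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True


def best_hierarchy(root, edges, weight):
    """Exhaustive search for the max-weight spanning tree rooted at `root`.

    Each non-root page picks one incoming link as its parent; assignments
    containing a cycle are rejected. Exponential, so only for tiny sites.
    """
    pages = sorted({p for edge in edges for p in edge} - {root})
    candidates = [[p for p, c in edges if c == v and p != v] for v in pages]
    best, best_score = None, float("-inf")
    for choice in product(*candidates):
        parent = dict(zip(pages, choice))
        if not all(reaches_root(v, parent, root) for v in pages):
            continue
        score = sum(weight[(parent[v], v)] for v in pages)
        if score > best_score:
            best, best_score = parent, score
    return best, best_score


# Hypothetical four-page site: page text and hyperlinks (parent, child).
docs = {
    "home": "acme store laptops phones support",
    "products": "laptops phones catalog",
    "laptops": "laptops notebook catalog",
    "support": "support contact help",
}
edges = [("home", "products"), ("home", "support"), ("home", "laptops"),
         ("products", "laptops"), ("laptops", "products")]

vectors = tfidf_vectors(docs)
weight = {(p, c): cosine(vectors[p], vectors[c]) for p, c in edges}
hierarchy, score = best_hierarchy("home", edges, weight)
print(hierarchy, round(score, 3))
```

On this toy site the search should place `laptops` under `products` rather than directly under `home`, because those two pages share more weighted terms; the links `laptops → products` and `products → laptops` are never chosen together, since that pair would form a cycle. For realistic site sizes, an exact polynomial method such as the Chu-Liu/Edmonds algorithm for maximum spanning arborescences would replace the enumeration.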

