
An Effective Metric for Measuring the Degree of Web Page Changes  

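Kwon, Shin-Young (Department of Computer Science, Graduate School, Soongsil University)
Kim, Sung-Jin (Department of Computer Science, Seoul National University)
Lee, Sang-Ho (School of Computing, Soongsil University)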
Abstract
A variety of similarity metrics have been used to measure the degree of web page changes. In this paper, we first define criteria, based on six important types of web page changes, for evaluating the effectiveness of similarity metrics. Second, we propose a new similarity metric suited to measuring the degree of web page changes. Using real and synthesized web pages, we analyze five existing metrics (byte-wise comparison, TF-IDF cosine distance, word distance, edit distance, and shingling) and our own under the proposed criteria. The analysis shows that our metric represents the changes more effectively than the other metrics. We expect that our study can help users select an appropriate metric for particular web applications.
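As an illustration of one of the existing metrics named above, the following is a minimal sketch of shingling in Python. It is not the implementation evaluated in the paper: the function names, the shingle width w=3, the whitespace tokenization, and the use of Jaccard resemblance over word shingles are all assumptions made for this example.

```python
def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) in text.

    Assumption: words are separated by whitespace; a text shorter than
    w words yields a single shingle holding all of its words.
    """
    words = text.split()
    if not words:
        return set()
    if len(words) < w:
        return {tuple(words)}
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def shingle_resemblance(old_page, new_page, w=3):
    """Jaccard resemblance of the two shingle sets, in [0, 1].

    1.0 means the shingle sets are identical; 0.0 means they share
    no shingle. Two empty pages are treated as identical (assumption).
    """
    a, b = shingles(old_page, w), shingles(new_page, w)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    old = "breaking news on the web today"
    new = "breaking news on the web tomorrow"
    print(shingle_resemblance(old, new))
```

Under this sketch, the degree of change between two versions of a page could be taken as one minus the resemblance, so a single changed word affects up to w shingles. How the paper itself normalizes or tokenizes pages is not stated in the abstract.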
Keywords
web databases; web database management; web page changes