[1] Pant, G., Srinivasan, P., Menczer, F., 'Crawling the Web,' in Web Dynamics, 2004, pp. 153-178.
[2] Bar-Yossef, Z., Keidar, I., Schonfeld, U., 'Do Not Crawl in the DUST: Different URLs with Similar Text,' in Proceedings of the International World Wide Web Conference (WWW 2007), pp. 111-120, May 2007.
[3] Burner, M., 'Crawling Towards Eternity: Building an Archive of the World Wide Web,' Web Techniques Magazine, 2(5), May 1997.
[4] Han, J., Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Elsevier, San Francisco, CA, 2006.
[5] Chakrabarti, S., Mining the Web: Discovering Knowledge from Hypertext Data, Morgan Kaufmann Publishers, Elsevier, San Francisco, CA, 2003.
[6] Kim, S. J., Jeong, H. S., Lee, S. H., 'Reliable Evaluations of URL Normalization,' in Proceedings of the 2006 International Conference on Computational Science and Its Applications (ICCSA), Glasgow, pp. 609-617, May 2006.
[7] Netcraft June 2008 Web Server Survey, available at: http://news.netcraft.com/archives/web_server_survey.html
[8] Berners-Lee, T., Fielding, R., Masinter, L., 'Uniform Resource Identifier (URI): Generic Syntax,' RFC 3986, available at: http://gbiv.com/protocols/uri/rfc/rfc3986.html
[9] The MD5 Message-Digest Algorithm, RFC 1321, available at: http://tools.ietf.org/html/rfc1321
[10] Soon, L. K., Lee, S. R., 'Identifying Equivalent URLs using URL Signatures,' to appear in Proceedings of the 4th IEEE International Conference on Signal-Image Technology & Internet-Based Systems (SITIS 2008), Bali, Indonesia, December 2008.
[11] Lee, S. H., Kim, S. J., Hong, S. H., 'On URL Normalization,' in Proceedings of the 2005 International Conference on Computational Science and Its Applications (ICCSA), Singapore, pp. 1076-1085, May 2005.
[12] Web Data Extractor, available at: http://www.webextractor.com/