Effects and Evaluations of URL Normalization (URL정규화의 적용 효과 및 평가)

  • Jeong, Hyo-Sook; Kim, Sung-Jin; Lee, Sang-Ho
    • Journal of KIISE:Databases, v.33 no.5, pp.486-494, 2006
  • A web page can be represented by syntactically different URLs. URL normalization is the process of transforming URL strings into a canonical form, which can significantly reduce the number of duplicate URL representations of a web page. A number of normalization methods have been developed heuristically and put into use, but no study has analyzed them systematically. In this paper, we present a way to evaluate normalization methods in terms of the efficiency and effectiveness of web applications, and give users guidelines for selecting appropriate methods. To this end, we examine all the effects that can occur when a normalization method is applied to web applications, and describe seven metrics for evaluating normalization methods. Lastly, we report evaluation results for 12 normalization methods on 25 million actual URLs.
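
To make the idea of a canonical form concrete, here is a minimal Python sketch. It assumes three common normalization steps chosen purely for illustration (lowercasing the scheme and host, dropping the default port, and removing the fragment). These are not necessarily among the 12 methods the paper evaluates, and the names `normalize_url` and `count_distinct` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports assumed for this sketch; other schemes are left untouched.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return an illustrative canonical form of `url`.

    Assumed steps (not taken from the paper):
      1. lowercase the scheme and host,
      2. drop the port when it is the scheme's default,
      3. use "/" for an empty path,
      4. remove the fragment.
    Userinfo and IPv6 literals are not handled in this sketch.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = parts.path or "/"
    # The empty last element discards the fragment.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def count_distinct(urls: list[str]) -> tuple[int, int]:
    """Return (distinct raw URLs, distinct URLs after normalization)."""
    return len(set(urls)), len({normalize_url(u) for u in urls})


if __name__ == "__main__":
    sample = [
        "HTTP://Example.com:80/a/b.html#top",
        "http://example.com/a/b.html",
        "http://EXAMPLE.com/a/b.html#section",
    ]
    print(count_distinct(sample))
```

Comparing the two counts returned by `count_distinct` gives a rough sense of how many duplicate representations a set of normalization steps removes. The seven metrics described in the paper cover more than this single measure.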