Document Ranking of Web Document Retrieval Systems

An, Dong-Un; Kang, In-Ho

doi:10.1633/JIM.2003.34.2.055

Journal of Information Management (정보관리연구)

Volume 34, Issue 2, pp. 55-66, 2003
0254-3621 (pISSN)

Korea Institute of Science and Technology Information (한국과학기술정보연구원 과학기술정보센터)

An, Dong-Un (School of Information and Electronics Engineering, Chonbuk National University)
Kang, In-Ho (Department of Computer Science, KAIST)

Published: 2003-06-30

Abstract

The Web offers a rich variety of information sources: besides conventional documents, it now hosts multimedia data, shopping sites, and services. Because web collections are so large and heterogeneous, the unit of information users look for has expanded from individual documents to whole sites and services. User queries can be classified into three categories according to the user's intent: content search, which looks for documents that describe or relate to the desired information; site search, which looks for the entry page of the site of a person or organization the user is interested in; and service search, which looks for web pages that provide a service the user is interested in.

In this paper, we show that document ranking must differ according to the class of the user's query. We examine how useful three kinds of evidence discussed in the information retrieval literature (content information, link information, and URL information) are for each query class. In content search, content information alone gave good results, and combining it with link and URL information degraded performance. In site search, by contrast, combining link and URL information with content information improved performance over content information alone.
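The abstract does not state how the three kinds of evidence are combined. As an illustration only, the following sketch assumes a linear combination whose weights depend on the query class: a precomputed content score, a log-normalized in-link count standing in for link information, and a URL-depth prior standing in for URL information (shallow URLs are more likely to be site entry pages). The class `Doc`, the functions `url_depth`, `url_prior`, `link_score`, and `rank`, and every number in `WEIGHTS` are hypothetical and are not the authors' method; service search is left out because the abstract reports no result for it.

```python
import math
from dataclasses import dataclass
from urllib.parse import urlparse

# Query classes named in the abstract (service search is omitted in this sketch).
CONTENT = "content"
SITE = "site"

# Hypothetical (content, link, URL) weights per query class; NOT values from the paper.
WEIGHTS = {
    CONTENT: (1.0, 0.0, 0.0),  # content search: rely on text evidence only
    SITE: (0.4, 0.3, 0.3),     # site search: mix in link and URL evidence
}


@dataclass
class Doc:
    url: str
    content_score: float  # assumed precomputed text-match score in [0, 1]
    inlinks: int          # number of incoming hyperlinks


def url_depth(url: str) -> int:
    """Count non-empty path segments; a site root such as http://host/ has depth 0."""
    path = urlparse(url).path
    return len([seg for seg in path.split("/") if seg])


def url_prior(url: str) -> float:
    """Favor shallow URLs, which are more likely to be site entry pages."""
    return 1.0 / (1 + url_depth(url))


def link_score(inlinks: int, max_inlinks: int) -> float:
    """Log-scale in-link count, normalized to [0, 1] by the largest count in the set."""
    if max_inlinks <= 0:
        return 0.0
    return math.log1p(inlinks) / math.log1p(max_inlinks)


def rank(docs: list[Doc], query_class: str) -> list[Doc]:
    """Sort documents by a weighted sum of content, link, and URL scores."""
    w_content, w_link, w_url = WEIGHTS[query_class]
    max_in = max((d.inlinks for d in docs), default=0)

    def combined(d: Doc) -> float:
        return (w_content * d.content_score
                + w_link * link_score(d.inlinks, max_in)
                + w_url * url_prior(d.url))

    return sorted(docs, key=combined, reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("http://www.example.org/news/2003/06/article.html", content_score=0.9, inlinks=2),
        Doc("http://www.example.org/", content_score=0.5, inlinks=1000),
    ]
    print([d.url for d in rank(docs, CONTENT)])  # deep article first
    print([d.url for d in rank(docs, SITE)])     # site root first
```

With these made-up weights, the content query ranks the deep article first on text evidence alone, while the site query promotes the root page because of its many in-links and zero URL depth. A weighted sum is only one of several ways to combine such evidence.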
