• Title/Summary/Keyword: Web Document Retrieval

Adaptive User Profile for Information Retrieval from the Web

  • Srinil, Phaitoon;Pinngern, Ouen
    • Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings / 2003.10a / pp.1986-1989 / 2003
  • This paper proposes a method for improving information retrieval on the Web that uses the structure and hyperlinks of HTML documents together with a user profile. The method is based on the rationale that terms appearing in different structural parts of a document may differ in how well they identify the document. It partitions the term occurrences in a document collection into six classes according to the tags in which the terms occur (such as Title, H1-H6 and Anchor). We use a genetic algorithm to determine class importance values and to expand the user query. These values are also used in similarity computation and in updating the user profile. A genetic algorithm is then used again to select terms from the user profile to expand the original query. Finally, the search engine searches with the expanded query, and its results are scored by the similarity between each result and the user profile. The vector space model is used, and the weighting schemes of traditional information retrieval are extended to include the class importance values. Test results show precision of up to 81.5%.
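
The class-weighted vector space idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four tag classes and their importance values below are hypothetical fixed numbers (the paper uses six classes and learns the values with a genetic algorithm), and plain weighted term counts stand in for the paper's weighting scheme.

```python
import math
from collections import Counter

# Hypothetical class importance values (assumption); the paper learns
# such values with a genetic algorithm and uses six classes.
CLASS_WEIGHTS = {"title": 4.0, "heading": 2.5, "anchor": 2.0, "body": 1.0}

def weighted_vector(term_occurrences):
    """Build a term vector from (term, tag_class) pairs.

    Each occurrence contributes the importance value of the tag class
    it appeared in, instead of a flat count of 1.
    """
    vec = Counter()
    for term, tag_class in term_occurrences:
        vec[term] += CLASS_WEIGHTS.get(tag_class, 1.0)
    return dict(vec)

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# A user profile interested in "retrieval" (hypothetical data).
profile = {"retrieval": 1.0}
doc_a = weighted_vector([("retrieval", "title"), ("web", "body")])
doc_b = weighted_vector([("retrieval", "body"), ("web", "body")])
```

With these assumed weights, `doc_a` scores higher against `profile` than `doc_b`, because the same term counts for more when it occurs in the title than in the body.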

A Study on Information Retrieval of Web Using Local Context Analysis Feedback (지역적 문맥 분석 피드백을 이용한 웹 정보검색에 관한 연구)

  • Kim, Young-Cheon;Lee, Sung-Joo
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.745-751 / 2004
  • Conventional Boolean retrieval systems do not support document ranking and cannot compute similarity coefficients between queries and documents. The MMM (Max and Min Model), Paice and P-norm models were proposed to add a ranking facility to Boolean retrieval systems; they share the property of interpreting Boolean operators softly. In this paper we propose a new soft evaluation method for web information retrieval using a local context analysis feedback model. We also show through a performance comparison that local context analysis feedback is more efficient and effective than MMM, Paice and P-norm.
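
To make "interpreting Boolean operators softly" concrete, here is a small sketch of two of the baseline models named in the abstract, in their commonly cited textbook forms with unweighted query terms. The coefficient and `p` values are illustrative assumptions, and the paper's own local context analysis feedback method is not sketched here.

```python
def mmm_or(weights, c_or=0.7):
    """MMM soft OR: mostly the max document term weight, softened by the min."""
    return c_or * max(weights) + (1 - c_or) * min(weights)

def mmm_and(weights, c_and=0.7):
    """MMM soft AND: mostly the min document term weight, softened by the max."""
    return c_and * min(weights) + (1 - c_and) * max(weights)

def pnorm_or(weights, p=2.0):
    """P-norm soft OR: normalized p-norm distance from the all-zero point."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)

def pnorm_and(weights, p=2.0):
    """P-norm soft AND: one minus the normalized distance from the all-one point."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)

# Document term weights in [0, 1] for the two query terms (hypothetical data).
doc_weights = [0.8, 0.2]
scores = {
    "mmm_or": mmm_or(doc_weights),
    "mmm_and": mmm_and(doc_weights),
    "pnorm_or": pnorm_or(doc_weights),
    "pnorm_and": pnorm_and(doc_weights),
}
```

Unlike strict Boolean evaluation, every score lies between the smallest and largest term weight, so documents that only partly satisfy a query can still be ranked.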

Design and Implementation of Topic Map Generation System based Tag (태그 기반 토픽맵 생성 시스템의 설계 및 구현)

  • Lee, Si-Hwa;Lee, Man-Hyoung;Hwang, Dae-Hoon
    • Journal of Korea Multimedia Society / v.13 no.5 / pp.730-739 / 2010
  • Tagging is one of the core technologies of Web 2.0 and is widely applied to multimedia data such as blog documents, images and videos. Contrary to the expectation that tags would be reused in information retrieval and maximize retrieval efficiency, unacceptable retrieval results appear owing to the root limitations of tags. In this paper, building on our preceding research on image retrieval through tag clustering, we design and implement a topic map generation system, which is a semantic knowledge system. The tag information in each cluster is automatically turned into topics of the topic map. The generated topics are given semantic relationships by use of WordNet and are also given occurrence information suitable for each topic pair, so that a topic map with a semantic knowledge system can be generated. As a result, the topic map proposed in this paper can serve not only users' information retrieval needs through semantic navigation but also convenient and rich information services.

A Re-Ranking Retrieval Model based on Two-Level Similarity Relation Matrices (2단계 유사관계 행렬을 기반으로 한 순위 재조정 검색 모델)

  • 이기영;은희주;김용성
    • Journal of KIISE:Software and Applications / v.31 no.11 / pp.1519-1533 / 2004
  • When Web-based specialized retrieval systems for scientific fields severely restrict how users can express their information requests, the process of information content analysis and that of information acquisition become inconsistent. In this paper, we apply the fuzzy retrieval model and address the high time complexity of the retrieval system by constructing a reduced term set based on the relative importance of terms. Furthermore, we perform cluster retrieval to reflect the user's query exactly, through a similarity relation matrix that satisfies the characteristics of the fuzzy compatibility relation. We demonstrate the performance of the proposed re-ranking model, which is based on the similarity union of the fuzzy retrieval model and the document cluster retrieval model.

Efficient Internet Information Extraction Using Hyperlink Structure and Fitness of Hypertext Document (웹의 연결구조와 웹문서의 적합도를 이용한 효율적인 인터넷 정보추출)

  • Hwang Insoo
    • Journal of Information Technology Applications and Management / v.11 no.4 / pp.49-60 / 2004
  • While the World-Wide Web offers an incredibly rich base of information, it is organized as hypertext and does not provide a uniform and efficient way to retrieve specific information. An efficient web crawler is therefore needed to gather useful information in an acceptable amount of time. In this paper, we study the order in which a web crawler should visit URLs so as to obtain the more important web pages quickly. We also develop an internet agent for efficient web crawling that uses the hyperlink structure and the fitness of hypertext documents. An experiment on a website shows that the proposed agent outperforms other web crawlers that use the BackLink and PageRank algorithms.
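
The URL-ordering problem in this abstract can be illustrated with a sketch of the simplest baseline it mentions, backlink-count ordering. This is not the proposed agent: the in-memory `link_graph`, the function name and the alphabetical tie-breaking are all assumptions made for illustration, and no network access is involved.

```python
import heapq

def crawl_order(link_graph, seed, limit=10):
    """Visit pages in order of the number of backlinks seen so far.

    link_graph maps each URL to the URLs it links to. The frontier is a
    heap of (-backlink_count, url); ties are broken alphabetically.
    """
    backlinks = {seed: 0}   # url -> backlinks discovered from visited pages
    done = set()
    visited = []
    frontier = [(0, seed)]
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        if url in done:
            continue  # stale entry left over from an earlier, lower count
        done.add(url)
        visited.append(url)
        for target in link_graph.get(url, []):
            if target in done:
                continue
            backlinks[target] = backlinks.get(target, 0) + 1
            # Push a fresh entry; older entries for the same URL are skipped later.
            heapq.heappush(frontier, (-backlinks[target], target))
    return visited

# Hypothetical site: "x" is linked from both "a" and "b".
graph = {"a": ["b", "c", "x"], "b": ["x", "z"], "c": [], "x": [], "z": []}
order = crawl_order(graph, "a")
```

On this toy graph the crawler reaches `x` before `c`, because `x` has gathered two backlinks by the time the choice is made, whereas a breadth-first crawl would visit `c` first.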

Dynamic index storage and integrated searching service development (동적 색인 스토리지 및 통합 검색 서비스 개발)

  • Lee, Wang-Woo;Lee, Seok-Hyoung;Choe, Ho-Seop;Yoon, Hwa-Mook;Kim, Jong-Hwan;Hur, Yoon-Young
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.346-349 / 2007
  • This paper introduces an integrated search system built for a web news and review retrieval service. We built XSLTRobot, which extracts the title, date, author and content from HTML documents such as news articles or reviews for the search service. XSLTRobot uses XSLT to extract the desired parts of an HTML page. The Integrated Information Retrieval System (IIRS) supports various search data formats. We also introduce Dynamic Index Storage, a module of IIRS, which is intended for environments such as news that need fast index updates. Its design focuses on retrieval performance, because there are not many documents that it has to update in real time.

A Proposal of Methods for Extracting Temporal Information of History-related Web Document based on Historical Objects Using Machine Learning Techniques (역사객체 기반의 기계학습 기법을 활용한 웹 문서의 시간정보 추출 방안 제안)

  • Lee, Jun;KWON, YongJin
    • Journal of Internet Computing and Services / v.16 no.4 / pp.39-50 / 2015
  • When retrieving information through a search engine, some users want documents that correspond to a specific time period. For example, a user who wants a document describing the situation before the 'Japanese invasions of Korea' era may use the keyword 'Japanese invasions of Korea' in the search query. The search engine then returns all documents about the 'Japanese invasions of Korea' without regard to time period, which leaves the user with additional work. In addition, for a large percentage of historical documents, the date on which the document was created differs from the time period that its contents record. If the time period of a document's contents can be extracted, it can make retrieval more effective and support various applications. In this paper, we therefore study extracting the time period of historical documents on the Joseon era, using historic literature of the Joseon era to infer the time period corresponding to the document content. We define historical objects based on historic literature collected from the web and confirm the possibility of extracting the time period of web documents with machine learning techniques. In addition to the machine learning techniques, we propose and apply similarity filtering based on comparison between the historical objects. Finally, we evaluate the accuracy of the resulting temporal indexing and its improvement.

Document Clustering Methods using Hierarchy of Document Contents (문서 내용의 계층화를 이용한 문서 비교 방법)

  • Hwang, Myung-Gwon;Bae, Yong-Geun;Kim, Pan-Koo
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2335-2342 / 2006
  • The current web is accumulating abundant information. In particular, text-based documents are a type that people use very easily and frequently. Numerous studies have therefore been conducted on retrieving text documents using methods such as probability, statistics, vector similarity and Bayesian approaches. These studies, however, could not consider both the subject and the semantics of documents. To overcome this problem, we propose a document similarity method for semantic retrieval of the documents users want, which is the core method of document clustering. The method first expresses the document content as a semantic hierarchy and gives weight to the important hierarchy domain of the document. With this, we can measure the similarity between documents using both the domain weight and the coincidence of concepts in the domain hierarchies.

XML-based Retrieval System for E-Learning Contents using mobile device PDA

  • Park Yong-Bin;Yang Hae-Sool
    • Proceedings of the Korea Society for Industrial Systems Conference / 2006.05a / pp.241-248 / 2006
  • The web contributes greatly to providing a variety of information, and its role as a medium for human resource development and education is especially important. E-Learning through the web plays an important role for enterprises and educational institutions. Above all, fast and varied search is required in order to manage and search the large amount of educational content on the web. However, most present information is written in HTML, which imposes many restrictions. As a solution to these restrictions, XML, a standard for Web documents, and its various search functions are being extended and studied. This paper proposes a search system that can search XML E-Learning content and various non-XML content using a PDA mobile device.

Design of XQL Query Processing System for Structural information retrieval (구조적 정보 검색을 위한 XQL 질의 처리 시스템 설계)

  • 김상영;김철원;김광현;박종훈;정현철
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.892-896 / 2003
  • XML is used in various fields, going beyond simple markup for display in web browsers to serve as an interface format for data exchange between applications and between different systems. Accordingly, many studies are under way on systems that can effectively manage and search XML documents, with their structured information, reusability, processability, durability and portability. This paper describes XQL, a query language, together with a document structure processor and a query language processor. The contents of an XML document are parsed into a tree structure, and we present a method that uses XQL to find the tree structure information that matches a query. On this basis, we describe the design and implementation of an efficient XML document search system that parses XML documents scattered across the web, organizes their structural information as a tree, and uses XQL, which has been proposed as a query language.
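
The idea of parsing an XML document into a tree and answering structural queries against it can be sketched as follows. This is only an analogy: the limited XPath subset of Python's `xml.etree.ElementTree` stands in for XQL, whose syntax and processors are not reproduced here, and the sample document and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (assumption, not from the paper).
XML_DOC = """
<library>
  <book year="2003"><title>XML Retrieval</title><author>Kim</author></book>
  <book year="1999"><title>Web Search</title><author>Lee</author></book>
</library>
"""

def query_text(xml_text, path):
    """Parse XML into a tree and return the text of elements matching a path.

    The path uses ElementTree's XPath subset, evaluated relative to the
    root element, so "book/title" selects titles of top-level books.
    """
    root = ET.fromstring(xml_text.strip())
    return [element.text for element in root.findall(path)]

all_titles = query_text(XML_DOC, "book/title")
titles_2003 = query_text(XML_DOC, "book[@year='2003']/title")
```

The first query walks the tree by element structure alone; the second adds a filter on an attribute, which is the kind of structure-aware condition that keyword search over flat text cannot express.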
