• Title/Summary/Keyword: relevant information retrieval

Shannon's Information Theory and Document Indexing (Shannon의 정보이론과 문헌정보)

  • Chung Young Mee
    • Journal of the Korean Society for Library and Information Science / v.6 / pp.87-103 / 1979
  • Information storage and retrieval is part of the general communication process. In Shannon's information theory, the information contained in a message is a measure of uncertainty about the information source, and the amount of information is measured by entropy. Indexing is a process that reduces the entropy of the information source, since the document collection is divided into many smaller groups according to the subjects the documents deal with. Significant concepts contained in every document are mapped into the set of all sets of index terms. Thus the index itself is formed by paired sets of index terms and documents. Without indexing, the entropy of a document collection consisting of N documents is $\log_2 N$, whereas the average entropy of the smaller groups $(W_1, W_2, \ldots, W_m)$ is as small as $\left(\sum_{i=1}^{m} H(W_i)\right)/m$. Retrieval efficiency is a measure of an information system's performance, which is largely affected by the quality of the index. If all and only the documents judged relevant to a user's query can be retrieved, the information system is said to be $100\%$ efficient. The document file W may potentially be classified into two sets, documents relevant and documents non-relevant to a specific query. After retrieval, the document file W' is reclassified into four sets: relevant-retrieved, relevant-not retrieved, non-relevant-retrieved, and non-relevant-not retrieved. The paper shows that the difference between the entropies of document file W and document file W' is a proper measure of retrieval efficiency.
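
As a rough numerical illustration of the entropy reduction described in this abstract, the sketch below assumes that every document within a group is equally likely to be the one sought, so that $H(W_i) = \log_2 |W_i|$. That uniform assumption, the function names, and the group sizes are illustrative and are not taken from the paper.

```python
import math

def uniform_entropy(n_docs):
    """Entropy in bits of a set of n_docs equally likely documents (assumption)."""
    if n_docs <= 0:
        raise ValueError("a group must contain at least one document")
    return math.log2(n_docs)

def average_group_entropy(group_sizes):
    """Mean of H(W_i) over the m groups produced by indexing."""
    return sum(uniform_entropy(n) for n in group_sizes) / len(group_sizes)

# Hypothetical collection: 1,024 documents indexed into four equal groups.
group_sizes = [256, 256, 256, 256]
before = uniform_entropy(sum(group_sizes))   # entropy of the unindexed collection
after = average_group_entropy(group_sizes)   # average entropy of the groups
print(f"without indexing: {before} bits; average per group: {after} bits")
```

With these toy sizes the collection entropy is 10 bits and the average group entropy is 8 bits, so indexing into four equal groups removes 2 bits of uncertainty.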

A Study on the Utility of Relevance/Non-relevance Information in Homogeneous Documents (유사문헌집단에서 적합/부적합정보의 유용성에 관한 연구)

  • Moon, Sung-Been
    • Journal of the Korean Society for Information Management / v.32 no.3 / pp.277-293 / 2015
  • This study examined the relative retrieval effectiveness after relevance feedback of two systems (Title/Abstract and Full-text) using four different sets of relevance judgments. Four relevance levels (not relevant, marginally relevant, relevant, highly relevant) were used, each determined by referees assigning a relevance score to documents. The study also investigated how much average precision improved after relevance feedback when "marginally relevant" documents were included in the relevant class, for both the Title/Abstract system and the Full-text retrieval system. The Title/Abstract system benefited from relevance feedback with the marginally relevant documents: a higher percentage of improvement was consistently obtained when they were included in the relevant class. The opposite held for the Full-text retrieval system, which implies that marginally relevant documents in the relevant class introduced noise into Full-text retrieval.
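
To make the effect of the relevance threshold concrete, the sketch below computes average precision over one ranked list twice, once counting marginally relevant documents as relevant and once not. The numeric grade coding (0 to 3), the example ranking, and the simplification that every relevant document appears somewhere in the ranking are assumptions for illustration. They do not reproduce the study's data or exact evaluation procedure.

```python
def average_precision(ranked_grades, min_relevant_grade):
    """Average precision of a ranked list of graded judgments.

    A document counts as relevant when its grade >= min_relevant_grade.
    Assumes all relevant documents appear in the ranking, so the sum of
    precision values is divided by the number of relevant documents found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, grade in enumerate(ranked_grades, start=1):
        if grade >= min_relevant_grade:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0

# Hypothetical grades: 0 = not relevant, 1 = marginally relevant,
# 2 = relevant, 3 = highly relevant.
ranking = [3, 1, 0, 2, 1]
ap_strict = average_precision(ranking, min_relevant_grade=2)   # marginal excluded
ap_lenient = average_precision(ranking, min_relevant_grade=1)  # marginal included
print(ap_strict, ap_lenient)
```

Comparing the two values before and after a feedback round would give the kind of percentage improvement the abstract refers to.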

Review and Make Up of HANTEC Test Collection Relevant Information (한텍(HANTEC) 테스트 컬렉션 적합성 정보 재평가 및 보완)

  • Kang, Hyun-Kyu;Jang, Hyeong-Il;Park, Kyung-Il;Kim, Hyun-Tae;Yeom, Sung-Wook;Ra, Dong-Yeol;Choe, Ho-Sup;Yoon, Hwa-Mook
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.160-166 / 2007
  • HANTEC 2.0, a Korean test collection, is distributed for the evaluation of information retrieval systems. It consists of 120,000 documents, 50 topics (queries), and relevance information constructed by the pooling method. Because relevance information is central to evaluating information retrieval systems, we review it manually in order to validate both the pooling method and the HANTEC relevance information. We build a tool for manual review, use it to produce manual relevance judgments, and compare those judgments with HANTEC's. The manually reviewed relevance information will be used for the evaluation of information retrieval systems.
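
For readers unfamiliar with pooling, the following minimal sketch shows the general idea: the top-ranked documents from several systems are merged into one pool, and only pooled documents are judged. The run names, document IDs, and pool depth are hypothetical. The abstract does not state the parameters used for HANTEC.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each system's ranking."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool

# Hypothetical rankings from three retrieval systems for a single topic.
runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
    "system_c": ["d8", "d1", "d2", "d4"],
}
pool = build_pool(runs, depth=2)

# Documents that were retrieved but fall outside the pool are never judged;
# a manual review can check whether any of them are in fact relevant.
retrieved = {doc for ranking in runs.values() for doc in ranking}
unjudged = retrieved - pool
print(sorted(pool), sorted(unjudged))
```

The `unjudged` set is where a pooled collection can miss relevant documents, which is the kind of gap a manual re-evaluation is meant to expose.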

Design and Development of a Multimodal Biomedical Information Retrieval System

  • Demner-Fushman, Dina;Antani, Sameer;Simpson, Matthew;Thoma, George R.
    • Journal of Computing Science and Engineering / v.6 no.2 / pp.168-177 / 2012
  • The search for relevant and actionable information is a key to achieving clinical and research goals in biomedicine. Biomedical information exists in different forms: as text and illustrations in journal articles and other documents, in images stored in databases, and as patients' cases in electronic health records. This paper presents ways to move beyond conventional text-based searching of these resources, by combining text and visual features in search queries and document representation. A combination of techniques and tools from the fields of natural language processing, information retrieval, and content-based image retrieval allows the development of building blocks for advanced information services. Such services enable searching by textual as well as visual queries, and retrieving documents enriched by relevant images, charts, and other illustrations from the journal literature, patient records and image databases.

A Study on Information Retrieval Effectiveness by Cited References (인용문헌에 의한 정보검색 효과에 관한 고찰)

  • Lee Lanju
    • Journal of the Korean Society for Library and Information Science / v.27 / pp.265-289 / 1994
  • Databases publicly available for online searching permit both citation and subject searching; however, subject searching has dominated the online search environment. Despite the power of citation searching, it may be underutilized. This study explored the relationship between the number of cited references used in a citation search and information retrieval effectiveness, a relatively unstudied phenomenon. Three articles in the library and information science literature were chosen to represent sample questions. Cited reference searches were conducted for each article and each of its references. All searches were conducted in Social Scisearch and Scisearch on DIALOG. Relevance judgments on the retrieved citations were obtained from the authors of the original articles. This research focused on analyzing, in terms of information retrieval effectiveness, the overlap among postings sets retrieved by various combinations of cited references. The findings from the three case studies clearly showed that the more cited references used for the citation search, the better the performance in terms of retrieving more relevant documents, up to a point of diminishing returns. In addition, the overall level of overlap among relevant document sets was generally found to be low. Therefore, if only some of the cited references among many candidates are used for a citation search, a significant proportion of relevant documents may be missed. The analysis of the characteristics of cited references provided ways to predict which cited references would be useful for improving information retrieval. The findings of this comprehensive exploratory study are of interest for both theoretical and practical reasons. They contribute to the development of a theoretical model for the effective use of the citation search. This model might also be implemented in operational online systems. In addition, the findings will potentially help online searchers improve their search strategies using the citation search so that they can better achieve their information retrieval goals: the retrieval of items relevant to a given question and the suppression of nonrelevant items.
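
The two quantities discussed in this abstract, the growth in relevant documents as more cited references are searched and the overlap among the sets they retrieve, can be illustrated with a small sketch. The data below is a toy example, and the Jaccard coefficient is used as an assumed overlap measure. The abstract does not say which measure the study used.

```python
def cumulative_relevant(sets_in_order):
    """Number of distinct relevant documents found after each added reference."""
    seen = set()
    totals = []
    for docs in sets_in_order:
        seen |= docs
        totals.append(len(seen))
    return totals

def jaccard_overlap(a, b):
    """Share of documents common to both sets (assumed overlap measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical relevant documents retrieved via each cited reference.
relevant_by_reference = {
    "ref_1": {"a", "b", "c"},
    "ref_2": {"c", "d"},
    "ref_3": {"d", "e"},
    "ref_4": {"a", "e"},
}
totals = cumulative_relevant(list(relevant_by_reference.values()))
overlap = jaccard_overlap(relevant_by_reference["ref_1"],
                          relevant_by_reference["ref_2"])
print(totals, overlap)
```

In this toy data the cumulative total stops growing at the fourth reference, a small-scale picture of diminishing returns, while the pairwise overlap stays low.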

Levelized Information Retrieval Method in Context Awareness Environments (컨텍스트 인식 환경에서 레벨화된 정보 검색 기법)

  • Kim, Sung-Rim;Kwon, Joon-Hee
    • Journal of the Institute of Electronics Engineers of Korea TE / v.42 no.1 / pp.47-52 / 2005
  • Context-aware retrieval is one of the fundamental characteristics of ubiquitous computing. Its essential aims are retrieving relevant information and delivering that information quickly. We propose a new method that does both by using the characteristics of levelized contexts. Using context values and rules, we extract rules and the recommendation information expected to be needed in the near future. We then use an access score to prefetch the recommendation information expected to be needed in the very near future. By storing only the recommendation information that will be needed in the near future, our method retrieves relevant information and delivers it quickly.

Concept and Attribute based Answer Retrieval (개념 속성 기반 정보 검색)

  • Yun Bo-Hyun;Seo Chang-ho
    • Journal of the Korea Society of Computer and Information / v.10 no.3 s.35 / pp.1-10 / 2005
  • This paper presents an information retrieval system that retrieves the most appropriate answer sentence for a user query by using concepts and attributes for knowledge retrieval. The system converts the user query into Boolean queries over concepts and attributes and then retrieves the relevant documents from the indexed set of answer documents. Users can then retrieve the relevant answer sentences from the relevant documents. For this purpose, the answer documents indexed by concept and attribute are segmented into sentences. The segmented sentences are analyzed into concepts and attributes, whose degree of relevance to the indexing units of the documents is evaluated. The system then indexes the location of the answer sentences. In the experiment, we evaluate the performance of our answer retrieval system against 100 user queries and report the experimental results.

AN EFFICIENT DENSITY BASED ANT COLONY APPROACH ON WEB DOCUMENT CLUSTERING

  • M. REKA
    • Journal of applied mathematics & informatics / v.41 no.6 / pp.1327-1339 / 2023
  • Use of the World Wide Web (WWW) has been increasing as users need more information, and the amount of document information available to end users through the internet has been growing. The web's document search process is essential for finding documents relevant to user queries. As the number of general web pages increases, it becomes increasingly challenging for users to find records that match their interests, and existing Document Information Retrieval (DIR) approaches are time-consuming for large document collections. To alleviate the problem, this paper presents Spatial Clustering Ranking Pattern (SCRP) based Density Ant Colony Information Retrieval (DACIR) for query-based DIR. The first stage applies the Term Frequency Weight (TFW) technique to identify the query weightage-based frequency. Based on the weight score, the documents are grouped and ranked using the proposed SCRP technique. Finally, based on the ranking, the DACIR algorithm retrieves the most relevant documents. The proposed method outperforms traditional information retrieval methods in the quality of returned objects while performing significantly better in run time.

Relevance Feedback based on Medicine Ontology for Retrieval Performance Improvement (검색 성능 향상을 위한 약품 온톨로지 기반 연관 피드백)

  • Lim, Soo-Yeon
    • Journal of the Korean Society for Information Management / v.22 no.2 s.56 / pp.41-56 / 2005
  • To extend the Web so that machines can understand and process information, the Semantic Web shares knowledge in the form of ontologies. For more precise query processing, this paper proposes a method that uses the semantic relations in an ontology as relevance feedback information for query expansion. We conducted experiments in the pharmacy domain. To verify the effectiveness of the semantic relations in the ontology, we compared a keyword-based document retrieval system, which assigns weights using frequency information, with an ontology-based document retrieval system, which uses the relevant information in the ontology as relevance feedback. The evaluation of retrieval performance showed that the search engine using the concepts and relations in the ontology improved precision effectively. The concepts and relations also served as a basis for inference to improve retrieval performance.
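
The general idea of expanding a query with terms reached through ontology relations can be sketched as follows. The tiny ontology, the relation names, and the down-weighting factor are all hypothetical and chosen only for illustration. The paper's actual ontology, relations, and weighting scheme are not given in the abstract.

```python
def expand_query(terms, ontology, relation_weight=0.5):
    """Return a term-to-weight mapping: original terms at 1.0, related terms lower.

    `ontology` maps a concept to a dict of relation name -> related concepts.
    """
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for related_terms in ontology.get(term, {}).values():
            for related in related_terms:
                # Do not overwrite a term the user typed explicitly.
                weights.setdefault(related, relation_weight)
    return weights

# Hypothetical pharmacy-domain fragment; relation names are assumptions.
toy_ontology = {
    "aspirin": {"is_a": ["analgesic"], "treats": ["headache"]},
}
expanded = expand_query(["aspirin"], toy_ontology)
print(expanded)
```

The expanded weights could then be passed to any weighted keyword retriever, with a frequency-only keyword search serving as the baseline for comparison.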

Conceptual Retrieval of Chinese Frequently Asked Healthcare Questions

  • Liu, Rey-Long;Lin, Shu-Ling
    • International Journal of Knowledge Content Development & Technology / v.5 no.1 / pp.49-68 / 2015
  • Given a query (a health question), retrieval of relevant frequently asked questions (FAQs) is essential, as FAQs provide both reliable and readable information to healthcare consumers. The retrieval requires estimating the semantic similarity between the query and each FAQ. This estimation is challenging because the semantic structures of Chinese healthcare FAQs are quite different from those of FAQs in other domains. In this paper, we propose a conceptual model for Chinese healthcare FAQs and, based on that model, present a technique, ECA, that estimates conceptual similarities between FAQs. Empirical evaluation shows that ECA can help various kinds of retrievers rank relevant FAQs significantly higher. We also make ECA available online to provide services for FAQ retrievers.