• Title/Summary/Keyword: Top-K Retrieval

Search Result 48, Processing Time 0.032 seconds

Known-Item Retrieval Performance of a PICO-based Medical Question Answering Engine

  • Vong, Wan-Tze;Then, Patrick Hang Hui
    • Asia Pacific Journal of Information Systems
    • /
    • v.25 no.4
    • /
    • pp.686-711
    • /
    • 2015
  • The performance of a novel medical question-answering engine called CliniCluster and of existing search engines, such as CQA-1.0, Google, and Google Scholar, was evaluated using known-item searching. A known item is a document that has been critically appraised as highly relevant to a therapy question. Results show that, using CliniCluster, known items were retrieved on average at rank 2 ($\mathrm{MRR@10} \approx 0.50$), and most of the known items could be identified from the top-10 document lists. In response to ill-defined questions, the known items were ranked lower by CliniCluster and CQA-1.0, whereas for Google and Google Scholar no significant difference in ranking was found between well- and ill-defined questions. Fewer than 40% of the known items could be identified from the top-10 documents retrieved by CQA-1.0, Google, and Google Scholar. An analysis of the top-ranked documents by strength of evidence revealed that CliniCluster outperformed the other search engines by providing a higher number of recent publications with the highest level of study design. In conclusion, the overall results support the use of CliniCluster in answering therapy questions by ranking highly relevant documents in the top positions of the search results.
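
The MRR@10 figure quoted in the abstract above can be illustrated with a minimal sketch. The function names and the toy document IDs are assumptions made for illustration, not taken from the paper. Note that MRR is the mean of reciprocal ranks, so a value of 0.50 matches "rank 2" exactly only when every known item sits at the same rank.

```python
from typing import Hashable, Sequence, Tuple


def reciprocal_rank_at_k(ranked: Sequence[Hashable], known_item: Hashable, k: int = 10) -> float:
    """Return 1/rank of the known item if it appears in the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id == known_item:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: Sequence[Tuple[Sequence[Hashable], Hashable]], k: int = 10) -> float:
    """Average the reciprocal ranks over (ranked list, known item) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank_at_k(ranked, item, k) for ranked, item in runs) / len(runs)


# Hypothetical runs: one ranked list and one known item per question.
runs = [
    (["d3", "d1", "d7"], "d1"),  # known item at rank 2 -> 0.5
    (["d2", "d9"], "d2"),        # known item at rank 1 -> 1.0
    (["d4", "d5"], "d8"),        # known item not retrieved -> 0.0
]
print(mrr_at_k(runs))  # 0.5
```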

The Access-Enhanced Search Interface Design for Korean Paintings (다양한 접근점 기반의 한국화 검색 인터페이스에 관한 연구)

  • Seo, Eun-Gyoung;Lee, Won-Kyung
    • Journal of the Korean Society for Information Management
    • /
    • v.25 no.2
    • /
    • pp.25-48
    • /
    • 2008
  • The purpose of this study is to propose retrieval interfaces for Korean paintings that help users retrieve specific digitized images through various access points and browse widely based on the unique features of Korean paintings. The study first develops a set of descriptive elements suitable for Korean paintings: twenty-six core elements and one hundred seventy-two attributes were selected as descriptive items based on the opinions of 8 experts. Then, to gain realistic evidence of which descriptive elements serve users as access points, the study investigates which of the 26 core elements are used as retrieval access points by 300 users in two groups, general users and domain specialists. Finally, the study designs two types of search interface (general and advanced) and display interfaces based on the 15 most popular descriptive elements. This access-enhanced platform, which enables user-oriented searches, will satisfy users' image retrieval needs.

Efficient 3D Model Retrieval using Discriminant Analysis (판별분석을 이용한 효율적인 3차원 모델 검색)

  • Song, Ju-Whan;Choi, Seong-Hee;Gwun, Ou-Bong
    • 전자공학회논문지 IE
    • /
    • v.45 no.2
    • /
    • pp.34-39
    • /
    • 2008
  • This study builds an efficient 3D model retrieval system using a statistical technique, the discriminant analysis function. The proposed method searches an index formed from the statistics of 128 feature vectors, including their range, minimum, mean, standard deviation, skewness, and scale. The features were sampled with Osada's D2 method, and the statistics served as the variables from which the discriminant function value was computed and used as the index. In a first retrieval pass on the query model, the classes in the top 2% were selected by comparing the query with the stored index of each class of similar models. The method proved to be an efficient retrieval technique that reduces processing time: it shortened 3D model retrieval time by 57% compared with Osada's original method, and the precision with which a similar model was found at the first rank was 0.362, an improvement of 44.8%.
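
As a rough illustration of the feature step described above, the sketch below samples D2 values (distances between random pairs of points) and summarizes them with a few statistics. It assumes the points have already been sampled from the model surface, and the function names, the choice of statistics, and the cube example are all hypothetical; the discriminant analysis step from the paper is not shown.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def d2_samples(points: Sequence[Point], n_pairs: int, seed: int = 0) -> List[float]:
    """Sample Euclidean distances between random pairs of distinct points."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_pairs):
        p, q = rng.sample(list(points), 2)
        distances.append(math.dist(p, q))
    return distances


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of the sampled distances (an assumed feature set)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    third_moment = sum((v - mean) ** 3 for v in values) / len(values)
    skewness = third_moment / std ** 3 if std > 0 else 0.0
    low, high = min(values), max(values)
    return {"min": low, "max": high, "range": high - low,
            "mean": mean, "std": std, "skewness": skewness}


# Hypothetical "model": the eight corners of a unit cube.
cube = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
features = summarize(d2_samples(cube, n_pairs=500))
print(features)
```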

Rutgers Information Retrieval Evaluation Project on IR Performance on Different Precision Levels (럿거스 정보검색 평가 프로젝트에 관한 연구)

  • Lee, Hyuk-Jin;Belkin, Nicholas J.;Krovitz, Bob
    • Journal of the Korean Society for Information Management
    • /
    • v.23 no.2
    • /
    • pp.97-111
    • /
    • 2006
  • The purpose of this study is to investigate what level of difference in precision would be significantly perceived by a human user of an information retrieval system. Few studies have addressed this issue in the information retrieval field. Despite the non-significant results, there were several interesting findings on how users recognize different levels of precision. The correctness of relevance judgments had little to do with the time taken for the task. In addition, the strong relationship between the subjects' topic familiarity and their rate of correct judgments is one of the most interesting results of this study. It turned out that the subjects had more difficulty judging between two lists containing more non-relevant documents than judging between lists containing more relevant documents. Finally, the strong influence of the top N documents in a list on the relevance judgment task was confirmed.

A Study on Improving the Effectiveness of Information Retrieval Through P-norm, RF, LCAF

  • Kim, Young-cheon;Lee, Sung-joo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.2 no.1
    • /
    • pp.9-14
    • /
    • 2002
  • Boolean retrieval is simple and elegant. However, since there is no provision for term weighting, no ranking of the answer set is generated. As a result, the size of the output might be too large or too small. Relevance feedback is the most popular query reformulation strategy. In a relevance feedback cycle, the user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant. In practice, only the top 10 (or 20) ranked documents need to be examined. The main idea consists of selecting important terms, or expressions, attached to the documents that have been identified as relevant by the user, and of enhancing the importance of these terms in a new query formulation. The expected effect is that the new query will be moved towards the relevant documents and away from the non-relevant ones. Local analysis techniques are interesting because they take advantage of the local context provided with the query. In this regard, they seem more appropriate than global analysis techniques. In a local strategy, the documents retrieved for a given query q are examined at query time to determine terms for query expansion. This is similar to a relevance feedback cycle but might be done without assistance from the user.
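
The relevance feedback cycle described above can be sketched with a Rocchio-style update, a standard technique swapped in here for illustration; the abstract does not state which formula the paper uses. The sparse term-weight dictionaries, the function names, and the default weights `alpha`, `beta`, and `gamma` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight


def centroid(docs: List[Vector]) -> Vector:
    """Average term weights over a list of document vectors."""
    if not docs:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            totals[term] += weight
    return {term: weight / len(docs) for term, weight in totals.items()}


def reformulate(query: Vector, relevant: List[Vector], non_relevant: List[Vector],
                alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query towards relevant documents and away from non-relevant ones."""
    new_query: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for term, weight in centroid(relevant).items():
        new_query[term] += beta * weight
    for term, weight in centroid(non_relevant).items():
        new_query[term] -= gamma * weight
    # Keep only terms that still carry positive weight.
    return {term: weight for term, weight in new_query.items() if weight > 0}


# Hypothetical feedback: the user marked two of the top-ranked documents as relevant.
query = {"heart": 1.0}
relevant = [{"heart": 1.0, "attack": 1.0}, {"attack": 1.0}]
non_relevant = [{"card": 1.0}]
print(reformulate(query, relevant, non_relevant))
```

Passing the top-ranked documents as `relevant` without asking the user gives the local (pseudo-feedback) variant mentioned at the end of the abstract.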

FRIP System for Region-based Image Retrieval (영역기반 영상 검색을 위한 FRIP 시스템)

  • Ko, Byoung-Chul;Lee, Hae-Sung;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.7 no.3
    • /
    • pp.260-272
    • /
    • 2001
  • In this paper, we design a region-based image retrieval system, FRIP (Finding Region In the Pictures). The system includes a robust image segmentation scheme using color and texture direction, and a retrieval scheme based on the features of each region. For image segmentation, a circular filter protects the boundaries of round objects and merges stripes or spots of objects into the body region. The scheme also combines scaled and shifted color coordinates with texture direction. After image segmentation, in order to manage storage effectively and reduce computation time, we extract compact features from each region and store them as an index. In the user interface, user-specified constraints such as color-care / don't care, scale-care / don't care, shape-care / don't care, and location-care / don't care are used to estimate the overall matching score, and the top k nearest images are reported in ascending order of the final score.
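
The care / don't care matching and the top-k report described in the FRIP abstract can be sketched as follows. This assumes each candidate image already has a per-feature distance to the query and that the overall score is a plain sum over the features the user cares about; the function names and data are hypothetical, and the paper's actual scoring formula is not given in the abstract.

```python
import heapq
from typing import Dict, List, Tuple


def overall_score(distances: Dict[str, float], care: Dict[str, bool]) -> float:
    """Sum the distances of features flagged as 'care'; 'don't care' features are skipped."""
    return sum(d for feature, d in distances.items() if care.get(feature, False))


def top_k_nearest(candidates: Dict[str, Dict[str, float]], care: Dict[str, bool],
                  k: int) -> List[Tuple[float, str]]:
    """Return the k lowest-scoring (nearest) images in ascending order of score."""
    scored = ((overall_score(dists, care), image_id) for image_id, dists in candidates.items())
    return heapq.nsmallest(k, scored)


# Hypothetical per-feature distances between the query region and three images.
candidates = {
    "a": {"color": 0.1, "shape": 0.9},
    "b": {"color": 0.5, "shape": 0.1},
    "c": {"color": 0.3, "shape": 0.3},
}
care = {"color": True, "shape": False}  # color-care, shape-don't care
print(top_k_nearest(candidates, care, k=2))
```

`heapq.nsmallest` avoids sorting the full candidate set when k is small relative to the collection.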

An Improved Approach to Ranking Web Documents

  • Gupta, Pooja;Singh, Sandeep K.;Yadav, Divakar;Sharma, A.K.
    • Journal of Information Processing Systems
    • /
    • v.9 no.2
    • /
    • pp.217-236
    • /
    • 2013
  • Ranking thousands of web documents so that they match a user query is a challenging task. For this purpose, search engines apply different ranking mechanisms to the apparently related web documents to decide the order in which they should be displayed. Existing ranking mechanisms decide the order of a web page based on the number and popularity of the links pointing to and emerging from it. Sometimes search engines place less relevant documents in the top positions in response to a user query, so there is a strong need to improve the ranking strategy. In this paper, a novel ranking mechanism is proposed that considers both the HTML structure of a page and the contextual senses of the keywords present within it and its back-links. The approach has been tested on data sets of URLs and their back-links in relation to different topics. The experimental results show that the overall search results, in response to user queries, are improved. The ordering of the links obtained is compared with the ordering produced by the page rank score. The results show that the proposed mechanism places more contextually related web pages in the top positions than the page rank score does.

The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval (MeSH 기반의 LDA 토픽 모델을 이용한 검색어 확장)

  • You, Sukjin
    • Journal of Korean Library and Information Science Society
    • /
    • v.52 no.1
    • /
    • pp.79-108
    • /
    • 2021
  • Information retrieval in the health field has several challenges. Health information terminology is difficult for consumers (laypeople) to understand. Formulating a query with professional terms is not easy for consumers because health-related terms are more familiar to health professionals. Automatically adding health terms related to a query would help consumers find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH terms. The documents were represented by the MeSH terms (i.e., Bag-of-MeSH) found in the full-text articles. The MeSH terms were then used to generate LDA (Latent Dirichlet Allocation) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). Threshold values were effective in an LDA model with a specific number of topics in increasing IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by a word score based on TP × WP and on retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model compared with the baseline result.
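
The threshold filtering and TP × WP scoring described above can be sketched as follows. This is a simplified illustration under stated assumptions: topic and word probabilities are supplied directly rather than produced by an LDA model, scores for a word appearing in several topics are summed, and the document-ranking component of the paper's word score is left out. The function name, thresholds, and MeSH terms are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Each topic: (topic probability TP, {MeSH term: word probability WP})
Topic = Tuple[float, Dict[str, float]]


def expansion_terms(topics: List[Topic], tp_min: float, wp_min: float,
                    k: int) -> List[Tuple[str, float]]:
    """Keep words passing both thresholds, score them by TP * WP, return the top k."""
    scores: Dict[str, float] = {}
    for tp, words in topics:
        if tp < tp_min:
            continue  # topic too weakly related to the query
        for word, wp in words.items():
            if wp < wp_min:
                continue  # word too weakly associated with the topic
            scores[word] = scores.get(word, 0.0) + tp * wp
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical topic distribution for a query and its top k retrieved documents.
topics = [
    (0.6, {"Neoplasms": 0.3, "Humans": 0.05}),
    (0.3, {"Neoplasms": 0.1, "Drug Therapy": 0.4}),
    (0.1, {"Mice": 0.9}),
]
print(expansion_terms(topics, tp_min=0.2, wp_min=0.1, k=2))
```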

An Analysis of Image Use in Twitter Message (트위터 상의 이미지 이용에 관한 분석)

  • Chung, EunKyung;Yoon, JungWon
    • Journal of the Korean BIBLIA Society for Library and Information Science
    • /
    • v.24 no.4
    • /
    • pp.75-90
    • /
    • 2013
  • Given that users actively use social media with embedded multimedia information, the purpose of this study is to demonstrate how images are used within Twitter messages, especially in influential and favorited messages. To this end, the top 200 influential and favorited messages with images were selected out of 1,589 tweets related to "Boston bombing" in April 2013. The characteristics of the message, image use, and user are analyzed and compared. Two phases of analysis were conducted on three data sets containing the top 200 influential messages, the top 200 favorited messages, and general messages. In the first phase, coding schemes were developed for conducting three categorical analyses: (1) categorization of tweets, (2) categorization of image use, and (3) categorization of users. The three data sets were then coded using the coding schemes. In the second phase, comparison analyses were conducted among influential, favorited, and general tweets in terms of tweet type, image use, and user. While messages expressing opinion were found to be most favorited, messages that shared information were recognized as most influential to users. On the other hand, as only four image uses - information dissemination, illustration, emotive/persuasive, and information processing - were found in this data set, the primary image use is likely to be data-driven rather than object-driven. From the perspective of users, user types such as government, celebrity, and photo-sharing sites were found to be favorited and influential. An improved understanding of users' image needs in the context of social media contributes to the body of knowledge on image needs. This study will also provide valuable insight into practical designs and implications of image retrieval systems or services.

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee, Shin Won;Yi, Sang Seon;An, Dong Un;Chung, Sung Jong
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.885-889
    • /
    • 2004
  • Fast and high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes results easier for users to understand. To improve clustering performance, we implemented a hierarchical structure with variety and readability by carefully selecting cluster topic words and deciding the number of clusters dynamically. Selecting topic words is important because the hierarchical cluster structure summarizes the search results. We chose nouns as cluster topic words. The quality of the topic words increased by 33% with the following approach: only nouns were extracted as topic words for the top-level cluster, and topic words already used were not reused for the child clusters.