• Title/Summary/Keyword: Top-K Retrieval

Optimized Structures with Hop Constraints for Web Information Retrieval (Hop 제약조건이 고려된 최적화 웹정보검색)

  • Lee, Woo-Key;Kim, Ki-Baek;Lee, Hwa-Ki
    • Journal of the Korean Operations Research and Management Science Society / v.33 no.4 / pp.63-82 / 2008
  • The rapidly growing popularity of the Web is creating significant demand for structural analysis of web objects. The more web objects that are available, the more difficult it becomes for clients (i.e., common web users and web robots) and servers (i.e., Web search engines) to retrieve what they really want. We focus on the structure of web objects by introducing optimization models for more convenient and effective information retrieval. For this purpose, we represent web objects and hyperlinks as a directed graph, from which optimal structures are derived in terms of rooted directed spanning trees and Top-k trees. Computational experiments are carried out on synthetic data as well as on real web site domains, applying Lagrangian relaxation approaches to the Top-k tree and hop-constraint formulations. In the experiments, our methods outperformed the conventional approaches, and complex web graphs were successfully converted into optimally structured ones within a reasonable amount of computation time.

Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah;Atwan, Jaffar
    • Journal of Information Science Theory and Practice / v.9 no.2 / pp.1-17 / 2021
  • Pseudo relevance feedback (PRF) is a powerful query expansion (QE) technique that reformulates queries by choosing expansion elements from the top k pseudo-relevant documents. Traditional PRF frameworks have robustly handled vocabulary mismatch between user queries and pertinent documents; nevertheless, expansion elements are chosen without regard to their similarity to the original query's elements. Word embedding (WE) schemes are techniques of significant interest for QE, which falls within the information retrieval domain. Deep averaging networks (DANs) define a framework that relies on the average of word embeddings passed through multiple linear layers. The complete query can thus be represented by the average vector of the query terms. This vector may be employed to determine expansion elements pertinent to the entire query. In this study, we suggest a DAN-based technique that augments PRF frameworks by integrating WE similarities to facilitate Arabic information retrieval. The technique is based on the premise that the top pseudo-relevant document set is assessed to determine the candidate element distribution and to select expansion terms appropriately, considering their similarity to the average vector representing the initial query elements. The Word2Vec model is selected for the experiments, which are run on the standard Arabic TREC 2001/2002 set. The majority of the evaluations indicate that the PRF implementation in the present study offers a significant performance improvement over the baseline PRF frameworks.
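
The selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only averages the query term vectors and ranks candidate terms by cosine similarity, and it omits the linear layers of a DAN. The function names and the toy 2-D embeddings are hypothetical and invented for illustration.

```python
import math

def average_vector(terms, embeddings):
    """Mean of the embedding vectors of the terms that have one."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        raise ValueError("no query term has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def expansion_terms(query, top_docs, embeddings, n_terms=2):
    """Pick terms from the pseudo-relevant documents that are closest
    to the averaged query vector, skipping terms already in the query."""
    q_vec = average_vector(query, embeddings)
    candidates = {t for doc in top_docs for t in doc}
    scored = [
        (cosine(q_vec, embeddings[t]), t)
        for t in candidates
        if t in embeddings and t not in query
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:n_terms]]

# Toy 2-D embeddings, invented for illustration only.
EMBEDDINGS = {
    "car": [1.0, 0.0],
    "engine": [0.9, 0.1],
    "vehicle": [0.95, 0.05],
    "banana": [0.0, 1.0],
    "road": [0.7, 0.3],
}
query = ["car", "engine"]
top_docs = [["car", "vehicle", "road"], ["engine", "banana", "vehicle"]]
print(expansion_terms(query, top_docs, EMBEDDINGS))
```

In a real system the embeddings would come from a trained model such as Word2Vec, and the top documents from a first-pass retrieval run.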

MRI Image Retrieval Using Wavelet with Mahalanobis Distance Measurement

  • Rajakumar, K.;Muttan, S.
    • Journal of Electrical Engineering and Technology / v.8 no.5 / pp.1188-1193 / 2013
  • In a content-based image retrieval (CBIR) system, images are represented by features such as color, texture, shape, and spatial relationships. In this paper, we propose an MRI image retrieval method using the wavelet transform with Mahalanobis distance measurement. The wavelet transform can be easily extended to 2-D (image) or 3-D (volume) data by successively applying the 1-D transform along different dimensions. The proposed algorithm has been tested using the wavelet transform, and performance analysis has been done with the HH and $H^*$ elimination methods. Retrieval is based on the relevance between a query image and each database image, and the results are ranked by similarity as computed with the Mahalanobis distance. An adaptive similarity synthesis approach based on a linear combination of individual feature-level similarities is analyzed and presented in this paper. The feature weights are calculated by considering both the precision and recall rates of the top retrieved relevant images as predicted by our enhanced technique. Hence, to produce effective results, the weights are dynamically updated for a robust search process. The experimental results show that the proposed algorithm easily identifies the target object and reduces the influence of the background in the image, and thus improves the performance of MRI image retrieval.
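
As a rough illustration of the ranking step only, the sketch below orders database images by Mahalanobis distance to a query feature vector. It assumes the feature vectors already exist (the wavelet feature extraction is omitted) and that the covariance is estimated from the database itself; neither detail is stated in the abstract. All names and the sample vectors are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def covariance(vectors):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [
        [sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
         for j in range(dim)]
        for i in range(dim)
    ]

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)), computed without forming the inverse."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff))) ** 0.5

def rank_by_mahalanobis(query, database, top_k=3):
    """Names of the top_k database entries closest to the query."""
    cov = covariance(list(database.values()))
    ranked = sorted(database, key=lambda name: mahalanobis(query, database[name], cov))
    return ranked[:top_k]

# Hypothetical 2-D feature vectors standing in for wavelet features.
DATABASE = {
    "img_a": [1.0, 2.0],
    "img_b": [2.0, 1.0],
    "img_c": [4.0, 5.0],
    "img_d": [6.0, 3.0],
}
print(rank_by_mahalanobis([1.1, 2.1], DATABASE, top_k=2))
```

Unlike Euclidean distance, the Mahalanobis distance rescales each feature direction by the data's covariance, so correlated or high-variance features do not dominate the ranking.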

Development of Content-Based Trademark Retrieval System on the World Wide Web

  • Kim, Young-Sum;Kim, Yong-Sung;Kim, Whoi-Yul;Kim, Myung-Joon
    • ETRI Journal / v.21 no.1 / pp.40-54 / 1999
  • In this paper, we describe a new trademark retrieval system based on the content, or shape, of trademarks. The system has an on-line graphical user interface for the World Wide Web (WWW) that allows a user to provide a query in the form of a sketch or a visual image to search for similar trademarks in a database. User interfaces for the WWW were implemented using HTML and Java applets. The query can be of arbitrary size and orientation. A shape representation scheme invariant to scale and rotation was developed to measure the similarity between two trademarks using the magnitudes of Zernike moments as a feature set. Performance evaluation was carried out with a database of 3,000 trademarks. Retrieval takes only about 0.6 seconds on a 200 MHz Pentium PC. The average recall of the original trademark among the top 30 candidates, when queried with noisy or deformed images, was 100%.

A New Approach of Domain Dictionary Generation

  • Xi, Su Mei;Cho, Young-Im;Gao, Qian
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.1 / pp.15-19 / 2012
  • A domain dictionary generation algorithm based on a pseudo feedback model is presented in this paper. This algorithm can increase the precision of domain dictionary generation. The generation of the domain dictionary is regarded as a domain term retrieval process: assume that the top N strings in the original retrieval result set are relevant to C, append these strings to the dictionary, and retrieve again. The process is iterated until a predefined number of domain terms has been generated. Experiments on a corpus show that the precision of the pseudo feedback model based algorithm is much higher than that of existing algorithms.
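
The iterate-and-append loop described in this abstract might look like the following sketch. The abstract does not say how candidate strings are retrieved or scored, so the co-occurrence scoring used here is an assumption made purely for illustration, as are the function name, parameters, and toy corpus.

```python
def generate_dictionary(seed_terms, corpus, top_n=2, target_size=5):
    """Grow a term dictionary by pseudo feedback.

    Each round scores unseen tokens by how often they co-occur with
    current dictionary terms (an assumed stand-in for the retrieval
    step), treats the top_n as relevant, and appends them.
    """
    dictionary = list(seed_terms)
    while len(dictionary) < target_size:
        known = set(dictionary)
        scores = {}
        for sentence in corpus:
            tokens = set(sentence)
            overlap = len(tokens & known)
            if overlap == 0:
                continue
            for tok in tokens - known:
                scores[tok] = scores.get(tok, 0) + overlap
        if not scores:
            break  # nothing left that co-occurs with the dictionary
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        room = target_size - len(dictionary)
        dictionary.extend(ranked[:min(top_n, room)])
    return dictionary

# Toy tokenized corpus, invented for illustration only.
CORPUS = [
    ["neuron", "synapse", "cortex"],
    ["neuron", "axon", "synapse"],
    ["axon", "dendrite"],
    ["banana", "apple"],
]
print(generate_dictionary(["neuron"], CORPUS, top_n=2, target_size=4))
```

Because each round trusts its own top results, an early wrong term can pull in further unrelated terms; keeping `top_n` small limits that drift.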

Image Retrieval Using Entropy-Based Image Segmentation (엔트로피에 기반한 영상분할을 이용한 영상검색)

  • Jang, Dong-Sik;Yoo, Hun-Woo;Kang, Ho-Jueng
    • Journal of Institute of Control, Robotics and Systems / v.8 no.4 / pp.333-337 / 2002
  • A content-based image retrieval method using color, texture, and shape features is proposed in this paper. A region segmentation technique using PIM (Picture Information Measure) entropy is used for similarity indexing. For segmentation, a color image is first transformed into a gray image, which is divided into $n \times n$ non-overlapping blocks. The PIM entropy is computed for each block. An adequate variance for good segmentation of the images in the database is obtained heuristically. As the variance increases up to some bound, objects within the image can be easily segmented from the background. Therefore, variance is a good indicator of adequate image segmentation. A high-variance image is segmented into two regions: a high-entropy region and a low-entropy region. In the high-entropy region, hue-saturation-intensity and Canny edge histograms are used for image similarity calculation. An image with lower variance is well represented by global texture information. Experiments show that the proposed method displayed similar images at an average rank of 4 in the top-10 retrieval case.
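
The block-wise entropy step can be sketched as follows. Two assumptions are made that the abstract does not confirm: Shannon entropy is used here as a stand-in for the PIM measure, and the variance is taken over the per-block entropies. The block size is interpreted as the pixel side length of each tile, and all names are hypothetical.

```python
import math
from collections import Counter

def block_entropies(gray, block):
    """Shannon entropy (in bits) of each non-overlapping block x block tile.

    `gray` is a list of rows of gray levels; tiles are visited row by row.
    Shannon entropy is a stand-in for PIM, not the paper's measure.
    """
    rows, cols = len(gray), len(gray[0])
    entropies = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            pixels = [gray[r + i][c + j] for i in range(block) for j in range(block)]
            total = len(pixels)
            counts = Counter(pixels)
            entropies.append(
                -sum(k / total * math.log2(k / total) for k in counts.values())
            )
    return entropies

def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Toy 4x4 gray image: three flat tiles and one tile with four distinct levels.
IMAGE = [
    [0, 0, 10, 20],
    [0, 0, 30, 40],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
entropies = block_entropies(IMAGE, block=2)
print(entropies, variance(entropies))
```

A flat tile has zero entropy and a busy tile has high entropy, so an image that mixes both kinds of tile yields a high variance, which matches the abstract's use of variance as a cue for separating objects from background.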

Medical Image Classification and Retrieval Using BoF Feature Histogram with Random Forest Classifier (Random Forest 분류기와 Bag-of-Feature 특징 히스토그램을 이용한 의료영상 자동 분류 및 검색)

  • Son, Jung Eun;Ko, Byoung Chul;Nam, Jae Yeal
    • KIPS Transactions on Software and Data Engineering / v.2 no.4 / pp.273-280 / 2013
  • This paper presents a novel OCS-LBP (Oriented Center Symmetric Local Binary Patterns) feature based on the orientation of pixel gradients, and an image retrieval system based on BoF (Bag-of-Feature) and a random forest classifier. Feature vectors extracted from training data are clustered into a code book, and each feature is transformed into a new BoF feature using the code book. The BoF features are used to train the random forest, and a random forest with N classes is constructed by combining several decision trees. For testing, the same OCS-LBP feature is extracted from a query image, and its BoF feature is applied to the trained random forest classifier. In contrast to conventional retrieval systems, the query image selects the K-nearest neighbor (K-NN) classes after the random forest is applied. Then, the top K similar images are retrieved only from database images labeled with those K-NN classes. Compared with other retrieval algorithms, the proposed method shows both fast processing time and improved retrieval performance.
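
The code book step can be illustrated with a short sketch, assuming the code book has already been learned by clustering. It assigns each local feature to its nearest codeword and builds a normalized histogram. The OCS-LBP extraction and the random forest are omitted, and the names and sample values are hypothetical.

```python
def nearest_codeword(feature, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(feature, codebook[i])),
    )

def bof_histogram(features, codebook):
    """Normalized histogram of codeword assignments for one image."""
    if not features:
        raise ValueError("an image needs at least one local feature")
    hist = [0] * len(codebook)
    for feature in features:
        hist[nearest_codeword(feature, codebook)] += 1
    return [count / len(features) for count in hist]

# Hypothetical 2-D codebook and local features, for illustration only.
CODEBOOK = [[0.0, 0.0], [10.0, 10.0]]
FEATURES = [[1.0, 1.0], [2.0, 0.0], [0.0, 1.0], [9.0, 8.0]]
print(bof_histogram(FEATURES, CODEBOOK))
```

The resulting fixed-length histogram is what a classifier such as a random forest would take as input, regardless of how many local features each image produced.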

Object Tracking System for Additional Service Providing under Interactive Broadcasting Environment (대화형 방송 환경에서 부가서비스 제공을 위한 객체 추적 시스템)

  • Ahn, Jun-Han;Byun, Hye-Ran
    • Journal of KIISE:Information Networking / v.29 no.1 / pp.97-107 / 2002
  • In general, in an interactive broadcasting environment, the user finds additional services using a top-down menu. However, the user cannot know what information an additional service provides until retrieval has finished, and a top-down menu requires multi-level retrieval. This paper proposes a new method for providing additional services that uses object selection instead of a top-down menu. For this method, the MPEG movie must be synchronized with the object information (position, size, shape), and an object tracking technique is required. The synchronization technique uses DirectShow, provided by Microsoft. The object tracking technique uses motion-based tracking and model-based tracking together. We divide an object into two parts: the face and the substance. Face tracking uses model-based tracking, and substance tracking uses motion-based tracking based on the block matching algorithm. To improve tracking precision, motion-based tracking applies the temporal prediction search algorithm, and model-based tracking applies a face model that merges an ellipse model and a color model.

A Search-Result Clustering Method based on Word Clustering for Effective Browsing of the Paper Retrieval Results (논문 검색 결과의 효과적인 브라우징을 위한 단어 군집화 기반의 결과 내 군집화 기법)

  • Bae, Kyoung-Man;Hwang, Jae-Won;Ko, Young-Joong;Kim, Jong-Hoon
    • Journal of KIISE:Software and Applications / v.37 no.3 / pp.214-221 / 2010
  • The search-results clustering problem is defined as the automatic and on-line grouping of similar documents in search results returned from a search engine. In this paper, we propose a new search-results clustering algorithm specialized for a paper search service. Our system consists of two algorithmic phases: Category Hierarchy Generation System (CHGS) and Paper Clustering System (PCS). In CHGS, we first build up the category hierarchy, called the Field Thesaurus, for each research field using an existing research category hierarchy (KOSEF's research category hierarchy) and the keyword expansion of the field thesaurus by a word clustering method using the K-means algorithm. Then, in PCS, the proposed algorithm determines the category of each paper using top-down and bottom-up methods. The proposed system can be used in the application areas for retrieval services in a specialized field such as a paper search service.

Domain Question Answering System (도메인 질의응답 시스템)

  • Yoon, Seunghyun;Rhim, Eunhee;Kim, Deokho
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.144-147 / 2015
  • Question Answering (QA) services can provide exact answers to user questions written in natural language. This research focuses on how to build a QA system for a specific domain. The online and offline QA system architecture for the targeted domain, including domain detection, question analysis, reasoning, information retrieval, filtering, answer extraction, re-ranking, and answer generation, as well as data preparation, is presented herein. Test results with an official Frequently Asked Question (FAQ) set showed 68% top-1 accuracy and 77% top-5 accuracy. The contributions of each part, such as the question analysis system, document search engine, knowledge graph engine, and re-ranking module, to achieving the final answer are also presented.
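
For readers unfamiliar with the metric, top-k accuracy is commonly computed as the fraction of questions whose correct answer appears among the first k ranked candidates. The sketch below shows that computation under this common definition; the abstract does not spell out its exact evaluation procedure, and the function name and sample data are hypothetical.

```python
def top_k_accuracy(ranked_answers, gold_answers, k):
    """Fraction of questions whose gold answer is among the first k candidates."""
    if len(ranked_answers) != len(gold_answers):
        raise ValueError("need one ranked candidate list per gold answer")
    hits = sum(
        gold in ranked[:k] for ranked, gold in zip(ranked_answers, gold_answers)
    )
    return hits / len(gold_answers)

# Toy ranked candidate lists and gold answers, invented for illustration.
RANKED = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"], ["p", "q", "r"]]
GOLD = ["a", "y", "o", "s"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(RANKED, GOLD, k))
```

Top-k accuracy can never decrease as k grows, which is why a top-5 figure is always at least as high as the top-1 figure for the same system.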