• Title/Summary/Keyword: Document information retrieval

A Study on the Effect of Data Fusion on the Retrieval Effectiveness of Web Documents (데이터 결합이 웹 문서 검색성능에 미치는 영향 연구)

  • Park, Ok-Hwa;Chung, Young-Mee
    • Journal of Information Management / v.38 no.1 / pp.1-19 / 2007
  • This study investigates the effect of data fusion on retrieval effectiveness through an experiment that combines multiple representations of Web documents. The document representations combined in the study are content terms, links, anchor text, and URLs. The experimental results showed that combining document representation methods through data fusion in the Web environment did not bring any significant improvement in retrieval effectiveness.
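
As an illustration of what combining representations can look like, here is a minimal sketch of score-level fusion. The abstract does not say which fusion formula the study used; this sketch assumes CombSUM over min-max normalized scores, and the run names, scores, and function names are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one run's scores to [0, 1]; a constant run maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Fuse several runs by summing each document's normalized scores.

    runs maps a representation name (hypothetical: "content", "anchor",
    "url") to a dict of document id -> raw retrieval score.
    """
    fused = defaultdict(float)
    for scores in runs.values():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from three representations of the same collection.
runs = {
    "content": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "anchor": {"d2": 4.0, "d3": 2.0},
    "url": {"d1": 1.0, "d3": 1.0},
}
ranking = comb_sum(runs)
print(ranking)
```

A document missing from a run simply contributes nothing for that run, which is one common convention rather than the only one.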

An Experimental Study on Fuzzy Document Retrieval System (퍼지개념을 적용한 질의식의 분석과 문헌정보 검색에 관한 연구)

  • Lee Seung Chai
    • Journal of the Korean Society for Library and Information Science / v.21 / pp.249-290 / 1991
  • Theoretical developments in information retrieval have offered a number of alternatives to traditional Boolean retrieval, with probability theory and fuzzy set theory playing prominent roles. Fuzzy set theory generalizes traditional set theory by permitting partial membership in a set, which means recognizing different degrees to which a document can match a request. This study describes an experiment with a document retrieval system that uses a fuzzy relation matrix of keywords, and reports the results. Queries composed of keywords and the Boolean operators AND, OR, and NOT were processed by the retrieval method, which was implemented in an experimental system on a 32-bit PC (30 MHz). Measurements of the recall ratio and precision ratio verified the effectiveness of the proposed fuzzy relation matrix of keywords and the retrieval method. Compared with the traditional crisp method on the same document database, the recall ratio increased by 10%, although the precision ratio decreased slightly. Two problems remain to be resolved: first, the design of automatic data input and fuzzy indexing modules, which would make the system competitive and useful; second, a systematic procedure for assigning fuzzy weights to keywords in documents and in queries.
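
The idea of partial matching can be sketched as follows. This is not the paper's fuzzy relation matrix method, which the abstract does not specify; it assumes the standard fuzzy set operators (minimum for AND, maximum for OR, complement for NOT), and the query encoding and keyword weights are hypothetical.

```python
def evaluate(query, weights):
    """Return the degree in [0, 1] to which a document matches a query.

    query is either a keyword string or a tuple of the form
    ("AND", a, b, ...), ("OR", a, b, ...), or ("NOT", a).
    weights maps keywords to the document's membership degrees;
    a keyword absent from the document has degree 0.0.
    """
    if isinstance(query, str):
        return weights.get(query, 0.0)
    op, *args = query
    degrees = [evaluate(arg, weights) for arg in args]
    if op == "AND":
        return min(degrees)
    if op == "OR":
        return max(degrees)
    if op == "NOT":
        return 1.0 - degrees[0]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical fuzzy index entry for one document.
doc = {"fuzzy": 0.8, "retrieval": 0.6, "boolean": 0.3}
query = ("AND", "fuzzy", ("OR", "retrieval", ("NOT", "boolean")))
score = evaluate(query, doc)
print(score)
```

With weights restricted to 0 and 1, the same function reproduces crisp Boolean retrieval, which is why the fuzzy model is described as a generalization.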

Index Graph : An IR Index Structure for Dynamic Document Database (인덱스 그래프 : 동적 문서 데이터베이스를 위한 IR 인덱스 구조)

  • 박병권
    • The Journal of Information Systems / v.10 no.1 / pp.257-278 / 2001
  • An IR (information retrieval) index for dynamic document databases, where insertion, deletion, and update of documents happen frequently, must itself be updated frequently. However, because the conventional IR index structure is designed for retrieval, it handles dynamic updates inefficiently. In this paper, we propose a new IR index structure, called Index Graph, which is organized by connecting multiple indexes into a graph structure. Through analysis and experiment, we show that Index Graph is superior to the conventional IR index structure in the performance of insertion, deletion, and update of documents as well as in the performance of information retrieval.

Information Retrieval System for Mobile Devices (모바일 기기를 위한 정보검색 시스템)

  • Kim, Jae-Hoon;Kim, Hyung-Chul
    • Journal of Advanced Marine Engineering and Technology / v.33 no.4 / pp.569-577 / 2009
  • Mobile information retrieval is an evolving branch of information retrieval centered on mobile and ubiquitous environments. In general, mobile devices are characterized by light weight, low power, small memory, small displays, limited input/output, low bandwidth, and so on. Some of these characteristics make it impossible to apply general information retrieval to mobile environments without modification. To relieve this problem, we design and implement an information retrieval system for mobile devices such as wireless phones, PDAs, and handheld devices. We use document summarization to alleviate the limitation of the small display, and user profiles to retrieve the most suitable documents for each user in personalized search. Furthermore, we use meta-search to lighten the burden of visiting several portal sites. We implemented and demonstrated the proposed mobile information retrieval system in the travel domain, and it received good subjective evaluations from users.

Research on Function and Policy for e-Government System using Semantic Technology (전자정부내 의미기반 기술 도입에 따른 기능 및 정책 연구)

  • Go, Gwang-Seop;Jang, Yeong-Cheol;Lee, Chang-Hun
    • 한국디지털정책학회:학술대회논문집 / 2007.06a / pp.79-87 / 2007
  • This paper aims to offer a solution based on semantic document classification to improve e-Government utilization and efficiency for people who use their own information retrieval systems and linguistic expressions. Generally, semantic document classification is an approach that classifies documents based on the diverse relationships between keywords in a document without fully describing the hierarchical concepts between keywords. Our approach considers the deep meanings within the context of the document and radically enhances information retrieval performance. We propose, test, and evaluate the Concept Weight Document Classification (CoWDC) method, which goes beyond existing keyword and simple thesaurus/ontology methods by fully considering the concept hierarchy of various concepts. We also recognized that verifying the superiority of semantic retrieval technology through the CoWDC test results and integrating it efficiently into e-Government would require creation of a thesaurus, management of the operating system, expansion of the knowledge base, and improvements in search service and accuracy at the national level.

Semantic Document-Retrieval Based on Markov Logic (마코프 논리 기반의 시맨틱 문서 검색)

  • Hwang, Kyu-Baek;Bong, Seong-Yong;Ku, Hyeon-Seo;Paek, Eun-Ok
    • Journal of KIISE:Computing Practices and Letters / v.16 no.6 / pp.663-667 / 2010
  • A simple approach to semantic document retrieval is to measure document similarity based on the bag-of-words representation, e.g., cosine similarity between two document vectors. However, such a syntactic method hardly considers the semantic similarity between documents, often producing semantically unsound search results. We circumvent this problem by combining supervised machine learning techniques with ontology information based on Markov logic. Specifically, Markov logic networks are learned from similarity-tagged documents with an ontology representing the diverse relationships among words. The learned Markov logic networks, the ontology, and the training documents are applied to the semantic document retrieval task by inferring similarities between a query document and the training documents. In an experimental evaluation on real-world question-answering data, the proposed method outperformed the simple cosine similarity-based approach in terms of retrieval accuracy.
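
The baseline mentioned in the abstract, cosine similarity over bag-of-words vectors, can be sketched as below. This is only the baseline, not the Markov logic method; the whitespace tokenization, raw term counts, and example texts are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words term-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two texts with related meaning but no shared words score zero.
print(cosine_similarity("car repair", "automobile maintenance"))
# Texts that share surface words score higher regardless of meaning.
print(cosine_similarity("car repair shop", "car repair manual"))
```

The first call shows the weakness the abstract describes: because the measure only sees shared tokens, synonymous documents look completely dissimilar.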

Design of an Information Retrieval Indexing Method using XML Links (XML 링크정보를 이용한 정보 검색 색인 기법의 설계)

  • Kim, Eun-Jeong;Bae, Jong-Min
    • The Transactions of the Korea Information Processing Society / v.7 no.7 / pp.2020-2027 / 2000
  • Hypertext documents are used for information exchange in Web environments. A hypertext document can be viewed as a graph structure with links, which makes nonlinear processing of documents possible. This paper proposes an indexing method for information retrieval systems that uses XML links. We define new attributes that control links to a remote document and assign a unique identifier to the attribute of each link. Each identifier has a different weight depending on whether it occurs in a local or a remote document. We index a word not only from a local document but also from a remote document, based on the given weight. Experimental results show that the proposed method outperforms conventional retrieval systems that ignore links.

Mathematical Properties of the Formulas Evaluating Boolean Operators in Information Retrieval (정보검색에서 부울연산자를 연산하는 식의 수학적 특성)

  • 이준호;이기호;조영화
    • Journal of the Korean Society for Information Management / v.12 no.1 / pp.87-97 / 1995
  • Boolean retrieval systems have been the most widely used in information retrieval because they are easy to implement and retrieve efficiently. Conventional Boolean retrieval systems, however, cannot rank retrieved documents in decreasing order of query-document similarity because they cannot compute similarity coefficients between queries and documents. Extended Boolean models such as fuzzy set, Waller-Kraft, Paice, P-Norm, and Infinite-One have been developed to provide a document ranking facility. In extended Boolean models, the formulas that evaluate the Boolean operators AND and OR are an important component affecting the quality of document ranking. In this paper we present mathematical properties of these formulas and analyse their effect on retrieval effectiveness. Our analyses show that P-Norm is the most suitable for achieving high retrieval effectiveness.
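
To make the kind of formula under discussion concrete, here is a minimal sketch of the P-Norm operators in their commonly cited form, assuming equally weighted query terms. The function names and the example weights are hypothetical, and the paper's own notation may differ.

```python
def p_norm_or(weights, p):
    """P-Norm OR: similarity grows with distance from the all-zero point."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def p_norm_and(weights, p):
    """P-Norm AND: similarity falls with distance from the all-one point."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# Hypothetical document weights in [0, 1] for two query terms.
doc_weights = [0.2, 0.8]
for p in (1, 2, 100):
    print(p, p_norm_or(doc_weights, p), p_norm_and(doc_weights, p))
```

At p = 1 both operators reduce to the arithmetic mean of the term weights, and as p grows they approach the maximum (OR) and minimum (AND), so the parameter p interpolates between vector-style averaging and fuzzy-set behavior.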

Relevance Feedback based on Medicine Ontology for Retrieval Performance Improvement (검색 성능 향상을 위한 약품 온톨로지 기반 연관 피드백)

  • Lim, Soo-Yeon
    • Journal of the Korean Society for Information Management / v.22 no.2 s.56 / pp.41-56 / 2005
  • To extend the Web so that machines can understand and process information, the Semantic Web shares knowledge in the form of ontologies. For more precise query processing, this paper proposes a method that uses the semantic relations in an ontology as relevance feedback information for query expansion. We conducted experiments in the pharmacy domain. To verify the effectiveness of the semantic relations in the ontology, we compared a keyword-based document retrieval system that assigns weights using frequency information with an ontology-based document retrieval system that uses relevant information in the ontology for relevance feedback. The evaluation of retrieval performance showed that the search engine used the concepts and relations in the ontology effectively to improve precision. It also used them as the basis of inference to improve retrieval performance.

How Query by humming, a Music Information Retrieval System, is Being Used in the Music Education Classroom

  • Bradshaw, Brian
    • Journal of Multimedia Information System / v.4 no.3 / pp.99-106 / 2017
  • This study presents a qualitative and quantitative analysis of how music by humming is being used by music educators in the classroom. Music by humming is a subdivision of music information retrieval. To define what a music information retrieval system is, I first need to define information retrieval. Berger and Lafferty (1999) define information retrieval as "someone doing a query to a retrieval system, a user begins with an information need. This need is an ideal document- perfect fit for the user, but almost certainly not present in the retrieval system's collection of documents. From this ideal document, the user selects a group of identifying terms. In the context of traditional IR, one could view this group of terms as akin to expanded query." Music information retrieval has its background in information systems, data mining, intelligent systems, library science, music history, and music theory. Three rounds of surveys using QuestionPro were completed. The study found variances in knowledge, training, and level of awareness of query by humming and music information retrieval systems. Those variance relationships were based on music specialty, the level the respondents teach, and the age of the respondents.