• Title/Summary/Keyword: Document Retrieval

The Design of Retrieval System Using Fuzzy Logic (퍼지 논리(論理)를 이용한 정보검색(情報檢索) 시스템의 설계(設計))

  • Cho, Hye-Min
    • Journal of Information Management / v.24 no.3 / pp.73-100 / 1993
  • To address the limitations of Boolean retrieval systems, this paper presents the design of a retrieval system based on fuzzy logic. The fuzzy retrieval system assigns weights to terms in both the documents and the query and uses them to determine how relevant a document is to a given query. Based on a comparative analysis of previous research, the paper proposes an effective model of a fuzzy retrieval system and evaluates its performance through actual examples.

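A minimal Python sketch of how a fuzzy retrieval system can turn term weights into a graded relevance score. It assumes the classic Zadeh operators (AND as min, OR as max, NOT as 1 - w) and ignores query-term weights for brevity; the abstract does not say which operators or weighting scheme the paper actually uses, and the documents and weights below are hypothetical.

```python
from typing import Dict, Tuple, Union

# A query is either a single term or a nested tuple such as ("AND", q1, q2, ...),
# ("OR", q1, q2, ...) or ("NOT", q).
Query = Union[str, Tuple]


def fuzzy_score(doc: Dict[str, float], query: Query) -> float:
    """Degree in [0, 1] to which a weighted document satisfies a Boolean query."""
    if isinstance(query, str):
        # Term weight = membership of the document in the fuzzy set of that term.
        return doc.get(query, 0.0)
    op, *args = query
    scores = [fuzzy_score(doc, q) for q in args]
    if op == "AND":
        return min(scores)  # Zadeh intersection
    if op == "OR":
        return max(scores)  # Zadeh union
    if op == "NOT":
        return 1.0 - scores[0]  # Zadeh complement
    raise ValueError(f"unknown operator: {op}")


# Hypothetical term weights for two documents.
docs = {
    "d1": {"fuzzy": 0.8, "retrieval": 0.6},
    "d2": {"fuzzy": 0.3, "retrieval": 0.9, "boolean": 0.7},
}
query = ("AND", "retrieval", ("OR", "fuzzy", "boolean"))
ranking = sorted(docs, key=lambda d: fuzzy_score(docs[d], query), reverse=True)
print(ranking)  # ['d2', 'd1']
```

A strict Boolean system would accept both documents equally for this query; the fuzzy scores rank d2 (0.7) above d1 (0.6) because its weakest required condition is satisfied more strongly.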

A Hangul Document Classification System using Case-based Reasoning (사례기반 추론을 이용한 한글 문서분류 시스템)

  • Lee, Jae-Sik;Lee, Jong-Woon
    • Asia Pacific Journal of Information Systems / v.12 no.2 / pp.179-195 / 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. By "efficient" we mean that the system maintains acceptable classification performance while requiring less computing time. Given a query document, our system first retrieves k documents from the document case base using the k-nearest neighbor technique, the main algorithm of case-based reasoning. The TFIDF method, the traditional vector model in information retrieval, is then applied to the query document and the k retrieved documents to classify the query document. We call this procedure the "CB_TFIDF" method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time needed to classify one document decreased substantially.
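The two-step CB_TFIDF procedure can be sketched in Python as follows. The abstract does not specify the similarity used in the k-NN step or how the TFIDF scores are turned into a class, so this sketch assumes Jaccard overlap for retrieval, a smoothed IDF, and a cosine-similarity-weighted vote among the k neighbours; `cb_tfidf_classify`, the toy case base and its labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Case = Tuple[List[str], str]  # (tokenised document, class label)


def jaccard(a: List[str], b: List[str]) -> float:
    """Cheap set-overlap similarity used for the k-NN retrieval step."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF vectors with a smoothed IDF, computed only over the given documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in Counter(d).items()}
        for d in docs
    ]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cb_tfidf_classify(query: List[str], case_base: List[Case], k: int = 3) -> str:
    # Step 1: k-nearest-neighbour retrieval from the whole case base.
    neighbours = sorted(case_base, key=lambda c: jaccard(query, c[0]), reverse=True)[:k]
    # Step 2: TF-IDF over the query plus its k neighbours only.
    vectors = tfidf_vectors([query] + [tokens for tokens, _ in neighbours])
    votes: Dict[str, float] = defaultdict(float)
    for vec, (_, label) in zip(vectors[1:], neighbours):
        votes[label] += cosine(vectors[0], vec)  # similarity-weighted vote
    return max(votes, key=votes.get)


case_base = [
    ("stock market price rise".split(), "economy"),
    ("market interest rate bank".split(), "economy"),
    ("football match goal team".split(), "sports"),
    ("team wins league match".split(), "sports"),
    ("election vote party".split(), "politics"),
]
print(cb_tfidf_classify("bank raises interest rate market".split(), case_base))  # economy
```

In this sketch the saving comes from step 2: TF-IDF statistics are computed over k + 1 documents instead of the whole case base.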

A Study of Automatic Indexing Technique based on Logical Structure of SGML Hangul Document (SGML 한글문서의 논리적 구조에 근거한 색인기법에 관한 연구)

  • 유석종
    • Journal of the Korean Society for Information Management / v.12 no.2 / pp.85-101 / 1995
  • Conventional indexing systems support only full-text indexing of electronic documents and do not use the logical structure of documents in retrieval. Most electronic documents vary in format from system to system, and these formats describe only the physical style of a document without capturing its logical structure. To standardize document exchange, ISO developed SGML (Standard Generalized Markup Language), which encodes information about the logical structure of documents. In this paper, to overcome the disadvantages of full-text indexing and to use a standard document format, an indexing system for SGML documents is designed and implemented. In this system, the user can assign the indexing domain to specific elements, so the logical structure of a document is reflected in information retrieval. Various retrieval methods can be implemented by using the structural information of the document. In addition, the system supports automatic indexing of SGML Hangul documents.

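To illustrate how element-level indexing lets users restrict a search to part of a document's logical structure, here is a Python sketch of an inverted index whose postings record the element path of each term. It is an assumption-laden stand-in: it parses well-formed XML with the standard `xml.etree.ElementTree` module rather than full SGML (which needs a DTD-aware parser), it skips Hangul morphological analysis, and the class and method names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class StructuredIndex:
    """Inverted index whose postings record the element path where each term occurs."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, doc_id: str, markup: str) -> None:
        root = ET.fromstring(markup)
        self._index_element(doc_id, root, root.tag)

    def _index_text(self, doc_id: str, text: Optional[str], path: str) -> None:
        for term in (text or "").lower().split():
            self.postings[term].add((doc_id, path))

    def _index_element(self, doc_id: str, elem: ET.Element, path: str) -> None:
        self._index_text(doc_id, elem.text, path)
        for child in elem:
            self._index_element(doc_id, child, f"{path}/{child.tag}")
            # Text after a child's end tag still belongs to the parent element.
            self._index_text(doc_id, child.tail, path)

    def search(self, term: str, element: Optional[str] = None) -> Set[str]:
        """Documents containing `term`, optionally only inside elements named `element`."""
        return {
            doc_id
            for doc_id, path in self.postings.get(term.lower(), set())
            if element is None or path.rsplit("/", 1)[-1] == element
        }


idx = StructuredIndex()
idx.add("d1", "<article><title>SGML indexing</title><body>Full-text retrieval of Hangul</body></article>")
idx.add("d2", "<article><title>Hangul retrieval</title><body>SGML markup example</body></article>")
print(sorted(idx.search("sgml")))           # ['d1', 'd2']
print(sorted(idx.search("sgml", "title")))  # ['d1']
```

An unrestricted search for "sgml" matches both documents, while restricting it to the `title` element returns only d1.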

A Study on the Feasibility of Full-Text Information Retrieval System Based on Document Content Structure (문헌의 내용단위구조에 의한 전문검색시스템의 타당성 고찰)

  • Lee Byeong-Ki
    • Journal of the Korean Society for Library and Information Science / v.32 no.1 / pp.129-154 / 1998
  • Online full-text databases are increasing, but conventional full-text information retrieval (IR) systems have been shown to yield high recall and low precision. One disadvantage of full-text IR systems is that they are not designed to reflect the user's information need, because they have been designed around the physical and logical structure of documents without considering document content. This study therefore examined the feasibility of using document content structure in a full-text IR system to resolve these disadvantages of conventional systems. A total of 180 journal articles were analyzed to find the common structure of their content, and a general model of the structure of journal articles was developed. The results show that the user's cognitive schema structure, the user's information need, and the content structure of documents are related. Thus, it is concluded that full-text IR systems need to be designed using document content structure in order to meet users' information needs more effectively.

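A small worked example of the "high recall, low precision" behaviour attributed to conventional full-text systems in this abstract. The formulas are the standard definitions of precision and recall; the document counts are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical full-text search: 50 documents returned, 9 of the 10 relevant ones among them.
retrieved = [f"doc{i}" for i in range(50)]
relevant = [f"doc{i}" for i in range(9)] + ["doc99"]
print(precision_recall(retrieved, relevant))  # (0.18, 0.9)
```

Matching words anywhere in the full text finds almost every relevant document (recall 0.9), but most of what it returns is noise (precision 0.18).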

A Study of Document Ranking Algorithms in a P-norm Retrieval System (P-norm 검색의 문헌 순위화 기법에 관한 실험적 연구)

  • 고미영;정영미
    • Journal of the Korean Society for Information Management / v.16 no.1 / pp.7-30 / 1999
  • This study develops effective document ranking algorithms for the P-norm retrieval system, which can be implemented on top of a Boolean retrieval system without major difficulty by using non-statistical term weights based on document structure. It also aims to enhance performance by introducing a rank adjustment process that rearranges the ranks of retrieved documents according to the similarity between the top-ranked documents and the rest. Of the non-statistical term weighting algorithms, this study uses the field weight and the term pair distance weight. For the rank adjustment process, five retrieval experiments were performed, ranging from using one record to using the first five records for the similarity measurement. The results show that non-statistical term weights are highly effective and that the rank adjustment process further enhances performance.

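As a sketch of how structure-based weights can plug into P-norm retrieval, the Python below uses the standard p-norm extended Boolean formulas with all query weights set to 1, and a hypothetical field-weight table as the non-statistical term weight. The field weights, the max-over-fields rule and the example documents are assumptions; the paper's term pair distance weight and rank adjustment step are not modelled.

```python
from typing import Dict, List

# Hypothetical non-statistical field weights: a term in the title counts most.
FIELD_WEIGHTS = {"title": 1.0, "keywords": 0.8, "abstract": 0.5}


def term_weight(doc: Dict[str, List[str]], term: str) -> float:
    """Highest weight among the fields containing the term (0.0 if it is absent)."""
    return max(
        (FIELD_WEIGHTS.get(field, 0.0) for field, tokens in doc.items() if term in tokens),
        default=0.0,
    )


def pnorm_or(weights: List[float], p: float) -> float:
    """P-norm OR with all query weights equal to 1."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)


def pnorm_and(weights: List[float], p: float) -> float:
    """P-norm AND with all query weights equal to 1."""
    return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)


docs = {
    "d1": {"title": ["p-norm", "retrieval"], "abstract": ["ranking", "boolean"]},
    "d2": {"title": ["ranking"], "abstract": ["p-norm", "retrieval"]},
}
query_terms = ["p-norm", "retrieval"]
for doc_id, doc in docs.items():
    weights = [term_weight(doc, t) for t in query_terms]
    print(doc_id, round(pnorm_and(weights, p=2), 3))  # d1 1.0, then d2 0.5
```

With p = 1 both operators reduce to the mean of the term weights; as p grows, AND approaches the minimum and OR the maximum, which is strict Boolean behaviour. This is why p-norm ranking can sit on top of an existing Boolean engine.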

Design and Implementation of OCR Correction Model for Numeric Digits based on a Context Sensitive and Multiple Streams (제한적 문맥 인식과 다중 스트림을 기반으로 한 숫자 정정 OCR 모델의 설계 및 구현)

  • Shin, Hyun-Kyung
    • The KIPS Transactions:PartD / v.18D no.1 / pp.67-80 / 2011
  • In an automated business document processing system that maintains financial data, errors in query-based retrieval of numbers are critical to the overall performance and usability of the system. Automatic spelling correction methods have emerged and have played an important role in the development of information retrieval systems. However, the scope of these methods has been limited to symbols, such as alphabetic letter strings, that can be stored in the form of trainable templates or a custom dictionary. Numbers, on the other hand, are sequences of digits that cannot be stored in a dictionary; they are pure Markov sequences. In this paper, we propose a new OCR model for spelling correction of numbers that uses multiple streams and context-based correction on top of a probabilistic information retrieval framework. We implemented the proposed error correction model as a sub-module and integrated it into an existing automated invoice document processing system. We also present comparative test results indicating that our model significantly enhances the overall precision of the system.

Medicine Ontology Building based on Semantic Relation and Its Application (의미관계 정보를 이용한 약품 온톨로지의 구축과 활용)

  • Lim Soo-Yeon;Park Seong-Bae;Lee Sang-Jo
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.428-437 / 2005
  • An ontology consists of a set of concepts, with their definitions, that represent the characteristics of a given domain, together with the relationships between those concepts. To reduce the time and cost of building an ontology, this paper proposes a semiautomatic method that builds a domain ontology from the results of text analysis. To do this, we propose a terminology processing method and use the extracted concepts and the semantic relations between them to build the ontology. The pharmacy field was selected as the experimental domain, and the resulting ontology was applied to document retrieval. To demonstrate the usefulness of the ontology's hierarchical relations for document retrieval, we compared a typical keyword-based retrieval method with an ontology-based retrieval method that uses related information in the ontology as feedback. The latter improved precision and recall by 4.97% and 0.78%, respectively.
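A simplified Python sketch of how hierarchical relations in an ontology can broaden a keyword query. The paper uses related ontology information as feedback; this sketch substitutes plain query expansion along is-a (narrower-concept) links, and the drug ontology fragment, documents and function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical fragment of a medicine ontology: concept -> narrower (is-a child) concepts.
NARROWER: Dict[str, List[str]] = {
    "analgesic": ["nsaid", "acetaminophen"],
    "nsaid": ["ibuprofen", "naproxen"],
}


def expand(term: str, depth: int = 2) -> Set[str]:
    """The term plus its descendants in the hierarchy, down to `depth` levels."""
    terms, frontier = {term}, [term]
    for _ in range(depth):
        frontier = [child for t in frontier for child in NARROWER.get(t, [])]
        terms.update(frontier)
    return terms


def search(docs: Dict[str, Set[str]], query: List[str], use_ontology: bool) -> Set[str]:
    """Ids of documents containing at least one (optionally expanded) query term."""
    terms: Set[str] = set()
    for q in query:
        terms |= expand(q) if use_ontology else {q}
    return {doc_id for doc_id, words in docs.items() if words & terms}


docs = {
    "d1": {"analgesic", "dosage"},
    "d2": {"ibuprofen", "side", "effects"},
    "d3": {"acetaminophen", "liver"},
    "d4": {"antibiotic", "resistance"},
}
print(sorted(search(docs, ["analgesic"], use_ontology=False)))  # ['d1']
print(sorted(search(docs, ["analgesic"], use_ontology=True)))   # ['d1', 'd2', 'd3']
```

On this toy collection the expanded query also finds d2 and d3, which mention specific analgesics but not the word "analgesic". The precision and recall figures in the abstract come from the paper's own experiments and are not reproduced by this example.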

xPlaneb: 3-Dimensional Bitmap Index for XML Document Retrieval (xPlaneb: XML문서 검색을 위한 3차원 비트맵 인덱스)

  • 이재민;황병연
    • Journal of KIISE:Databases / v.31 no.3 / pp.331-339 / 2004
  • XML has become a new standard for data representation and exchange owing to its many advantages, and it is at the core of much new research and many emerging technologies. However, XML's self-describing nature, one of those advantages, has led to the spread of XML documents with differing structures, creating a need for research on effective XML document search. This paper analyzes the problems of BitCube, a bitmap index that achieves high performance through fast retrieval. To resolve these problems, we designed and implemented xPlaneb (XML Plane Web), a new 3-dimensional bitmap index built from linked lists. We propose an effective information retrieval technique that replaces BitCube's operations with new ones and reconstructs BitCube's 3-dimensional array index with effective nodes. Performance evaluation shows that, as the number of documents increases, the proposed technique outperforms BitCube in memory consumption and operation speed.
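A Python sketch of the kind of 3-dimensional bitmap lookup that BitCube and xPlaneb build on, assuming the cube's three axes are element path, word and document. The abstract does not give the exact layout, and this sketch uses a dictionary of Python integers as a sparse stand-in; it is neither BitCube's dense array nor xPlaneb's linked-list node structure, and the class name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class BitmapIndex3D:
    """Sparse 3-D bitmap over (element path, word, document).

    Each (path, word) pair maps to a Python int used as a bitset: bit i is set
    when document i contains `word` under `path`.
    """

    def __init__(self) -> None:
        self.planes: Dict[Tuple[str, str], int] = defaultdict(int)
        self.num_docs = 0

    def add_document(self, entries: Iterable[Tuple[str, str]]) -> int:
        doc_id = self.num_docs
        self.num_docs += 1
        for path, word in entries:
            self.planes[(path, word)] |= 1 << doc_id
        return doc_id

    def lookup(self, path: str, word: str) -> int:
        return self.planes.get((path, word), 0)

    @staticmethod
    def to_ids(bits: int) -> List[int]:
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


idx = BitmapIndex3D()
idx.add_document([("/article/title", "xml"), ("/article/body", "bitmap")])
idx.add_document([("/article/title", "xml"), ("/article/body", "index")])
idx.add_document([("/article/title", "bitmap"), ("/article/body", "bitmap")])

# Documents with "xml" in the title AND "bitmap" in the body: one bitwise AND.
both = idx.lookup("/article/title", "xml") & idx.lookup("/article/body", "bitmap")
print(BitmapIndex3D.to_ids(both))  # [0]
```

Storing only the (path, word) pairs that actually occur avoids allocating the mostly empty cells of a dense cube, which is the same memory concern the abstract raises as the number of documents grows.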

Implementing and Evaluating an Empirical Variable Retrieval System : The Entity-Relationship and Relational Approach (실험변수를 이용한 정보검색 시스템의 구축 및 평가 : 개체-관계 모델과 관계형 데이터베이스를 이용한 접근)

  • Oh Sam-Gyun
    • Journal of the Korean Society for Library and Information Science / v.32 no.4 / pp.53-67 / 1998
  • This article investigates the potential of using empirical variables and their associated statistical relationships in document representation and retrieval. To this end, a newly devised empirical fact retrieval system (EFRS) was evaluated against a simulated traditional retrieval system (TRS) using a set of predetermined empirical queries. Results indicate that the EFRS generally outperformed the TRS in terms of precision, search effort, and measures of user satisfaction.


Interactive Searching Behavior with Elements-Based on XML Documents Retrieval System (엘리먼트 기반 XML 검색 시스템에서의 이용자의 정보 탐색 행태 연구)

  • Jung, Young-Mi
    • Journal of Korean Library and Information Science Society / v.40 no.4 / pp.159-176 / 2009
  • The aim of this study was to investigate users' behavior when interacting with an element-based XML document retrieval system and to develop approaches to XML retrieval that are effective in a user-based environment. We followed the experimental guidelines of the INEX 2006 iTrack organizers. To this end, the questionnaire responses (16 per subject) and the system logs were collected and analyzed using Excel and SPSS 17.0.
