• Title/Abstract/Keywords: Indexing System

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, Aziz; Kihm, Jang-Su; Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication / Vol. 4, No. 1 / pp. 13-17 / 2012
  • Indexing converts a raw document collection into an easily searchable representation. Web search engines such as Google and Yahoo achieve subsecond response times through efficient indexing of pages across the entire Web. Indexing becomes more challenging as the collection grows. Parallel techniques such as the MapReduce framework can make large-scale indexing efficient. In this paper we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a pre-defined structure: title, abstract, sections, and references. The proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework on a cluster of nodes. We give an overview of the system, its components, and the interactions among them. We discuss design issues and the use of MapReduce in parsing and indexing a large document collection.
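
The abstract does not describe PDFindexer's internals, so the following is only a minimal single-process sketch of how structure-aware indexing could be phrased as map and reduce steps. The function names (`map_paper`, `shuffle`, `reduce_postings`, `build_index`), the `(term, section)` key, and the sample papers are assumptions made for illustration, not the authors' design; a real deployment would run the same two functions on a MapReduce cluster rather than in one process.

```python
import re
from collections import defaultdict


def map_paper(doc_id, sections):
    """Map step: emit ((term, section), doc_id) for each distinct term of one parsed paper."""
    for section, text in sections.items():
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            yield (term, section), doc_id


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_postings(key, doc_ids):
    """Reduce step: collapse all doc ids for one key into a sorted posting list."""
    return key, sorted(set(doc_ids))


def build_index(papers):
    """Run map, shuffle, and reduce in-process over a dict of parsed papers."""
    pairs = (pair
             for doc_id, sections in papers.items()
             for pair in map_paper(doc_id, sections))
    return dict(reduce_postings(k, v) for k, v in shuffle(pairs).items())


# Hypothetical parsed papers: the section names follow the structure named in the abstract.
papers = {
    "p1": {"title": "Distributed indexing", "abstract": "MapReduce indexing of PDF papers"},
    "p2": {"title": "Spatial indexing", "abstract": "Bounding rectangles"},
}
index = build_index(papers)
print(index[("indexing", "title")])
```

Keying on the section as well as the term is one way to keep the recreated paper structure queryable, for example to search titles only.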

2차 법률정보 전문데이터베이스에 있어서 통제어 색인시스템과 자연어 색인시스템의 검색효율 평가에 관한 연구 (A Study on the Indexing System Using a Controlled Vocabulary and Natural Language in the Secondary Legal Information Full-Text Databases: An Evaluation and Comparison of Retrieval Effectiveness)

  • 노정란
    • 한국문헌정보학회지 / Vol. 32, No. 4 / pp. 69-86 / 1998
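  • Building on the characteristics of legal information identified in a preliminary study on constructing a full-text database of secondary legal information (권기원, 노정란, 1998, 한국문헌정보학회지, 32(3)), this study developed an algorithm, used it to build a model controlled-vocabulary database, and compared the retrieval effectiveness of a controlled-vocabulary indexing system with that of a natural-language indexing system. In the secondary legal information full-text database, the controlled-vocabulary indexing system outperformed the natural-language indexing system in recall, in precision, and in the ability to retrieve unique relevant documents that the natural-language system failed to retrieve. It is generally assumed that assigning weights or adding access points improves a database's precision or recall. In the secondary legal information full-text database, however, the study found that, owing to the characteristics of legal information as a specific knowledge domain, neither weighting nor additional access points improved recall or precision. Information system designers should therefore build into the system the domain-specific knowledge recognized by information specialists and subject experts, rather than approach the system through purely linguistic or statistical methods.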
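
The three measures named in the abstract (recall, precision, and unique relevant documents) follow standard definitions, which can be sketched as below. The document identifiers and result lists are invented for illustration and are not data from the study; the function names are likewise assumptions.

```python
def evaluate(retrieved, relevant):
    """Return (precision, recall) for one query's result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def unique_relevant(retrieved_a, retrieved_b, relevant):
    """Relevant documents found by system A that system B did not retrieve."""
    return (set(retrieved_a) - set(retrieved_b)) & set(relevant)


# Hypothetical relevance judgments and result lists for a single query.
relevant = {"d1", "d2", "d3", "d4"}
controlled = ["d1", "d2", "d3", "d9"]   # controlled-vocabulary system
natural = ["d1", "d7", "d8", "d9"]      # natural-language system

print(evaluate(controlled, relevant))
print(evaluate(natural, relevant))
print(unique_relevant(controlled, natural, relevant))
```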

신문기사(新聞記事) 자동색인(自動索引)에 관한 고찰(考察) (A Study on Automatic Indexing System for Newspaper Articles)

  • 조선희
    • 정보관리연구 / Vol. 23, No. 3 / pp. 19-44 / 1992
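  • As most newspaper companies in Korea have recently adopted CTS systems, the full text of articles is now entered into computers, and the need for an automatic indexing system that exploits this advantage has emerged. Drawing on prior research and on cases in Korea and abroad, this study examines the problems and future prospects of automatic indexing systems for newspaper articles.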

멀티미디어 데이터 스트림을 위한 파일 시스템의 설계 및 구현 (A New File System for Multimedia Data Stream)

  • 이민석;송진석
    • 대한임베디드공학회논문지 / Vol. 1, No. 2 / pp. 90-103 / 2006
  • Many file systems exist across operating systems. They are usually designed for server environments, where the common cases are 'multiple active users' and 'a great many small files', and they assume a large main memory to be used as a buffer cache. Existing file systems are therefore not suitable for resource-constrained embedded systems that process multimedia data streams. In this study, we designed and implemented a new file system that efficiently stores and retrieves multimedia data streams. The proposed file system has a very simple disk layout, which guarantees quick disk initialization and file system recovery. We also introduce a new indexing scheme, called the time-based indexing scheme, with the file system. With this scheme, the file system maintains the relation between time and location for all multimedia streams. The scheme is useful in searching and playing compressed multimedia streams because it locates the exact frame position for a given time, reducing CPU processing and power consumption. The proposed file system and its APIs utilizing the time-based indexing scheme were first implemented in a Linux environment, although the design is operating-system independent. In a performance evaluation on a real DVR system, which measured the execution time of multi-threaded reading and writing, the proposed file system was up to 38.7% faster than the EXT2 file system.
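
The abstract says the time-based indexing scheme keeps the relation between time and location so that a frame can be found from a given time. One way to picture that idea is a sorted table of (timestamp, byte offset) entries searched by binary search. The sketch below is an in-memory illustration under that assumption; the class name `TimeIndex`, the millisecond units, and the sample offsets are hypothetical and do not reflect the file system's actual on-disk format or API.

```python
import bisect


class TimeIndex:
    """Sorted (timestamp, offset) table supporting seek-by-time lookups."""

    def __init__(self):
        self._times = []     # timestamps in milliseconds, strictly increasing
        self._offsets = []   # byte offset of the frame recorded at each timestamp

    def append(self, timestamp_ms, byte_offset):
        """Record where the frame written at timestamp_ms starts."""
        if self._times and timestamp_ms <= self._times[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._times.append(timestamp_ms)
        self._offsets.append(byte_offset)

    def locate(self, timestamp_ms):
        """Return the offset of the last indexed frame at or before timestamp_ms."""
        i = bisect.bisect_right(self._times, timestamp_ms) - 1
        if i < 0:
            raise KeyError("timestamp precedes the first indexed frame")
        return self._offsets[i]


# Hypothetical stream: one indexed frame per second.
index = TimeIndex()
for t, off in [(0, 0), (1000, 48_000), (2000, 97_500), (3000, 150_200)]:
    index.append(t, off)

print(index.locate(2500))
```

Because the lookup returns an offset directly, a player can seek without scanning or decoding the stream from the start, which is consistent with the reduced CPU work the abstract reports.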

MBR을 이용한 실용적 공간 데이터 관리 (A Practical Approach to Spatial Object Indexing Using Minimum Bounding Rectangles)

  • 이재호
    • 한국정보과학회: Conference Proceedings / 한국정보과학회 1999 Fall Conference Proceedings, Vol.26 No.2 (1) / pp. 177-179 / 1999
  • We present a simple and efficient spatial object indexing scheme based on the minimum bounding rectangles (MBRs) of the objects, for use in geographic information system (GIS) applications. We also provide the rationale for choosing this simple indexing scheme over more complex hierarchical indexing approaches such as the R-tree and its variants.
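
The abstract does not spell out the scheme, so the following is only a generic illustration of MBR-based filtering: each object is reduced to its bounding rectangle, and a window query scans a flat table of rectangles instead of descending a tree. The names `MBR`, `mbr_of`, and `candidates`, and the sample objects, are assumptions for this sketch.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Axis-aligned minimum bounding rectangle."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other):
        """True when the two rectangles overlap or touch."""
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)


def mbr_of(points):
    """Compute the MBR of an object given as a sequence of (x, y) vertices."""
    xs, ys = zip(*points)
    return MBR(min(xs), min(ys), max(xs), max(ys))


def candidates(index, window):
    """Filter step: ids of objects whose MBR intersects the query window."""
    return [oid for oid, box in index.items() if box.intersects(window)]


# Hypothetical map objects described by their vertices.
objects = {
    "lake": [(1, 1), (4, 2), (3, 5)],
    "road": [(6, 0), (9, 8)],
    "park": [(10, 10), (12, 11), (11, 13)],
}
index = {oid: mbr_of(pts) for oid, pts in objects.items()}

print(candidates(index, MBR(3, 3, 7, 7)))
```

The filter returns candidates only: an object whose MBR intersects the window may still miss it, so an exact geometry test would normally follow.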

주제색인의 이론과 실제 (Theory and practice of alphabetical subject indexing)

  • 윤구호
    • 한국도서관정보학회지 / Vol. 10 / pp. 95-131 / 1983
  • An index is a systematic guide to items contained in, or concepts derived from, a collection. Thus, it is represented as a paired set of index terms ($t$) and documents ($D$): $I = \{(t, D) \mid t \in V,\ D \in W\}$, where $V$ is the index vocabulary and $W$ is the document collection. Indexing is the process of analysing the informational content of records of knowledge and expressing that content in the language of the indexing system. It involves: 1) selecting indexable concepts in a document; and 2) expressing these concepts in the language of the indexing system (as index entries) and as an ordered list. The indexing process involves technical, semantic, and syntactic problems. Technical problems relate to the accuracy of indexing, which is primarily governed by the indexer's ability to analyse the subject, identify indexable concepts, and code them. The proper levels of indexing exhaustivity and index language specificity are also significant factors affecting the quality of an index. Semantic problems relate to the choice of index terms and the form in which they should be used; equivalent, hierarchical, and affinitive/associative relationships among index terms are involved. Syntactic problems relate largely to the coordination of index terms, which arises from the need to search for the intersection of two or more classes defined by terms denoting distinct concepts. Finally, the most valuable aspects of alphabetical subject indexing theory and practice are derived from those of Cutter, Kaiser, Ranganathan, Coates, Lynch, and Austin, and are discussed in detail.
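
The set definition of an index and the notion of coordination as class intersection can be made concrete with a small sketch. The term assignments below are invented, and the function names are assumptions; the code only mirrors the definition of $I$ as a set of (term, document) pairs and the intersection of the document classes that the terms define.

```python
def build_index(assignments):
    """Build I as a set of (t, D) pairs from a mapping of document -> index terms."""
    return {(t, d) for d, terms in assignments.items() for t in terms}


def documents_for(index, term):
    """The class of documents defined by a single index term."""
    return {d for t, d in index if t == term}


def coordinate(index, *terms):
    """Coordination: intersect the classes defined by each of the given terms."""
    classes = [documents_for(index, t) for t in terms]
    return set.intersection(*classes) if classes else set()


# Hypothetical indexing decisions for three documents.
assignments = {
    "D1": {"libraries", "cataloguing"},
    "D2": {"libraries", "automation"},
    "D3": {"automation", "indexing"},
}
I = build_index(assignments)

print(coordinate(I, "libraries", "automation"))
```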

Retrieval of Broadcast News Using Audio Content Analysis

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / Vol. 26, No. 3E / pp. 74-79 / 2007
  • In this paper, we report our recent work on an indexing and retrieval system for broadcast news using audio content analysis. The work addresses two major parts of the audio indexing system: anchorperson detection based on audio segmentation, and phone-based spoken document retrieval, both developed in the framework of the emerging MPEG-7 standard. Experiments are conducted on a database of British broadcast news videos. We discuss the development of the retrieval system and the evaluation of each part and of the system as a whole.

볼스크류 전구간 피치오차 측정시스템 (Precision Measurement System for Ball Screw Pitch Error)

  • 박희재;김인기
    • 한국정밀공학회: Conference Proceedings / 한국정밀공학회 1993 Autumn Conference Proceedings / pp. 279-285 / 1993
  • This paper presents a precision automatic measuring system for ball screw pitch. The ball screw is mounted on a precision indexing table, and the pitch is measured via a magnetic scale, with indexing and measurement performed by a PC. For precision indexing of the ball screw, a direct-drive motor is coupled to the designed dead and live centers; the performance of the centers, in terms of radial, tilt, and axial motions, is assessed with a precision master cylinder. An error compensation model is constructed for the measurement system, in which the error motions of the indexing system and of the scale measurement system are combined to give the measurement error for the ball screw. The developed system provides automated precision measurement for manufacturers and users of ball screws.

전기유압식 서보인덱싱 시스템의 PWM 제어에 관한 연구 (A Study on PWM Control of an Electro-Hydraulic Servo Indexing System)

  • 허준영
    • Journal of Advanced Marine Engineering and Technology / Vol. 23, No. 2 / pp. 236-243 / 1999
  • This study deals with the application of high-speed on-off valves to an electro-hydraulic servo indexing system that incorporates electro-hydraulic servo valves. Compared with the electro-hydraulic servo valve, the high-speed on-off valve has several merits, including low price, robustness to oil contamination, and direct control without a D/A converter. The system considered in this study is controlled by pulse width modulation (PWM) of a control law produced by a PID controller, a type widely used in industrial equipment. The dynamic characteristics corresponding to variations in system parameters such as moment of inertia, system gain, and supply pressure are investigated by computer simulation and experiment. The results confirm that high-speed on-off valves can be applied to a servo indexing system in place of electro-hydraulic servo valves.
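
To illustrate how a PID control law can drive an on-off valve through PWM, the sketch below converts the controller output into a duty cycle and then into an open/closed pattern over one PWM period. The abstract gives no gains, periods, or valve arrangement, so every number, the class and function names, and the sign-based choice of direction are assumptions made for this sketch rather than the paper's design.

```python
class PID:
    """Discrete PID controller with a fixed sample time dt (seconds)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        """Return the control signal for the current position error."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def to_pwm(u, u_max):
    """Map control signal u to (direction, duty); duty saturates at 1.0."""
    duty = min(abs(u) / u_max, 1.0)
    direction = 1 if u > 0 else -1 if u < 0 else 0
    return direction, duty


def valve_states(duty, steps_per_period):
    """Open/closed pattern of the on-off valve over one PWM period."""
    on_steps = round(duty * steps_per_period)
    return [i < on_steps for i in range(steps_per_period)]


# Hypothetical gains and error: proportional-only control for readability.
pid = PID(kp=2.0, ki=0.0, kd=0.0, dt=0.01)
u = pid.update(0.25)
direction, duty = to_pwm(u, u_max=1.0)

print(direction, duty, valve_states(duty, 10))
```

Because the valve is only ever fully open or fully closed, the digital pattern can be sent to it directly, which matches the abstract's point that no D/A converter is needed.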

A Personal Videocasting System with Intelligent TV Browsing for a Practical Video Application Environment

  • Kim, Sang-Kyun;Jeong, Jin-Guk;Kim, Hyoung-Gook;Chung, Min-Gyo
    • ETRI Journal / Vol. 31, No. 1 / pp. 10-20 / 2009
  • In this paper, a video broadcasting system between a home-server-type device and a mobile device is proposed. The home-server-type device can automatically extract semantic information from video content such as news, a soccer match, or a baseball game. The indexing results are used to convert the original video content to a digested or arranged format. From the mobile device, a user can make recording requests to the home-server-type device and can then watch and navigate the recorded video content in digested form. The novelty of this study is the actual implementation of the proposed system, which combines the currently available IT environment with indexing algorithms. The implementation of the system is demonstrated along with experimental results for the automatic video indexing algorithms. The overall performance of the developed system is compared with existing state-of-the-art personal video recording products.
