• Title/Summary/Keyword: Indexing System

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, JAziz;Kihm, Jang-Su;Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication, v.4 no.1, pp.13-17, 2012
  • Indexing converts a raw document collection into an easily searchable representation. Web search engines such as Google or Yahoo provide subsecond response times, which are made possible by efficient indexing of web pages across the entire Web. Indexing becomes more challenging as the scale grows. Parallel techniques such as the MapReduce framework can support efficient large-scale indexing. In this paper we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a predefined structure, such as title, abstract, sections, and references. The proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework on a cluster of nodes. We provide an overview of the system, its components, and the interactions among them. We discuss some issues related to the design of the system and the use of MapReduce in parsing and indexing large document collections.
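The abstract gives no code, so the following is only a minimal sketch of the general MapReduce inverted-index pattern it refers to, simulated in a single Python process rather than on a cluster. It assumes each paper has already been parsed from PDF into a hypothetical `{section_name: text}` dictionary; the function names and the `(term, section)` key are illustrative and are not PDFindexer's actual API.

```python
import re
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, sections):
    """Mapper: emit ((term, section), doc_id) for every token in every section.
    `sections` is a hypothetical {section_name: text} dict produced by a PDF parser."""
    for section, text in sections.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            yield (term, section), doc_id

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, doc_ids):
    """Reducer: turn one key's values into a sorted, de-duplicated posting list."""
    return key, sorted(set(doc_ids))

def build_index(corpus):
    """Run map, shuffle, and reduce locally over {doc_id: sections}."""
    intermediate = chain.from_iterable(map_phase(d, s) for d, s in corpus.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

Keying postings by `(term, section)` is one possible way to exploit the predefined structure of scientific papers (for example, searching titles only). On a real cluster, the shuffle step is performed by the framework rather than by an in-memory dictionary.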

A Study on the Indexing System Using a Controlled Vocabulary and Natural Language in the Secondary Legal Information Full-Text Databases : an Evaluation and Comparison of Retrieval Effectiveness (2차 법률정보 전문데이터베이스에 있어서 통제어 색인시스템과 자연어 색인시스템의 검색효율 평가에 관한 연구)

  • Roh Jeong-Ran
    • Journal of the Korean Society for Library and Information Science, v.32 no.4, pp.69-86, 1998
  • The purpose of this study is to develop an indexing algorithm for secondary legal information based on a study of the characteristics of legal information, to compare an indexing system using a controlled vocabulary with one using natural language in secondary legal information full-text databases, and to demonstrate the appropriateness and superiority of the controlled-vocabulary indexing system. The results are as follows: 1) in secondary legal information full-text databases, the indexing system using a controlled vocabulary is more effective than the indexing system using natural language in recall, precision, the distribution of relevance, and the ability to find the unique relevant records that the natural-language indexing system fails to find; 2) an indexing system that adds more words to the controlled vocabulary is not more effective in recall or precision than the indexing system using the controlled vocabulary alone; 3) an indexing system using the word-added controlled vocabulary with extra weighting is not more effective in recall or precision than one using the word-added controlled vocabulary without extra weighting. This study indicates that it is necessary to build into the system the characteristic information that information experts recognize, that is, the experiential and inherent knowledge that only human beings have, rather than to approach the information system in a linguistic, statistical, or structuralist way, and that such a system can be a more essential and intelligent information system.
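Recall and precision, the measures used in the comparison above, can be computed per query as in the following sketch. The record ids are hypothetical, and returning 0.0 when a set is empty is an assumed convention; the study's own evaluation procedure is not described in the abstract.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of record ids.
    Convention (assumed): a ratio with an empty denominator is reported as 0.0."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant records that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def unique_relevant(retrieved_a, retrieved_b, relevant):
    """Relevant records retrieved by system A that system B fails to find."""
    return (set(retrieved_a) & set(relevant)) - set(retrieved_b)
```

For example, retrieving records r1 to r4 when r1, r2, and r5 are relevant gives a precision of 2/4 and a recall of 2/3.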

A Study on Automatic Indexing System for Newspaper Articles (신문기사(新聞記事) 자동색인(自動索引)에 관한 고찰(考察))

  • Cho, Sun-Hee
    • Journal of Information Management, v.23 no.3, pp.19-44, 1992
  • As most domestic newspaper companies adopt CTS systems, the need for an automatic indexing system that can transfer full text into a computer is growing sharply. In this study, I analyze the problems and prospects of automatic indexing systems through various examples and previous studies by other researchers.

A New File System for Multimedia Data Stream (멀티미디어 데이터 스트림을 위한 파일 시스템의 설계 및 구현)

  • Lee, Minsuk;Song, Jin-Seok
    • IEMEK Journal of Embedded Systems and Applications, v.1 no.2, pp.90-103, 2006
  • Operating systems provide many file systems, but these are usually designed for server environments, where the common cases are multiple active users and very many small files, and they assume a large main memory for use as a buffer cache. Existing file systems are therefore not suitable for resource-constrained embedded systems that process multimedia data streams. In this study, we designed and implemented a new file system that efficiently stores and retrieves multimedia data streams. The proposed file system has a very simple disk layout, which guarantees quick disk initialization and file system recovery. Together with the file system, we introduce a new indexing scheme, called the time-based indexing scheme, with which the file system maintains the relation between time and location for all multimedia streams. The scheme is useful for searching and playing compressed multimedia streams because it locates the exact frame position for a given time, reducing CPU processing and power consumption. The proposed file system and its APIs using the time-based indexing scheme were first implemented in a Linux environment, although the file system itself is operating-system independent. In a performance evaluation on a real DVR system, which measured the execution time of multi-threaded reading and writing, the proposed file system was up to 38.7% faster than the EXT2 file system.
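A time-based index of the kind described maps a playback time to a location in the stream. The sketch below assumes a simple in-memory list of (timestamp in ms, byte offset) entries searched by binary search; the class and method names are hypothetical, and the paper's on-disk index layout is not given in the abstract.

```python
from bisect import bisect_right

class TimeIndex:
    """Hypothetical time-based index: frame timestamps (ms) -> byte offsets in a stream."""

    def __init__(self):
        self._times = []    # sorted frame start times in milliseconds
        self._offsets = []  # byte offset of the frame starting at each time

    def append(self, time_ms, offset):
        # Streams are recorded in time order, so appending keeps the lists sorted.
        if self._times and time_ms < self._times[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._times.append(time_ms)
        self._offsets.append(offset)

    def locate(self, time_ms):
        """Return the offset of the last indexed frame starting at or before time_ms."""
        i = bisect_right(self._times, time_ms) - 1
        if i < 0:
            raise KeyError(f"no frame at or before {time_ms} ms")
        return self._offsets[i]
```

Because entries arrive in time order, each seek costs O(log n) comparisons instead of a scan of the compressed stream for the right frame.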

A Practical Approach to Spatial Object Indexing Using Minimum Bounding Rectangles (MBR을 이용한 실용적 공간 데이터 관리)

  • 이재호
    • Proceedings of the Korean Information Science Society Conference, 1999.10a, pp.177-179, 1999
  • We present a simple and efficient spatial object indexing scheme based on the minimum bounding rectangles (MBRs) of the objects, for use in geographic information system (GIS) applications. We also explain the rationale for choosing this simple indexing scheme over more complex hierarchical indexing approaches such as the R-tree and its variants.
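The core of MBR-based indexing is a cheap rectangle-overlap test used as a filter before exact geometric checks. This sketch assumes objects are given as lists of (x, y) vertices and scans the MBRs linearly; it illustrates the general MBR filter step, not necessarily the exact scheme proposed in the paper.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

class MBR(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "MBR") -> bool:
        # Two axis-aligned rectangles overlap unless one lies entirely beside the other.
        return not (self.max_x < other.min_x or other.max_x < self.min_x or
                    self.max_y < other.min_y or other.max_y < self.min_y)

def mbr_of(points: Iterable[Tuple[float, float]]) -> MBR:
    """Smallest axis-aligned rectangle enclosing all vertices of an object."""
    xs, ys = zip(*points)
    return MBR(min(xs), min(ys), max(xs), max(ys))

def window_query(index: Dict[str, MBR], window: MBR) -> List[str]:
    """Filter step: ids of objects whose MBR overlaps the query window."""
    return [oid for oid, box in index.items() if box.intersects(window)]
```

Objects returned by `window_query` are only candidates; an exact test on the original geometry would still be needed to discard false positives.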

Theory and practice of alphabetical subject indexing (주제색인의 이론과 실제)

  • 윤구호
    • Journal of Korean Library and Information Science Society, v.10, pp.95-131, 1983
  • An index is a systematic guide to items contained in, or concepts derived from, a collection. Thus, it can be represented as a set of pairs of index terms (t) and documents (D): $I = \{(t, D) \mid t \in V,\ D \in W\}$, where V is the index vocabulary and W is the document collection. Indexing is the process of analyzing the informational content of records of knowledge and expressing that content in the language of the indexing system. It involves 1) selecting indexable concepts in a document, and 2) expressing these concepts in the language of the indexing system (as index entries) in an ordered list. The indexing process involves technical, semantic, and syntactic problems. Technical problems concern the accuracy of indexing, which is governed primarily by the indexer's ability to analyze subjects, identify indexable concepts, and code them. Appropriate levels of indexing exhaustivity and index-language specificity are also significant factors affecting the quality of an index. Semantic problems concern the choice of index terms and the form in which they should be used; they involve equivalence, hierarchical, and affinitive (associative) relationships among index terms. Syntactic problems largely concern the coordination of index terms. Coordination arises from the need to search for the intersection of two or more classes defined by terms denoting distinct concepts. Finally, the most valuable aspects of the alphabetical subject indexing theories and practices of Cutter, Kaiser, Ranganathan, Coates, Lynch, and Austin are identified and discussed in detail.
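The definition of an index as a set of (term, document) pairs, and the coordination of terms as a search for the intersection of classes, can be illustrated in a few lines. This sketch assumes hypothetical term strings and document ids; it groups the pairs into per-term document classes and coordinates terms by intersecting those classes.

```python
from collections import defaultdict

def group_pairs(pairs):
    """Group the (term, document) pairs of an index I into term -> set of documents."""
    postings = defaultdict(set)
    for term, doc in pairs:
        postings[term].add(doc)
    return dict(postings)

def coordinate(postings, *terms):
    """Documents indexed under every given term: the intersection of the terms' classes."""
    if not terms:
        return set()
    result = set(postings.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result
```

With pairs such as ("indexing", D1), ("indexing", D2), ("vocabulary", D2), and ("vocabulary", D3), coordinating "indexing" and "vocabulary" leaves only D2.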

Retrieval of Broadcast News Using Audio Content Analysis

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea, v.26 no.3E, pp.74-79, 2007
  • In this paper, we report our recent work on an indexing and retrieval system for broadcast news using audio content analysis. The work addresses two key parts of the audio indexing system: anchorperson detection based on audio segmentation, and phone-based spoken document retrieval, both developed within the framework of the emerging MPEG-7 standard. Experiments are conducted on a database of British broadcast news videos. We discuss the development of the retrieval system and the evaluation of each component and of the retrieval system as a whole.
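Phone-based spoken document retrieval matches sequences of recognized phones rather than whole words. The following is a generic sketch of one such approach, phone n-gram indexing with a simple overlap score; it assumes each news segment has already been transcribed into a hypothetical list of phone symbols, and it is not the MPEG-7-based method evaluated in the paper.

```python
from collections import Counter, defaultdict

def phone_ngrams(phones, n=3):
    """Overlapping phone n-grams, e.g. ['n', 'uw', 'z'] -> [('n', 'uw', 'z')]."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_phone_index(docs, n=3):
    """Inverted index: phone n-gram -> Counter of {document id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, phones in docs.items():
        for gram in phone_ngrams(phones, n):
            index[gram][doc_id] += 1
    return index

def search(index, query_phones, n=3):
    """Rank documents by the number of n-gram occurrences they share with the query."""
    scores = Counter()
    for gram in phone_ngrams(query_phones, n):
        # .get avoids inserting empty entries into the defaultdict
        for doc_id, count in index.get(gram, {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]
```

Matching short phone n-grams can still find a spoken word that the recognizer's vocabulary lacks, such as a new proper name, at the cost of more spurious matches.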

Precision Measurement System for Ball Screw Pitch Error (볼스크류 전구간 피치오차 측정시스템)

  • 박희재;김인기
    • Proceedings of the Korean Society of Precision Engineering Conference, 1993.10a, pp.279-285, 1993
  • This paper presents a precision automatic measuring system for ball screw pitch. The ball screw is mounted on a precision indexing table, and its pitch is measured with a magnetic scale; both indexing and measurement are performed by a PC. For precision indexing of the ball screw, a direct-drive motor is coupled to purpose-designed dead and live centers, and the performance of the centers, in terms of radial, tilt, and axial motion, is assessed with a precision master cylinder. An error compensation model is constructed for the ball screw pitch measurement system, in which the error motions of the indexing system and of the scale measurement system are combined to give the measurement error for the ball screw. The developed system provides an automated precision measurement system for manufacturers and users of ball screws.

A Study on PWM Control of an Electro-Hydraulic Servo Indexing System (전기유압식 서보인덱싱 시스템의 PWM 제어에 관한 연구)

  • 허준영
    • Journal of Advanced Marine Engineering and Technology, v.23 no.2, pp.236-243, 1999
  • This study deals with applying high-speed on-off valves to an electro-hydraulic servo indexing system in place of electro-hydraulic servo valves. Compared with the electro-hydraulic servo valve, the high-speed on-off valve has several merits, including low price, robustness to oil contamination, and direct control without a D/A converter. The system considered in this study is controlled by pulse-width modulation (PWM) of the control signal produced by a PID controller, which is widely used in industrial equipment. The dynamic characteristics under variations of system parameters such as moment of inertia, system gain, and supply pressure are investigated by computer simulation and experiment. As a result, the feasibility of using high-speed on-off valves instead of electro-hydraulic servo valves in a servo indexing system is confirmed.
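The control structure described, a PID controller whose output is converted into a PWM signal for on-off valves, can be sketched as follows. The gains, sample time `dt`, saturation limit `u_max`, and the assumption that the sign of the output selects which of two valves to pulse are all hypothetical; the paper's actual controller parameters are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PID:
    """Discrete PID controller (hypothetical gains), updated once per sample period dt."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float) -> float:
        self.integral += error * self.dt                  # rectangular integration
        derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def to_pwm(u: float, u_max: float) -> Tuple[int, float]:
    """Map a PID output to (valve direction, duty ratio in [0, 1]).
    Assumption: +1 pulses the 'extend' valve, -1 the 'retract' valve;
    outputs beyond u_max saturate at a 100% duty ratio."""
    duty = min(abs(u) / u_max, 1.0)
    direction = 1 if u >= 0 else -1
    return direction, duty
```

Within each PWM period, the selected valve would be held open for `duty * period` seconds, so the on-off valve approximates proportional flow on average.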

A Personal Videocasting System with Intelligent TV Browsing for a Practical Video Application Environment

  • Kim, Sang-Kyun;Jeong, Jin-Guk;Kim, Hyoung-Gook;Chung, Min-Gyo
    • ETRI Journal, v.31 no.1, pp.10-20, 2009
  • This paper proposes a video broadcasting system between a home-server-type device and a mobile device. The home-server-type device can automatically extract semantic information from video content such as news, soccer matches, and baseball games. The indexing results are used to convert the original video content into a digested or arranged format. From the mobile device, a user can send recording requests to the home-server-type device and then watch and navigate the recorded video content in digested form. The novelty of this study lies in the actual implementation of the proposed system, combining the currently available IT environment with indexing algorithms. The implementation of the system is demonstrated along with experimental results for the automatic video indexing algorithms. The overall performance of the developed system is compared with that of existing state-of-the-art personal video recording products.
