• Title/Summary/Keyword: Large-scale indexing

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, Aziz;Kihm, Jang-Su;Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication / v.4 no.1 / pp.13-17 / 2012
  • Indexing converts a raw document collection into an easily searchable representation. Web search engines such as Google and Yahoo provide subsecond response times, made possible by efficient indexing of pages across the entire Web. Indexing becomes challenging as the scale grows. Parallel techniques such as the MapReduce framework can support efficient large-scale indexing. In this paper we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a pre-defined structure such as title, abstract, sections, and references. The proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework on a cluster of nodes. We provide an overview of the system, its components, and the interactions among them. We discuss some issues related to the design of the system and the use of MapReduce in parsing and indexing large document collections.
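
As a rough illustration of the idea in this abstract, the sketch below builds a section-aware inverted index with a map step, a shuffle (group by term), and a reduce step, all in one process. It is not the PDFindexer implementation: the function names, the whitespace tokenizer, and the sample papers are assumptions made for the example, and PDF parsing is skipped entirely.

```python
from collections import defaultdict

def map_phase(doc_id, sections):
    """Emit (term, (doc_id, section)) pairs for every word in every section."""
    for section, text in sections.items():
        for term in text.lower().split():
            yield term, (doc_id, section)

def reduce_phase(term, postings):
    """Collapse all postings for one term into a sorted, de-duplicated list."""
    return term, sorted(set(postings))

def build_index(papers):
    """Run map, shuffle (group by term), and reduce in a single process."""
    grouped = defaultdict(list)
    for doc_id, sections in papers.items():
        for term, posting in map_phase(doc_id, sections):
            grouped[term].append(posting)
    return dict(reduce_phase(t, p) for t, p in grouped.items())

# Hypothetical, already-parsed papers (PDF parsing is out of scope here).
papers = {
    "p1": {"title": "Distributed PDF indexing", "abstract": "indexing with MapReduce"},
    "p2": {"title": "Spatial indexing", "abstract": "octree for BIM data"},
}
index = build_index(papers)
```

Keeping the section name in each posting lets a query be restricted to, say, titles only. On a real cluster the map and reduce functions would run on separate nodes, with the framework performing the grouping.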

Implementation of Tile Searching and Indexing Management Algorithms for Mobile GIS Performance Enhancement

  • Lee, Kang-Won;Choi, Jin-Young
    • Journal of Internet of Things and Convergence / v.1 no.1 / pp.11-19 / 2015
  • The mobile and ubiquitous environment is experiencing a rapid development of information and communications technology as it provides an ever increasing flow of information. Particularly, GIS is now widely applied in daily life due to its high accuracy and functionality. GIS information is utilized through the tiling method, which divides and manages large-scale map information. The tiling method manages map information and additional information to allow overlay, so as to facilitate quick access to tiled data. Unlike past studies, this paper proposes a new architecture and algorithms for tile searching and indexing management to optimize map information and additional information for GIS mobile applications. Since this involves the processing of large-scale information and continuous information changes, information is clustered for rapid processing. In addition, data size is minimized to overcome the constrained performance associated with mobile devices. Our system has been implemented in actual services, leading to a twofold increase in performance in terms of processing speed and mobile bandwidth.
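
The abstract does not say which tiling scheme the authors use. The sketch below assumes the widely used Web Mercator XYZ scheme purely for illustration and shows how overlay records could be grouped by tile for quick lookup. `tile_key` and `TileIndex` are hypothetical names, not part of the paper.

```python
import math
from collections import defaultdict

def tile_key(lat, lon, zoom):
    """Return the (zoom, x, y) tile that contains a WGS84 coordinate."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)

class TileIndex:
    """Groups additional (overlay) records by the tile they fall in."""

    def __init__(self, zoom):
        self.zoom = zoom
        self.tiles = defaultdict(list)

    def add(self, lat, lon, payload):
        self.tiles[tile_key(lat, lon, self.zoom)].append(payload)

    def lookup(self, lat, lon):
        """Return every record stored in the tile containing (lat, lon)."""
        return self.tiles.get(tile_key(lat, lon, self.zoom), [])

tile_index = TileIndex(zoom=1)
tile_index.add(10.0, 10.0, "cafe")
tile_index.add(-10.0, 10.0, "port")
```

A lookup touches only one dictionary bucket, which is the property that makes tiled access fast regardless of how many records the whole map holds.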

A Distributed High Dimensional Indexing Structure for Content-based Retrieval of Large Scale Data (대용량 데이터의 내용 기반 검색을 위한 분산 고차원 색인 구조)

  • Cho, Hyun-Hwa;Lee, Mi-Young;Kim, Young-Chang;Chang, Jae-Woo;Lee, Kyu-Chul
    • Journal of KIISE:Databases / v.37 no.5 / pp.228-237 / 2010
  • Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, large-scale data imposes additional requirements: higher search performance and index scalability. To meet these requirements, we propose a distributed high-dimensional indexing structure based on cluster systems, called the Distributed Vector Approximation-tree (DVA-tree), a two-level structure consisting of a hybrid spill-tree and VA-files. We also describe the algorithms used for constructing the DVA-tree over multiple machines and for performing distributed k-nearest neighbor (k-NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that our proposed method provides significant performance advantages over existing index structures on different kinds of datasets.
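
The distributed k-NN step can be sketched as follows. This is not the DVA-tree: each partition is scanned by brute force where the paper uses a hybrid spill-tree and VA-files, and the function names and sample points are assumptions. It illustrates only the merge logic, which relies on the fact that the global k nearest neighbors are always contained in the union of the per-partition k nearest neighbors.

```python
import heapq
import math

def local_knn(partition, query, k):
    """Brute-force k-NN inside one partition; returns (distance, point) pairs."""
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in partition))

def distributed_knn(partitions, query, k):
    """Merge per-partition candidates into the global k nearest neighbors."""
    candidates = []
    for partition in partitions:  # on a cluster, each call would run on its own node
        candidates.extend(local_knn(partition, query, k))
    return [p for _, p in heapq.nsmallest(k, candidates)]

# Hypothetical two-machine layout with two points each.
partitions = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 1.0), (9.0, 9.0)],
]
nearest = distributed_knn(partitions, (0.9, 0.9), k=2)
```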

Direct Stem Blot Immunoassay (DSBIA): A Rapid, Reliable and Economical Detection Technique Suitable for Testing Large Number of Barley Materials for Field Monitoring and Resistance Screening to Barley mild mosaic virus and Barley yellow mosaic virus

  • Jonson, Gilda;Park, Jong-Chul;Kim, Yang-Kil;Kim, Mi-Jung;Lee, Mi-Ja;Hyun, Jong-Nae;Kim, Jung-Gon
    • The Plant Pathology Journal / v.23 no.4 / pp.260-265 / 2007
  • Testing a large number of samples from field monitoring and routine indexing is cumbersome, and the available virus detection tools are labor-intensive and expensive. To circumvent these problems, we established a tissue blot immunoassay (TBIA) method as an alternative tool for detecting Barley mild mosaic virus (BaMMV) and Barley yellow mosaic virus (BaYMV) infection in field plants and greenhouse-inoculated plants, for monitoring and routine indexing applications, respectively. Initially, leaf and stem were tested to determine which plant tissue is suitable for direct blotting on nitrocellulose membrane. The antibody dilutions were optimized for efficiency and economy. Results showed that stem tissue was more suitable for direct blotting because it produced no background that interferes with the reaction. Therefore, this technique is referred to as direct stem blot immunoassay (DSBIA) in this study. Re-using diluted (1:1000) antiserum and conjugate up to 3 times, with the addition of a half-strength amount of concentrated antibodies, was more effective in detecting the virus. The virus blotted on the nitrocellulose membrane from stem tissues kept at room temperature for 3 days was still detectable. The efficiencies of DSBIA and RT-PCR in detecting BaMMV and BaYMV were comparable. Results further showed that DSBIA is a rapid, reliable and economical detection method suitable for monitoring BaMMV and BaYMV infection in the field, and a practical method for indexing barley materials on a large scale for virus resistance screening.

Development of the Spatial Indexing Method for the Effective Visualization of BIM data based on GIS (GIS 기반 BIM 데이터의 효과적 가시화를 위한 공간인덱싱 기법 개발)

  • Kim, Ji-Eun;Kang, Tae-Wook;Hong, Chang-Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.15 no.8 / pp.5333-5341 / 2014
  • Recently, with the increasing interest in facility management based on indoor spatial information, various studies have attempted conversion between BIM and GIS for facility management. Visualizing large-scale geometry data is one of the major issues for a maintenance system. Therefore, this study designed a spatial indexing algorithm through an IFC schema-based scenario for the effective visualization of BIM data based on GIS. Part of the algorithm was implemented using the OcTree structure, and the implementation was tested with IFC sample data. Ultimately, we propose a spatial indexing method for the effective visualization of BIM data based on GIS.
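
For readers unfamiliar with the octree structure mentioned in this abstract, here is a minimal point-octree sketch with insertion and a box query that prunes nodes outside the requested region. It is a generic textbook variant, not the authors' algorithm: the class name, the `capacity` parameter, and the sample points are assumptions, and it expects every point to lie inside the root cube.

```python
class Octree:
    """Point octree over an axis-aligned cube given by its center and half-size."""

    def __init__(self, center, half, capacity=4):
        self.center, self.half, self.capacity = center, half, capacity
        self.points = []      # points stored while this node is a leaf
        self.children = None  # eight child nodes after a split

    def _octant(self, p):
        """Index 0..7 built from one bit per axis (1 if p is on the upper side)."""
        return sum((p[i] >= self.center[i]) << i for i in range(3))

    def insert(self, p):
        if self.children is None:
            self.points.append(p)
            # Stop splitting tiny cells so duplicate points cannot recurse forever.
            if len(self.points) > self.capacity and self.half > 1e-6:
                self._split()
        else:
            self.children[self._octant(p)].insert(p)

    def _split(self):
        h = self.half / 2
        self.children = [
            Octree(tuple(self.center[i] + (h if (o >> i) & 1 else -h) for i in range(3)),
                   h, self.capacity)
            for o in range(8)
        ]
        pending, self.points = self.points, []
        for p in pending:
            self.children[self._octant(p)].insert(p)

    def query(self, lo, hi):
        """Return all points inside the box [lo, hi], skipping disjoint nodes."""
        if any(self.center[i] + self.half < lo[i] or self.center[i] - self.half > hi[i]
               for i in range(3)):
            return []
        if self.children is None:
            return [p for p in self.points if all(lo[i] <= p[i] <= hi[i] for i in range(3))]
        return [p for c in self.children for p in c.query(lo, hi)]

tree = Octree((0, 0, 0), 8.0, capacity=2)
for point in [(1, 1, 1), (2, 2, 2), (-3, -3, -3), (5, 5, 5), (-6, 1, 2)]:
    tree.insert(point)
visible = sorted(tree.query((0, 0, 0), (4, 4, 4)))
```

In a visualization setting the query box would correspond to the region currently in view, so only geometry in intersecting cells needs to be loaded.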

VotingRank: A Case Study of e-Commerce Recommender Application Using MapReduce

  • Ren, Jian-Ji;Lee, Jae-Kee
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.834-837 / 2009
  • There is a growing need for ad-hoc analysis of extremely large data sets, especially at e-Commerce companies that depend on recommender applications. Nowadays, as the number of e-Commerce web pages grows to tremendous proportions, vertical recommender services can help customers find what they need. Recommender applications are one of the reasons for the success of e-Commerce in today's world. Compared with a general e-Commerce recommender application, the processing scope of a vertical recommender application is greatly narrowed down. MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. The objective of this paper is to explore the MapReduce framework for the e-Commerce recommender application, applying it to major general and dedicated link analysis; as a result, the response time decreased and the recommender application's accuracy improved.

A Semantic Service Discovery Network for Large-Scale Ubiquitous Computing Environments

  • Kang, Sae-Hoon;Kim, Dae-Woong;Lee, Young-Hee;Hyun, Soon-J.;Lee, Dong-Man;Lee, Ben
    • ETRI Journal / v.29 no.5 / pp.545-558 / 2007
  • This paper presents an efficient semantic service discovery scheme called UbiSearch for a large-scale ubiquitous computing environment. A semantic service discovery network in the semantic vector space is proposed where services that are semantically close to each other are mapped to nearby positions so that the similar services are registered in a cluster of resolvers. Using this mapping technique, the search space for a query is efficiently confined within a minimized cluster region while maintaining high accuracy in comparison to the centralized scheme. The proposed semantic service discovery network provides a number of novel features to evenly distribute service indexes to the resolvers and reduce the number of resolvers to visit. Our simulation study shows that UbiSearch provides good semantic searchability as compared to the centralized indexing system. At the same time, it supports scalable semantic queries with low communication overhead, balanced load distribution among resolvers for service registration and query processing, and personalized semantic matching.

NVST DATA ARCHIVING SYSTEM BASED ON FASTBIT NOSQL DATABASE

  • Liu, Ying-Bo;Wang, Feng;Ji, Kai-Fan;Deng, Hui;Dai, Wei;Liang, Bo
    • Journal of The Korean Astronomical Society / v.47 no.3 / pp.115-122 / 2014
  • The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high-resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces up to 120 thousand observational files in a day. Given the large number of files, effective archiving and retrieval has become a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the Fastbit Not Only Structured Query Language (NoSQL) database. Compared to the relational database (i.e., MySQL; My Structured Query Language), the Fastbit database shows distinct advantages in indexing and querying performance. In a large-scale database of 40 million records, the multi-field combined query response time of the Fastbit database is about 15 times faster and fully meets the requirements of the NVST. Our study brings a new idea for massive astronomical data archiving and would contribute to the design of data management systems for other astronomical telescopes.
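
FastBit is built around bitmap indexes, and the sketch below illustrates why a multi-field combined query can be cheap with that kind of index: each field=value condition maps to one bitmap, and the combined query is a bitwise AND. This is a toy, uncompressed version that uses Python integers as bitmaps and does not call the FastBit API. The class name, the field names (`date`, `band`), and the records are hypothetical.

```python
from collections import defaultdict

class BitmapIndex:
    """One uncompressed bitmap (a Python int) per distinct value of each field."""

    def __init__(self, fields):
        self.bitmaps = {f: defaultdict(int) for f in fields}
        self.rows = []

    def add(self, record):
        row_id = len(self.rows)
        self.rows.append(record)
        for field, bitmaps in self.bitmaps.items():
            bitmaps[record[field]] |= 1 << row_id  # set this row's bit

    def query(self, **conditions):
        """AND the bitmaps of all field=value conditions and return matching rows."""
        mask = (1 << len(self.rows)) - 1  # start with every row selected
        for field, value in conditions.items():
            mask &= self.bitmaps[field].get(value, 0)
        return [self.rows[i] for i in range(len(self.rows)) if (mask >> i) & 1]

# Hypothetical header fields extracted from FITS files.
archive = BitmapIndex(["date", "band"])
archive.add({"file": "a.fits", "date": "2014-01-01", "band": "TiO"})
archive.add({"file": "b.fits", "date": "2014-01-01", "band": "Ha"})
archive.add({"file": "c.fits", "date": "2014-01-02", "band": "TiO"})
hits = archive.query(date="2014-01-01", band="TiO")
```

Production bitmap indexes additionally compress the bitmaps so that they remain small for fields with many distinct values.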

Design and Implementation of Trajectory Preservation Indices for Location Based Query Processing (위치 기반 질의 처리를 위한 궤적 보존 색인의 설계 및 구현)

  • Lim, Duk-Sung;Hong, Bong-Hee
    • Journal of Korea Spatial Information System Society / v.10 no.3 / pp.67-78 / 2008
  • With the rapid development of wireless communication and mobile equipment, many applications for location-based services have been emerging. Moving objects such as vehicles and ships change their positions over time. Because they move continuously, moving objects have a moving path, called the trajectory. To monitor the trajectories of moving objects in a large-scale database system, an efficient indexing scheme for processing trajectory-related queries is required. In this paper, we focus on minimizing the dead space of index structures. The Minimum Bounding Boxes (MBBs) of non-leaf nodes in trajectory-preserving indexing schemes have large amounts of dead space, since trajectory preservation is achieved at the expense of the spatial locality of trajectories. We propose entry relocating techniques to reduce dead space and overlaps in non-leaf nodes. We present performance studies that compare the proposed index schemes with the TB-tree and the R*-tree under a varying set of spatio-temporal queries.

A Study of Developing Variable-Scale Maps for Management of Efficient Road Network (효율적인 네트워크 데이터 관리를 위한 가변-축척 지도 제작 방안)

  • Joo, Yong Jin
    • Journal of Korean Society for Geospatial Information Science / v.21 no.4 / pp.143-150 / 2013
  • The purpose of this study is to suggest a methodology for developing a variable-scale network model, which can derive a large-scale road network at a detailed level that corresponds to small-scale linear objects with various degrees of abstraction at higher levels. First, the definition of terms, the benefits, and the specific procedures related to a variable-scale model were examined. Second, the representation levels and the layer components needed to design the variable-scale map were presented. In addition, a rule-based data generation method and an indexing structure for higher LoD were defined. Finally, the model was implemented and verified on the road network of the study area (Jeju-do) to show that the proposed algorithm is practical. That is, the generated variable-scale road networks were stored and managed in a spatial database (Oracle Spatial), and performance analysis was carried out to assess the effectiveness and feasibility of the model.