• Title/Summary/Keyword: Range Query Processing

Acceleration of Range Query in R-tree Using GPU Parallel Processing (GPU를 이용한 R-tree의 질의처리 병렬화)

  • Kim, Min-Cheol;Choi, Won-Ik
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.37-40 / 2011
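  • Hierarchical index structures are the most efficient index structures for processing range queries over large volumes of multidimensional data. To speed up range queries in hierarchical index structures, two kinds of techniques have been proposed: techniques that reduce the overlap between adjacent nodes that arises while the index is built, and bulk-loading techniques that read a large amount of data at once and build the index bottom-up to increase its space utilization. However, for hierarchical index structures that process individual nodes sequentially on the CPU, increasing space utilization and reducing the overlap between nodes can improve query performance only to a limited extent. In this paper, we therefore modify the storage structure of the R-tree, a representative CPU-based hierarchical index structure, to suit GPU memory. We also improve performance by replacing the sequential node search of CPU-based hierarchical index structures with parallel node inspection on the GPU. The degree of improvement depends on the size of the query region, but this approach improves performance by up to more than 100 times.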
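
The idea of storing an R-tree in flat arrays and testing many nodes in one data-parallel step can be sketched as follows. This is a CPU-side Python simulation under stated assumptions, not the authors' GPU layout: the `FlatRTree` class, its field names, and the demo tree are hypothetical, and a list comprehension stands in for the step that a GPU would run with one thread per node.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class FlatRTree:
    """R-tree kept in flat arrays instead of pointer-linked nodes (hypothetical layout)."""
    rects: List[Rect]        # bounding rectangle of every node
    first_child: List[int]   # index of the first child, or -1 for a leaf entry
    child_count: List[int]   # number of consecutive children


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(tree: FlatRTree, query: Rect, root: int = 0) -> List[int]:
    """Return the indices of leaf entries whose rectangles overlap the query."""
    frontier = [root]
    hits: List[int] = []
    while frontier:
        # Data-parallel step: every node of the current frontier is tested
        # independently. On a GPU each test could be one thread.
        flags = [overlaps(tree.rects[n], query) for n in frontier]
        next_frontier: List[int] = []
        for node, hit in zip(frontier, flags):
            if not hit:
                continue
            if tree.first_child[node] == -1:
                hits.append(node)
            else:
                start = tree.first_child[node]
                next_frontier.extend(range(start, start + tree.child_count[node]))
        frontier = next_frontier
    return hits


# Made-up demo tree: a root, two inner nodes, four leaf entries.
demo_tree = FlatRTree(
    rects=[(0, 0, 10, 10), (0, 0, 5, 5), (5, 5, 10, 10),
           (1, 1, 2, 2), (3, 3, 4, 4), (6, 6, 7, 7), (8, 8, 9, 9)],
    first_child=[1, 3, 5, -1, -1, -1, -1],
    child_count=[2, 2, 2, 0, 0, 0, 0],
)
demo_hits = range_query(demo_tree, (3.5, 3.5, 6.5, 6.5))
```

Keeping the children of a node contiguous in the arrays is one plausible way to make the per-level test a simple indexed loop, which is the kind of access pattern that maps naturally onto GPU threads.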

A Benchmark Test of Spatial Big Data Processing Tools and a MapReduce Application

  • Nguyen, Minh Hieu;Ju, Sungha;Ma, Jong Won;Heo, Joon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.5 / pp.405-414 / 2017
  • Spatial data processing is challenging because of the unique characteristics of spatial data, and it becomes more complex at big-data scale. Several tools have been developed for this purpose; however, they are not familiar to typical users. This paper presents a benchmark test of two notable spatial big data processing tools: GIS Tools for Hadoop and SpatialHadoop. In addition, a MapReduce application is introduced as a baseline to evaluate the effectiveness of the two tools and to measure the impact of the number of maps and reduces on performance. Using these tools and New York taxi trajectory data, we perform a spatial processing task that filters the drop-off locations within the Manhattan area. The performance of the tools is then observed as the data size increases and the number of worker nodes changes. The results of this study are as follows: 1) GIS Tools for Hadoop automatically creates a Quadtree index in each spatial processing task, so its performance improves significantly. However, users should be familiar with Java to use this tool conveniently. 2) SpatialHadoop does not automatically create a spatial index for the data. As a result, its performance is much lower than that of GIS Tools for Hadoop on the same spatial processing task. However, SpatialHadoop achieved the best result for range queries. 3) The performance of our MapReduce application increased fourfold after the number of reduces was changed from 1 to 12.
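
The drop-off filtering task can be illustrated with a small in-process map/shuffle/reduce pipeline. This is a minimal sketch and not the paper's application: the record format (`trip_id,lon,lat`), the function names, and the rough bounding box `BBOX` are assumptions, and a real job would test points against the Manhattan polygon rather than a rectangle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rough box (min_lon, min_lat, max_lon, max_lat), for illustration only.
BBOX = (-74.03, 40.70, -73.90, 40.88)


def map_dropoff(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: classify one drop-off record as inside or outside the box."""
    _trip_id, lon_text, lat_text = record.split(",")
    lon, lat = float(lon_text), float(lat_text)
    inside = BBOX[0] <= lon <= BBOX[2] and BBOX[1] <= lat <= BBOX[3]
    yield ("inside" if inside else "outside", 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group mapped values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in a single process."""
    mapped = (pair for record in records for pair in map_dropoff(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster the reduce step would be spread over several reduce tasks, which is the parameter the abstract reports varying from 1 to 12.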

Development of Meteorologic Data Retrieval Program for Vulnerability Assessment to Natural Hazards (재해 취약성 평가를 위한 기상자료 처리 프로그램 MetSystem 개발)

  • Jang, Min-Won;Kim, Sang-Min
    • Journal of Korean Society of Rural Planning / v.19 no.4 / pp.47-54 / 2013
  • Climate change is among the most direct threats to sustaining agricultural productivity. It is necessary to reduce the damage from natural hazards such as floods, droughts, typhoons, and snowstorms caused by climate change. Vulnerability assessment for climate change adaptation makes it possible to analyze the priority, feasibility, and effect of damage-reduction policies. Such an assessment requires a large amount of weather data for each meteorological station. A database management system for meteorological data can resolve the difficulties of handling and processing these data. In this study, we developed a meteorological data retrieval system (MetSystem) for climate change vulnerability assessment. The user interface of MetSystem is implemented in the web browser so that the database server can be accessed at any time and place. It supports queries by meteorological station, temporal range, meteorological item, statistic, and range of values, and it can export results to Excel format (*.xls). The system is expected to make it easier to try different analyses of vulnerability to natural hazards through simple access to the meteorological database and its extensive search functions.
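
A query that combines station, temporal range, item, statistics, and value range could look like the sketch below. The abstract does not describe MetSystem's schema or database engine, so the `observation` table, its columns, the sample rows, and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3

# Made-up sample rows: (station, obs_date, item, value).
SAMPLE_ROWS = [
    ("A", "2013-07-01", "rain_mm", 0.0),
    ("A", "2013-07-02", "rain_mm", 35.5),
    ("A", "2013-07-03", "rain_mm", 120.0),
    ("A", "2013-07-04", "rain_mm", 80.5),
    ("B", "2013-07-02", "rain_mm", 10.0),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical observation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation (station TEXT, obs_date TEXT, item TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def query_stats(conn, station, item, start, end, vmin, vmax):
    """Return (count, average, maximum) of values matching every criterion."""
    return conn.execute(
        "SELECT COUNT(*), AVG(value), MAX(value) FROM observation "
        "WHERE station = ? AND item = ? "
        "AND obs_date BETWEEN ? AND ? AND value BETWEEN ? AND ?",
        (station, item, start, end, vmin, vmax),
    ).fetchone()
```

ISO-formatted date strings sort in chronological order, so `BETWEEN` on the text column behaves as a temporal range filter in this sketch.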

A Comparative Analysis of Music Similarity Measures in Music Information Retrieval Systems

  • Gurjar, Kuldeep;Moon, Yang-Sae
    • Journal of Information Processing Systems / v.14 no.1 / pp.32-55 / 2018
  • The digitization of music has greatly expanded its audience, from a few localized listeners to a wide range of global listeners. At the same time, digitization brings the challenge of smoothly retrieving music from large databases. To deal with this challenge, many systems that support the smooth retrieval of musical data have been developed. At the computational level, a query music piece is compared with the rest of the music pieces in the database. These systems, called music information retrieval (MIR) systems, serve various applications such as general music retrieval, plagiarism detection, music recommendation, and musicology. This paper mainly addresses two parts of the MIR research area. First, it presents a general overview of MIR, covering its history, functionality, application areas, and components. Second, it investigates music similarity measurement methods and provides a comparative analysis of state-of-the-art methods. The paper focuses on a comparative analysis of the accuracy and efficiency of a few key MIR systems. These analyses help in understanding the current and future challenges associated with the field of MIR systems and music similarity measures.

Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning (그리드 분할에 의한 다차원 데이터 디클러스터링 성능 분석)

  • Kim, Hak-Cheol;Kim, Tae-Wan;Li, Ki-Joune
    • The KIPS Transactions:PartD / v.11D no.5 / pp.1011-1020 / 2004
  • Much work has been done to improve the I/O performance of systems that store and manage massive amounts of data by distributing the data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell, which is determined by the interval number of each dimension, to a disk number, on the assumption that each dimension is split into disjoint intervals so that the entire data space is GRID-like partitioned. However, this work has ignored the effect of the GRID partitioning scheme on declustering performance. In this paper, we enhance the performance of mapping-function-based declustering algorithms by applying a good GRID partitioning method. To this end, we propose an estimation model that counts the number of grid cells intersected by a range query, and we apply the GRID partitioning scheme that minimizes query result size among the possible schemes. While binary partitioning is common for high-dimensional data, we choose fewer dimensions than binary partitioning needs and split several times along those dimensions, so that we reduce the number of grid cells touched by a query. Experimental results show that the proposed estimation model is accurate to within a 0.5% error ratio regardless of query size and dimension. By applying an efficient GRID partitioning scheme, we also improve, by up to 23 times, the performance of the mapping-function-based declustering algorithm called Kronecker Sequence, which is known to be the best among the mapping functions for high-dimensional data.
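
The effect of the partitioning choice on the number of grid cells a range query touches can be checked with a direct count. This sketch is not the paper's estimation model: it counts cells exactly for one given query, assumes a unit data space with equal-width intervals, and uses made-up split counts and query bounds.

```python
import math
from typing import Sequence


def cells_touched(splits: Sequence[int], lo: Sequence[float], hi: Sequence[float]) -> int:
    """Count grid cells intersected by the query box [lo, hi] in the unit cube.

    splits[d] is the number of equal-width intervals along dimension d.
    """
    total = 1
    for k, a, b in zip(splits, lo, hi):
        # Index of the interval containing each bound, clamped to the last interval.
        first = min(math.floor(a * k), k - 1)
        last = min(math.floor(b * k), k - 1)
        total *= last - first + 1
    return total


# Two ways to cut an 8-dimensional space into 256 cells (illustrative values).
binary_partition = [2] * 8                       # split every dimension once
few_dims_partition = [16, 16, 1, 1, 1, 1, 1, 1]  # split two dimensions 16 times

query_lo = [0.4] * 8
query_hi = [0.7] * 8

touched_binary = cells_touched(binary_partition, query_lo, query_hi)
touched_few = cells_touched(few_dims_partition, query_lo, query_hi)
```

For this particular query the binary partition touches all 256 cells, while the two-dimension partition touches 36, which is consistent with the abstract's argument for splitting fewer dimensions more often.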

Spatial Selectivity Estimation using Cumulative Wavelet Histograms (누적밀도 웨이블릿 히스토그램을 이용한 공간 선택율 추정)

  • Chi, Jeong-Hee;Jeong, Jae-Hyuk;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.32 no.5 / pp.547-557 / 2005
  • The purpose of selectivity estimation is to maintain summary data in a very small memory space and to minimize the error between the estimated value and the actual query result. When estimating selectivity for large spatial data, existing approaches need summary information that reflects the spatial data distribution well in order to obtain accurate results, and obtaining such summary information requires a large amount of memory. In this paper, we therefore propose a new technique, the cumulative density wavelet histogram (CDW Histogram), which achieves highly accurate selectivity estimates in a small memory space. The proposed method uses the sub-histograms created by the CD histogram. Each sub-histogram is used to generate wavelet summary information by applying the wavelet transform, which gives good selectivity estimates even when the memory size is very small. The experimental results show that the proposed method combines the strengths of both techniques: it obtains good selectivity estimates in 25% to 50% of the memory space used by the previous histogram, and it is superior to existing selectivity estimation techniques. The proposed technique can be used to accurately quantify the selectivity of spatial range queries in databases with very restrictive memory.
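
The general idea of compressing a histogram with a wavelet transform and answering range queries from the retained coefficients can be sketched in one dimension. This is a simplification under stated assumptions, not the CDW Histogram itself: it uses a plain (non-cumulative) histogram, an unnormalized Haar transform, a simple largest-magnitude retention rule, and made-up bucket counts.

```python
from typing import List


def haar_forward(values: List[float]) -> List[float]:
    """Unnormalized Haar transform; len(values) must be a power of two."""
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = averages + details
        n = half
    return out


def haar_inverse(coeffs: List[float]) -> List[float]:
    """Rebuild the bucket counts from Haar coefficients."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        rebuilt = []
        for avg, det in zip(out[:n], out[n:2 * n]):
            rebuilt.extend([avg + det, avg - det])
        out[:2 * n] = rebuilt
        n *= 2
    return out


def keep_largest(coeffs: List[float], budget: int) -> List[float]:
    """Zero all but the `budget` largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    keep = set(order[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def estimate_count(coeffs: List[float], lo: int, hi: int) -> float:
    """Estimated number of objects in buckets lo..hi-1."""
    return sum(haar_inverse(coeffs)[lo:hi])


histogram = [4, 2, 6, 0, 1, 1, 3, 7]   # made-up bucket counts
full = haar_forward(histogram)
compact = keep_largest(full, 4)        # summary that stores half the coefficients
```

With all coefficients kept, `estimate_count` reproduces the exact bucket sums; shrinking `budget` trades accuracy for memory, which is the trade-off the abstract addresses.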

An Efficient Data Centric Storage Scheme with Non-uniformed Density of Wireless Sensor Networks (센서의 불균일한 배포밀도를 고려한 효율적인 데이터 중심 저장기법)

  • Seong, dong-ook;Lee, seok-jae;Song, seok-il;Yoo, jae-soo
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.135-139 / 2007
  • Recently, Data Centric Storage (DCS) schemes have been studied for a variety of applications (e.g., natural environment investigation, military application systems, and environmental change monitoring). In a DCS scheme, data is stored by name at nodes within the network. The existing schemes have several drawbacks. The first is inefficient range query processing, because they do not consider the locality of storage points. The second is uneven storage load across sensors when the sensor distribution density is non-uniform. In this paper, we propose a novel data centric storage scheme that takes the sensor distribution density into account and preserves the locality of data storage locations. The scheme divides the whole sensor network area using a grid and distributes a density bit map, which consists of the sensor density information of each cell. Sensors use the density bit map for storing and searching data. We compare our scheme with existing schemes. The results show improved load balancing and more efficient range query processing than existing schemes in environments where sensors are distributed non-uniformly.

Efficient Processing method of OLAP Range-Sum Queries in a dynamic warehouse environment (다이나믹 데이터 웨어하우스 환경에서 OLAP 영역-합 질의의 효율적인 처리 방법)

  • Chun, Seok-Ju;Lee, Ju-Hong
    • The KIPS Transactions:PartD / v.10D no.3 / pp.427-438 / 2003
  • In a data warehouse, users typically search for trends, patterns, or unusual data behaviors by issuing queries interactively. The OLAP range-sum query is widely used for finding trends and discovering relationships among attributes in the data warehouse. In today's enterprise environments, data elements in a data cube change frequently. The problem is that the cost of updating a prefix sum cube is very high. In this paper, we propose a novel algorithm that significantly reduces the update cost by using an index structure called the Δ-tree. We also propose a hybrid method that provides either approximate or precise results to reduce the overall cost of queries. This is highly beneficial for applications, such as decision support systems, that need quick approximate answers rather than time-consuming accurate ones. Extensive experiments show that our method performs very efficiently across diverse dimensionalities compared to other methods.
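
The update-cost problem of a prefix sum cube can be seen in a two-dimensional sketch. This shows only the standard prefix-sum technique that the abstract identifies as expensive to update; it does not implement the Δ-tree, and the function names and the 3x3 demo cube are made up for illustration.

```python
from typing import List


def build_prefix(cube: List[List[int]]) -> List[List[int]]:
    """Build a padded prefix-sum array: prefix[r+1][c+1] = sum of cube[0..r][0..c]."""
    rows, cols = len(cube), len(cube[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            prefix[r + 1][c + 1] = (
                cube[r][c] + prefix[r][c + 1] + prefix[r + 1][c] - prefix[r][c]
            )
    return prefix


def range_sum(prefix: List[List[int]], r1: int, c1: int, r2: int, c2: int) -> int:
    """Sum of cube[r1..r2][c1..c2] (inclusive) from four prefix lookups."""
    return (
        prefix[r2 + 1][c2 + 1] - prefix[r1][c2 + 1]
        - prefix[r2 + 1][c1] + prefix[r1][c1]
    )


def update_cell(prefix: List[List[int]], r: int, c: int, delta: int) -> int:
    """Add delta to cube[r][c]; return how many prefix entries had to change."""
    touched = 0
    for i in range(r + 1, len(prefix)):
        for j in range(c + 1, len(prefix[0])):
            prefix[i][j] += delta
            touched += 1
    return touched


demo_cube = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

A range-sum query costs four lookups regardless of the range size, but changing the cell at the origin forces every prefix entry to be rewritten, which is the cost that motivates an update-friendly index.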

Policies of Trajectory Clustering in Index based on R-trees for Moving Objects (이동체를 위한 R-트리 기반 색인에서의 궤적 클러스터링 정책)

  • Ban ChaeHoon;Kim JinGon;Jun BongGi;Hong BongHee
    • The KIPS Transactions:PartD / v.12D no.4 s.100 / pp.507-520 / 2005
  • R-trees are commonly used to index trajectories in moving-object databases. However, because they consider only spatial proximity, they must access many nodes to trace a single trajectory. Overlaps and dead space should be minimized to enhance the performance of range queries in moving-object indexes, and the trajectories of moving objects should be preserved to enhance the performance of trajectory queries. In this paper, we propose the TP3DR-tree (Trajectory Preserved 3DR-tree), which uses clusters of trajectories for range and trajectory queries. The TP3DR-tree uses two split policies: a spatial split that splits the same trajectory by clustering, and a time split that increases space utilization. In addition, we use connecting information in non-leaf nodes to enhance the performance of combined queries. Our experiments show that the new index outperforms the others in processing queries on various datasets.

Data Dissemination Method for Efficient Contents Search in Mobile P2P Networks (모바일 P2P 네트워크에서 효율적인 콘텐츠 검색을 위한 데이터 배포 기법)

  • Bok, Kyoung-Soo;Cho, Mi-Rim;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.12 no.8 / pp.37-46 / 2012
  • Existing data dissemination methods for mobile P2P networks search very efficiently for content that matches the peer profile. However, searching for content that does not match the profile incurs additional query processing costs, so this case needs further consideration. To solve this problem, we propose a new data dissemination method for efficient content search in mobile P2P networks. In the proposed method, peers use a timestamp message to determine whether they have communicated before and then perform data dissemination. We also propose a ranking algorithm to store disseminated data efficiently in limited memory. The proposed ranking method can reduce query messages by considering profile matches, the distribution range, and the connectivity to the data distribution peer.