• Title/Summary/Keyword: query execution

Performance Evaluation of Hash Join Algorithm on Flash Memory SSDs (플래쉬 메모리 SSD 기반 해쉬 조인 알고리즘의 성능 평가)

  • Park, Jang-Woo;Park, Sang-Shin;Lee, Sang-Won;Park, Chan-Ik
    • Journal of KIISE:Computing Practices and Letters / v.16 no.11 / pp.1031-1040 / 2010
  • Hash join is one of the core algorithms in database management systems. If a hash join cannot complete in one pass because the available memory is insufficient (i.e., hash table overflow), however, it may incur a few sequential writes and excessive random reads. With a hard disk as the temporary storage for hash joins, the I/O time would be dominated by slow random reads in the probing phase. Meanwhile, flash memory based SSDs (flash SSDs) are becoming popular, and we will witness in the foreseeable future that flash SSDs replace hard disks in enterprise databases. In contrast to a hard disk, a flash SSD has no mechanical components and therefore offers low latency for random reads, so it can boost hash join performance. In this paper, we investigate several important and practical issues that arise when a flash SSD is used as temporary storage for hash join. First, we reveal the I/O patterns of hash join in detail and explain why a flash SSD can outperform a hard disk by more than an order of magnitude. Second, we present and analyze the impact of cluster size (i.e., the I/O unit in hash join) on performance. Finally, we empirically demonstrate that, while a commercial query optimizer is error-prone in predicting the execution time with a hard disk as temporary storage, it can precisely estimate the execution time with a flash SSD. In summary, we show that, when used as temporary storage for hash join, a flash SSD provides more reliable cost estimation as well as fast performance.
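
To make the overflow case concrete, here is a minimal sketch of a partitioned (Grace-style) hash join. The abstract does not say which hash join variant the paper measured, so the variant, the function names, and the use of in-memory lists in place of temporary files are all assumptions made for illustration. In a real system each partition would be written to temporary storage and read back during the probe phase, which is where the read pattern described above comes from.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the join key.

    In a real DBMS each bucket would be spilled to temporary storage;
    here plain lists stand in for those temporary files (assumption).
    """
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def grace_hash_join(build_rows, probe_rows, key, n_parts=4):
    """Equi-join two lists of dicts on `key`, one partition pair at a time."""
    build_parts = partition(build_rows, key, n_parts)
    probe_parts = partition(probe_rows, key, n_parts)
    result = []
    for b_part, p_part in zip(build_parts, probe_parts):
        # Build phase: hash table over one partition of the build input.
        table = defaultdict(list)
        for row in b_part:
            table[row[key]].append(row)
        # Probe phase: look up each row of the matching probe partition.
        for row in p_part:
            for match in table.get(row[key], []):
                result.append({**match, **row})
    return result
```

Because both inputs are partitioned with the same hash function, matching rows always land in the same partition pair, so each pair can be joined independently with a hash table that fits in memory.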

A Model-based Performance Study of the EPCglobal Network (모델 기반 EPCglobal 네트워크의 성능 분석)

  • Kang, Yong-Shin;Son, Kyung-Won;Lee, Yong-Han;Rhee, Jong-Tae
    • IE interfaces / v.24 no.2 / pp.139-150 / 2011
  • The EPCglobal Network is a computer network used to share product data among trading partners. It provides the supply chain with improved visibility and traceability by using the Electronic Product Code (EPC), which is stored on an RFID tag. Although this network model is widely accepted as a global standard and the growth of the EPCglobal subscriber base is considerable, the EPC technology adoption process is still in its infancy. This is because some critical issues with this model remain to be verified, such as scalability, data management, security, privacy, and the economic value of data sharing. In this paper, we focus on the scalability issue among these challenges, and we regard the performance of the EPCglobal Network solely as the track-and-trace query-processing cost in the network. We developed performance models consisting of three elements of the EPCglobal Network: Discovery Services (DS), EPC Information Services (EPCIS), and Object Naming Services (ONS). We then abstracted out the track-and-trace query execution model to evaluate the performance of the overall EPCglobal Network. Finally, using the proposed models, we carried out a simulation analysis based on an RFID-based inbound logistics process for automobile parts. This work is an important step towards EPC technology diffusion and provides guidelines for businesses looking to buy or build EPCglobal Network-based systems.

Use of Graph Database for the Integration of Heterogeneous Biological Data

  • Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
    • Genomics & Informatics / v.15 no.1 / pp.19-27 / 2017
  • Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
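
As a rough illustration of why multi-hop relationship queries suit a graph representation, the sketch below stores relationships as an adjacency map and follows them hop by hop. In a relational schema each hop would typically be one more join. The node names and the function are hypothetical; this is neither Neo4j's API nor the authors' dataset.

```python
from collections import deque


def neighbors_within(graph, start, max_hops):
    """Breadth-first traversal returning {node: hop_count} for every node
    reachable from `start` in at most `max_hops` relationship hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


# Hypothetical toy data: drug -> target protein -> gene -> disease.
toy_graph = {
    "drugA": ["proteinX"],
    "proteinX": ["geneG"],
    "geneG": ["diseaseD"],
}
```

With this toy data, `neighbors_within(toy_graph, "drugA", 3)` reaches `diseaseD` at hop 3, which corresponds to a three-way join chain in a table-per-relationship layout.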

Selectivity Estimation Using Compressed Spatial Histogram (압축된 공간 히스토그램을 이용한 선택율 추정 기법)

  • Chi, Jeong-Hee;Lee, Jin-Yul;Kim, Sang-Ho;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.11D no.2 / pp.281-292 / 2004
  • Selectivity estimation for spatial queries is an important step in finding the most efficient execution plan. Many studies have sought to estimate selectivity accurately. Although they address problems such as false counts and multiple counts, they cannot achieve the same accuracy in a small memory space. We therefore propose a new technique, the MW Histogram, which compresses the summary data while still producing reasonable results and has a flexible structure that accommodates dynamic updates. Our method is based on two techniques: (a) the MinSkew partitioning algorithm, which handles skewed spatial datasets efficiently, and (b) the wavelet transformation, whose compression effect is proven. The experimental results show that the MW Histogram with a buckets-to-wavelet-coefficients ratio of 0.3 achieves about 5%-20% lower relative error on queries than the MinSkew Histogram, demonstrating that the MW Histogram gives good selectivity estimates in little memory.
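
The wavelet-compression idea can be sketched on a one-dimensional vector of bucket counts. This is an illustrative assumption, not the MW Histogram itself: the paper works with spatial MinSkew buckets, whereas the code below uses a plain unnormalized Haar transform, keeps the largest coefficients, and estimates selectivity from the reconstructed counts. All function names are hypothetical.

```python
def haar_transform(values):
    """Unnormalized Haar decomposition; len(values) must be a power of two."""
    coeffs = list(values)
    n = len(coeffs)
    scratch = coeffs[:]
    while n > 1:
        half = n // 2
        for i in range(half):
            a, b = coeffs[2 * i], coeffs[2 * i + 1]
            scratch[i] = (a + b) / 2         # pairwise average
            scratch[half + i] = (a - b) / 2  # pairwise detail
        coeffs[:n] = scratch[:n]
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Rebuild the bucket counts from Haar coefficients."""
    values = list(coeffs)
    n = 1
    while n < len(values):
        expanded = []
        for i in range(n):
            avg, diff = values[i], values[n + i]
            expanded.extend([avg + diff, avg - diff])
        values[: 2 * n] = expanded
        n *= 2
    return values


def compress(coeffs, keep):
    """Keep the `keep` largest-magnitude coefficients and zero the rest."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def estimate_selectivity(bucket_counts, lo, hi):
    """Fraction of objects falling in buckets lo..hi-1."""
    total = sum(bucket_counts)
    return sum(bucket_counts[lo:hi]) / total if total else 0.0
```

For counts `[4, 2, 6, 8]` the transform yields `[5, -2, 1, -1]`; keeping only two coefficients reconstructs `[3, 3, 7, 7]`, so a range query over the last two buckets is still estimated at 0.7 while only half the numbers are stored.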

Design and Implementation of a Hybrid Equipment Data Acquisition System(HEDAS) for Equipment Engineering System(EES) Framework (EES 프레임워크를 위한 하이브리드 생산설비 데이터 습득 시스템(HEDAS)의 설계 및 구현)

  • Kim, Gyoung-Bae
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.167-176 / 2012
  • In this paper we design and implement a new Hybrid Equipment Data Acquisition System (HEDAS) for collecting data from semiconductor and optoelectronic manufacturing equipment in the equipment engineering system (EES) framework. The amount of data collected from equipment in equipment engineering systems has increased rapidly. The proposed HEDAS efficiently handles the large amount of real-time equipment data generated in the EES framework. It supports both real-time and non-real-time EES applications. For real-time EES applications, it performs high-speed real-time processing that uses continuous query and filtering techniques based on memory buffers. The HEDAS can optionally store non-real-time equipment data using a HEDAS-based database or a traditional DBMS-based database. In particular, the proposed HEDAS offers compressed indexing based on the timestamp of the data and a query processing technique that reduce disk storage costs in the face of rapidly growing equipment data. The HEDAS is an efficient system for collecting huge volumes of real-time and non-real-time equipment data and transmitting the collected data to multiple EES applications in the EES framework.

Active Adjustment: An Approach for Improving the Search Performance of the TPR*-tree (능동적 재조정: TPR*-트리의 검색 성능 개선 방안)

  • Kim, Sang-Wook;Jang, Min-Hee;Lim, Sung-Chae
    • The KIPS Transactions:PartD / v.15D no.4 / pp.451-462 / 2008
  • Recently, with the advent of applications that use the locations of moving objects, it has become crucial to develop efficient index schemes for spatio-temporal databases. The TPR*-tree is the most widely accepted index structure for processing future-time queries. In the TPR*-tree, the future locations of moving objects are predicted based on the CBR (Conservative Bounding Rectangle). Since the areas predicted from CBRs tend to grow rapidly over time, the enlarged CBRs lead to serious performance degradation in query processing. To address this problem, we propose a new method that adjusts CBRs to be tight, thereby improving the performance of query processing. Our method examines whether the adjustment of a CBR is necessary when a leaf node is accessed to process a user query. Thus, this examination does not incur extra disk I/Os. Also, in order to make a correct decision, we devise a cost model that considers both the I/O overhead of the CBR adjustment and the future performance gain owing to the adjustment. With the cost model, we can prevent unusual expansions of BRs even when updates on nodes are infrequent, and also avoid unnecessary execution of the CBR adjustment. For performance evaluation, we conducted a variety of experiments. The results show that our method significantly improves the performance of the original TPR*-tree.
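
The growth of a conservative bounding rectangle and the effect of tightening it can be sketched as follows. The classes and the `build_cbr` helper are hypothetical names, and the sketch covers only the geometry. It does not implement the paper's cost model or any tree structure.

```python
from dataclasses import dataclass


@dataclass
class MovingPoint:
    x: float
    y: float
    vx: float
    vy: float

    def position(self, dt):
        """Linearly extrapolated position dt time units after the reference time."""
        return (self.x + self.vx * dt, self.y + self.vy * dt)


@dataclass
class CBR:
    lo: tuple   # lower corner (x, y) at the reference time
    hi: tuple   # upper corner (x, y) at the reference time
    vlo: tuple  # minimum velocity per axis among the enclosed objects
    vhi: tuple  # maximum velocity per axis among the enclosed objects

    def area_at(self, dt):
        """Area of the rectangle dt time units after its reference time."""
        width = (self.hi[0] + self.vhi[0] * dt) - (self.lo[0] + self.vlo[0] * dt)
        height = (self.hi[1] + self.vhi[1] * dt) - (self.lo[1] + self.vlo[1] * dt)
        return width * height


def build_cbr(points, dt=0.0):
    """Build a tight CBR from object positions extrapolated to offset dt."""
    pos = [p.position(dt) for p in points]
    return CBR(
        lo=(min(x for x, _ in pos), min(y for _, y in pos)),
        hi=(max(x for x, _ in pos), max(y for _, y in pos)),
        vlo=(min(p.vx for p in points), min(p.vy for p in points)),
        vhi=(max(p.vx for p in points), max(p.vy for p in points)),
    )
```

For two objects moving toward each other, `MovingPoint(0, 0, 1, 0)` and `MovingPoint(10, 10, -0.5, 0)`, the rectangle built at time 0 covers an area of 175 after five time units, whereas a rectangle rebuilt from the actual positions at that moment covers only 25. That gap is the dead space an adjustment removes.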

A Genetic Algorithm for Materialized View Selection in Data Warehouses (데이터웨어하우스에서 유전자 알고리즘을 이용한 구체화된 뷰 선택 기법)

  • Lee, Min-Soo
    • The KIPS Transactions:PartD / v.11D no.2 / pp.325-338 / 2004
  • A data warehouse stores information that is collected from multiple, heterogeneous information sources for the purpose of complex querying and analysis. Information in the warehouse is typically stored in the form of materialized views, which represent pre-computed portions of frequently asked queries. One of the most important tasks in designing a warehouse is the selection of materialized views to be maintained in the warehouse. The goal is to select a set of views so that the total query response time over all queries is minimized while only a limited amount of time is available for maintaining the views (the maintenance-cost view selection problem). In this paper, we propose an efficient solution to the maintenance-cost view selection problem using a genetic algorithm for computing a near-optimal set of views. Specifically, we explore the maintenance-cost view selection problem in the context of OR view graphs. We show that our approach represents a dramatic improvement in terms of time complexity over existing search-based approaches that use heuristics. Our analysis shows that the algorithm consistently yields a solution whose query cost is only about 10% above the optimal query cost, while its execution time increases only linearly. We have implemented a prototype version of our algorithm that is used to evaluate our approach.
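
A genetic algorithm for this kind of constrained selection can be sketched with bit strings, one bit per candidate view. The model below is a deliberate simplification and an assumption: each view has an independent query-time saving and maintenance cost, and OR view graphs are ignored, so it shows the search technique rather than the authors' algorithm. All names and parameter values are hypothetical.

```python
import random


def fitness(bits, benefits, costs, budget):
    """Total query-time saving of the selected views, or -1 if the
    combined maintenance cost exceeds the budget (infeasible)."""
    maintenance = sum(c for b, c in zip(bits, costs) if b)
    if maintenance > budget:
        return -1.0
    return float(sum(g for b, g in zip(bits, benefits) if b))


def select_views(benefits, costs, budget, generations=60, pop_size=20, seed=0):
    """Evolve a bit string marking which views to materialize.

    Requires at least two candidate views and pop_size >= 4.
    """
    rng = random.Random(seed)
    n = len(benefits)

    def score(bits):
        return fitness(bits, benefits, costs, budget)

    # Seed the population with the empty selection, which is always feasible.
    population = [[0] * n]
    population += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = [population[0][:]]          # elitism: keep the best as-is
        parents = population[: pop_size // 2]  # truncation selection
        while len(next_gen) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = mom[:cut] + dad[cut:]
            if rng.random() < 0.2:             # occasional bit-flip mutation
                child[rng.randrange(n)] ^= 1
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)
```

Because the empty selection is in the initial population and the best individual is always carried over, the returned selection never violates the maintenance budget. How close it comes to the optimum depends on the parameters, and the sketch makes no claim about that.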

Approximation Methods for Efficient Spatial Operations in Multiplatform Environments (멀티 플랫폼 환경에서 효율적인 공간 연산을 위한 객체의 근사 표현 기법)

  • 강구안;김진덕
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.453-456 / 2003
  • Spatial database systems perform a filtering step with MBRs (Minimum Bounding Rectangles) for efficient query processing, and then carry out a refinement step on the candidate objects. While most operations require fast execution of filtering, on devices with low computing power it is necessary to increase the filtering rate and reduce the number of refinement steps. A compact representation method is also needed on mobile devices with low storage capacity. This paper proposes various approximation methods for efficient spatial operations in multiplatform environments. It also designs a compression technique for the MBR, which occupies almost 80% of the index data in the two-dimensional case. We also analyze the advantages and drawbacks of each method in terms of space utilization, filtering efficiency, and speed.
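
The filtering step, and one possible way to shrink an MBR, can be sketched as below. The compression shown (snapping corners outward onto a coarse grid so each coordinate fits in one byte) is an assumed stand-in chosen for illustration. The abstract does not describe the paper's actual technique, and all function names are hypothetical.

```python
import math


def mbr_of(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point list."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two MBRs overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_candidates(objects, window):
    """Filtering step: keep objects whose MBR intersects the query window.
    Survivors still need exact-geometry refinement."""
    return [name for name, pts in objects.items() if intersects(mbr_of(pts), window)]


def quantize_mbr(mbr, frame, levels=255):
    """Snap an MBR outward onto a grid of `levels` cells per axis inside
    `frame`, giving four integers in 0..levels (one byte each for 255)."""
    fx0, fy0, fx1, fy1 = frame
    cx = (fx1 - fx0) / levels
    cy = (fy1 - fy0) / levels

    def clamp(v):
        return max(0, min(levels, v))

    return (
        clamp(math.floor((mbr[0] - fx0) / cx)),
        clamp(math.floor((mbr[1] - fy0) / cy)),
        clamp(math.ceil((mbr[2] - fx0) / cx)),
        clamp(math.ceil((mbr[3] - fy0) / cy)),
    )


def dequantize_mbr(q, frame, levels=255):
    """Expand grid indices back to coordinates in the frame."""
    fx0, fy0, fx1, fy1 = frame
    cx = (fx1 - fx0) / levels
    cy = (fy1 - fy0) / levels
    return (fx0 + q[0] * cx, fy0 + q[1] * cy, fx0 + q[2] * cx, fy0 + q[3] * cy)
```

Rounding the lower corner down and the upper corner up keeps the decoded rectangle a superset of the original (up to floating-point rounding), so filtering on it can admit extra candidates but does not drop true ones. The trade-off is a slightly lower filtering rate in exchange for less storage.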

Accelerating Group Fusion for Ligand-Based Virtual Screening on Multi-core and Many-core Platforms

  • Mohd-Hilmi, Mohd-Norhadri;Al-Laila, Marwah Haitham;Hassain Malim, Nurul Hashimah Ahamed
    • Journal of Information Processing Systems / v.12 no.4 / pp.724-740 / 2016
  • The performance issues of screening large database compounds and multiple query compounds in virtual screening highlight a common concern in Chemoinformatics applications. This study investigates these problems by choosing group fusion as a pilot model and presents efficient parallel solutions in parallel platforms, specifically, the multi-core architecture of CPU and many-core architecture of graphical processing unit (GPU). A study of sequential group fusion and a proposed design of parallel CUDA group fusion are presented in this paper. The design involves solving two important stages of group fusion, namely, similarity search and fusion (MAX rule), while addressing embarrassingly parallel and parallel reduction models. The sequential, optimized sequential and parallel OpenMP of group fusion were implemented and evaluated. The outcome of the analysis from these three different design approaches influenced the design of parallel CUDA version in order to optimize and achieve high computation intensity. The proposed parallel CUDA performed better than sequential and parallel OpenMP in terms of both execution time and speedup. The parallel CUDA was 5-10x faster than sequential and parallel OpenMP as both similarity search and fusion MAX stages had been CUDA-optimized.
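
The two stages named above, similarity search and MAX-rule fusion, can be sketched sequentially. The Tanimoto coefficient on sets of fingerprint bits is an assumption here, since the abstract does not name the similarity measure, and the compound names are hypothetical. The per-compound scoring is independent for every database entry, which is why that stage parallelizes readily, while the MAX step is a reduction over the query compounds.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_max(queries, database):
    """Score each database compound against every query compound, fuse the
    scores with the MAX rule, and return (name, score) pairs ranked by score."""
    fused = {
        name: max(tanimoto(q, fp) for q in queries)
        for name, fp in database.items()
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical fingerprints for illustration only.
example_queries = [{1, 2, 3}, {4, 5}]
example_database = {"c1": {1, 2, 3, 4}, "c2": {4, 5}, "c3": {9}}
```

With the example data, `group_fusion_max(example_queries, example_database)` ranks `c2` first with score 1.0, then `c1` with 0.75, then `c3` with 0.0.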

A Multi-Stage Approach to Secure Digital Image Search over Public Cloud using Speeded-Up Robust Features (SURF) Algorithm

  • AL-Omari, Ahmad H.;Otair, Mohammed A.;Alzwahreh, Bayan N.
    • International Journal of Computer Science & Network Security / v.21 no.12 / pp.65-74 / 2021
  • Digital image processing and retrieval have become increasingly popular on the Internet and are receiving more attention from various multimedia fields. This places additional privacy requirements on efficient image matching techniques in various applications. Hence, several search methods have been developed for matching confidential images between pairs of security agencies, but most of these methods are limited by either their cost or their precision. This study proposes a secure and efficient method that preserves image privacy and confidentiality between two communicating parties. To retrieve an image, a feature vector is extracted from the given query image, and its similarities with the feature vectors of the stored database images are then calculated to retrieve the matched images based on an indexing scheme and matching strategy. We used a secure content-based image retrieval feature detector, the Speeded-Up Robust Features (SURF) algorithm, over a public cloud to extract the features, together with the Honey Encryption algorithm. The purpose of using the encrypted image database is to provide accurate searching through encrypted documents without needing decryption. Progress in this area helps protect the privacy of sensitive data stored on the cloud. The experimental results (conducted on a well-known image set) show that the proposed methodology achieved a noticeable improvement in terms of precision, recall, F-measure, and execution time.