• Title/Summary/Keyword: data partitioning


Spatial Partitioning using Hilbert Space Filling Curve for Spatial Query Optimization (공간 질의 최적화를 위한 힐버트 공간 순서화에 따른 공간 분할)

  • Whang, Whan-Kyu;Kim, Hyun-Guk
    • The KIPS Transactions:PartD / v.11D no.1 / pp.23-30 / 2004
  • In order to approximate the spatial query result size, we partition the input rectangles into subsets and estimate the query result size based on the partitioned spatial area. In this paper we examine query result size estimation for skewed data. We review existing spatial partitioning techniques such as equi-area and equi-count partitioning, which are analogous to the equi-width and equi-height histograms used in relational databases, as well as other partitioning techniques based on spatial indexing. We then propose a new spatial partitioning technique based on the Hilbert space filling curve and present a detailed experimental evaluation comparing it with the existing techniques on synthetic as well as real-life datasets. The experiments show that the proposed technique achieves better query result size estimation than the existing techniques across spatial query sizes, bucket counts, degrees of data skew, and spatial data sizes.
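
A minimal sketch of the core ordering step, assuming rectangle coordinates are already scaled to a 2^k x 2^k integer grid; the equal-count cut and all names are illustrative, not the authors' code:

```python
def hilbert_index(n, x, y):
    """Distance of grid point (x, y), 0 <= x, y < n (n a power of two),
    along the Hilbert curve that fills the n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate/flip the quadrant so the pattern recurses
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_partition(rects, n, n_buckets):
    """Sort rectangles by the Hilbert index of their centers, then cut the
    sorted list into equal-count buckets (one bucket per histogram entry)."""
    def center_key(r):
        xmin, ymin, xmax, ymax = r
        return hilbert_index(n, (xmin + xmax) // 2, (ymin + ymax) // 2)
    ordered = sorted(rects, key=center_key)
    size = -(-len(ordered) // n_buckets)   # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Each bucket can then be summarized (bounding box, rectangle count) to estimate query result sizes; because Hilbert order preserves spatial proximity, rectangles in a bucket tend to be close together even under skew.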

A New Data Partitioning of DCT Coefficients for Error-resilient Transmission of Video (비디오의 에러내성 전송을 위한 DCT 계수의 새로운 분할 기법)

  • Roh, Kyu-Chan;Kim, Jae-Kyoon
    • Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.6 / pp.585-590 / 2002
  • In the typical data partitioning for error-resilient video coding, motion and macroblock header information is separated from the texture information. This can be an effective tool for the transmission of video over error-prone environments. For intra-coded frames, however, the loss of DCT (discrete cosine transform) coefficients is fatal because there is no other information with which to reconstruct the macroblocks corrupted by errors. For inter-coded frames, when an error occurs in the DCT coefficients, the picture quality is degraded because all DCT coefficients in those packets are discarded. In this paper, we propose an efficient data partitioning and coding method for DCT-based error-resilient video. The quantized DCT coefficients are partitioned into an even-value approximation and a remainder part. It is shown that the proposed algorithm provides better quality for the high-priority part than the conventional methods.
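
A minimal sketch of the described split, assuming each quantized coefficient is truncated toward zero to an even value, leaving a remainder in {-1, 0, +1}; names are illustrative and the paper's exact mapping may differ:

```python
def split_coefficient(c):
    """Split a quantized DCT coefficient into an even-valued approximation
    (high-priority part) and a small remainder (low-priority part)."""
    even = 2 * (c // 2) if c >= 0 else -2 * ((-c) // 2)
    return even, c - even

def partition_block(coeffs):
    """Partition a block of quantized coefficients into two streams."""
    pairs = [split_coefficient(c) for c in coeffs]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# With both parts the coefficients are exact; if the low-priority part is
# lost, the even approximation alone still gives a usable, coarser block.
block = [7, -3, 0, 2, -1]
high, low = partition_block(block)
assert [h + l for h, l in zip(high, low)] == block
```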

Dynamic Partitioning Scheme for Large RDF Data in Heterogeneous Environments (이종 환경에서 대용량 RDF 데이터를 위한 동적 분할 기법)

  • Kim, Minsoo;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • KIISE Transactions on Computing Practices / v.23 no.10 / pp.605-610 / 2017
  • In distributed environments, dynamic partitioning is needed to resolve the load on a particular server or the load caused by communication among servers. In heterogeneous environments, existing dynamic partitioning schemes can assign the same load to a server with low physical performance, which delays query response time. In this paper, we propose a dynamic partitioning scheme for large RDF data in heterogeneous environments. For load balancing, the proposed scheme calculates each query's load from its frequency and the number of vertices used in the query. In addition, we calculate server loads by considering the physical performance of the servers, so that servers with lower physical performance are allocated a smaller load. We perform dynamic partitioning that minimizes the number of edge cuts in order to reduce traffic among servers. To show the superiority of the proposed scheme, we compare it with an existing dynamic partitioning scheme through a performance evaluation.
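
A sketch of the load model as the abstract describes it, with a greedy performance-weighted assignment added for illustration; the edge-cut minimization step is omitted and all names are assumptions:

```python
def query_load(frequency, n_vertices):
    """Load of one query: its frequency times the number of vertices it uses."""
    return frequency * n_vertices

def assign_partitions(partition_loads, server_perf):
    """Give each partition (heaviest first) to the server whose
    performance-weighted load is currently smallest, so servers with
    lower physical performance receive proportionally less work."""
    weighted = [0.0] * len(server_perf)
    assignment = {}
    for pid, load in sorted(partition_loads.items(), key=lambda kv: -kv[1]):
        target = min(range(len(server_perf)),
                     key=lambda s: weighted[s] + load / server_perf[s])
        weighted[target] += load / server_perf[target]
        assignment[pid] = target
    return assignment

# Two fast servers and one at half performance: the slow one gets less load.
loads = {"p1": 90, "p2": 60, "p3": 50, "p4": 20}
print(assign_partitions(loads, server_perf=[1.0, 1.0, 0.5]))
```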

A Vertical File Partitioning Method Using SOFM in Database Design (데이터베이스 설계에서 SOFM 을 이용한 화일 수직분할 방법)

  • Shin, K.H.;Kim, J.Y.
    • Journal of Korean Institute of Industrial Engineers / v.24 no.4 / pp.661-671 / 1998
  • In physical database design, it is important to minimize the number of disk accesses needed to transfer data from disk into main memory when processing transactions. A vertical file partitioning method reduces the number of disk accesses by partitioning relations vertically and accessing only the necessary fragments. In this paper, an SOFM (Self-Organizing Feature Map) network is used to solve vertical partitioning problems. By comparing the approximate solutions of the SOFM network with the optimal solutions of an N-ary branch-and-bound method, the paper shows that the SOFM network is efficient at solving the vertical partitioning problem. The paper also presents a heuristic algorithm for allocating duplicate attributes to vertically partitioned fragments. Since the branch-and-bound method requires a great deal of computing time on large problems, it is shown that the SOFM network overcomes this limitation and solves large problems efficiently in a short time.
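
A toy sketch of the idea, assuming attributes are clustered by their transaction-usage vectors with a small 1-D self-organizing map; the paper's network configuration, cost model, and duplicate-attribute heuristic are not reproduced:

```python
import numpy as np

def sofm_vertical_partition(usage, n_fragments, epochs=200, seed=0):
    """Cluster attributes into vertical fragments with a 1-D SOFM.
    usage[t, a] is 1 if transaction t accesses attribute a."""
    rng = np.random.default_rng(seed)
    patterns = usage.T.astype(float)            # one pattern per attribute
    n_attrs, dim = patterns.shape
    weights = rng.random((n_fragments, dim))    # one neuron per fragment
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)                 # decaying learning rate
        radius = max(1.0 * (1 - epoch / epochs), 0.01)  # shrinking neighborhood
        for p in patterns[rng.permutation(n_attrs)]:
            winner = int(np.argmin(np.linalg.norm(weights - p, axis=1)))
            influence = np.exp(-((np.arange(n_fragments) - winner) ** 2)
                               / (2 * radius ** 2))
            weights += lr * influence[:, None] * (p - weights)
    # each attribute goes to the fragment of its winning neuron
    return [int(np.argmin(np.linalg.norm(weights - p, axis=1))) for p in patterns]

# 6 attributes, 4 transactions, two clear usage groups -> two fragments.
usage = np.array([[1, 1, 1, 0, 0, 0],
                  [1, 1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 1, 1]])
print(sofm_vertical_partition(usage, n_fragments=2))
```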


Speaker Change Detection Based on a Graph-Partitioning Criterion

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea / v.30 no.2 / pp.80-85 / 2011
  • Speaker change detection involves identifying the time indices of an audio stream at which the identity of the speaker changes. In this paper, we propose novel measures for speaker change detection based on a graph-partitioning criterion over the pairwise distance matrix of the feature-vector stream. Experiments on both synthetic and real-world data show that the proposed approach yields promising results compared with conventional statistical measures.
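
A minimal sketch, using a normalized-cut-style value over a frame-similarity matrix as a stand-in for the paper's graph-partitioning measure; the margin and all names are illustrative:

```python
import numpy as np

def ncut_measure(W, t):
    """Normalized-cut value when the frame stream is split at index t; W is
    a pairwise similarity matrix (e.g., exp(-distance) over feature vectors).
    A low value suggests the two sides belong to different speakers."""
    A, B = slice(0, t), slice(t, W.shape[0])
    cut = W[A, B].sum()
    return cut / W[A, :].sum() + cut / W[B, :].sum()

def detect_change(W, margin=5):
    """Return the split index with the smallest normalized-cut value,
    keeping `margin` frames on each side of the candidate."""
    scores = {t: ncut_measure(W, t) for t in range(margin, W.shape[0] - margin)}
    return min(scores, key=scores.get)
```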

CPU-GPU2 Trigeneous Computing for Iterative Reconstruction in Computed Tomography

  • Oh, Chanyoung;Yi, Youngmin
    • IEIE Transactions on Smart Processing and Computing / v.5 no.4 / pp.294-301 / 2016
  • In this paper, we present methods to efficiently parallelize iterative 3D image reconstruction by exploiting trigeneous devices (three different types of device) at the same time: a CPU, an integrated GPU, and a discrete GPU. We first present a technique that exploits the single instruction multiple data (SIMD) architectures in GPUs. Then, we propose a performance estimation model with which the optimal data partitioning across the trigeneous devices can easily be found. We found that performance varies significantly, by up to 6.23 times, depending on how the SIMD units in GPUs are accessed. By using the trigeneous devices and the proposed estimation model, we achieve optimal partitioning and throughput, a further improvement of 9.4% compared to discrete-GPU-only execution.
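
A minimal sketch of what such an estimation model drives: once per-device throughput is predicted, the iteration's work is split proportionally so all three devices finish together. The device names and throughput figures below are hypothetical:

```python
def partition_work(total_items, throughputs):
    """Split total_items across devices in proportion to their estimated
    throughputs (items/second)."""
    total_tp = sum(throughputs.values())
    shares = {dev: int(total_items * tp / total_tp)
              for dev, tp in throughputs.items()}
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += total_items - sum(shares.values())  # rounding remainder
    return shares

print(partition_work(1_000_000, {"cpu": 120.0, "igpu": 340.0, "dgpu": 910.0}))
```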

MLPPI Wizard: An Automated Multi-level Partitioning Tool on Analytical Workloads

  • Suh, Young-Kyoon;Crolotte, Alain;Kostamaa, Pekka
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1693-1713 / 2018
  • An important technique used by database administrators (DBAs) to improve performance in decision-support workloads associated with a star schema is multi-level partitioning. Queries then benefit from performance improvements via partition elimination, due to constraints on queries expressed on the dimension tables. As the task of multi-level partitioning can be overwhelming for a DBA, we propose a wizard that facilitates the task by calculating a partitioning scheme for a particular workload. The system resides completely on a client and interacts with the cost-estimation subsystem of the query optimizer via an API over the network, thereby eliminating any need to change the optimizer. In addition, since only cost estimates are needed, the wizard's overhead is very low. By using a greedy algorithm for search-space enumeration over the query predicates in the workload, the wizard is efficient, with worst-case polynomial complexity. The proposed technology can be applied to any clustering or partitioning scheme in any database management system that provides an interface to the query optimizer. Applied to the Teradata database, the technology produces recommendations that outperform a human expert's solution as measured by the total execution time of the workload. We also demonstrate the scalability of our approach as the fact table (and workload) size increases.
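
A sketch of a greedy enumeration loop of the kind described, with `estimate_cost` standing in for the optimizer's costing API reached over the network; this illustrates the structure only, not the wizard's actual algorithm:

```python
def greedy_partitioning(candidates, workload, estimate_cost):
    """Repeatedly add the candidate partitioning expression that most
    reduces the estimated workload cost; stop when none helps.
    Each pass costs one estimate per remaining candidate, so the whole
    search is worst-case polynomial in the number of candidates."""
    scheme, remaining = [], list(candidates)
    best = estimate_cost(scheme, workload)
    while remaining:
        gains = {c: best - estimate_cost(scheme + [c], workload)
                 for c in remaining}
        pick = max(gains, key=gains.get)
        if gains[pick] <= 0:
            break                     # no candidate improves the cost
        scheme.append(pick)
        remaining.remove(pick)
        best -= gains[pick]
    return scheme
```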

Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases / v.31 no.6 / pp.596-608 / 2004
  • A lot of work has been done to reduce disk access time in I/O-intensive systems, which store and handle massive amounts of data, by distributing data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell to a disk number, on the assumption that the data space is partitioned into a regular grid. Although grid-like partitioning performs well for low-dimensional data, its performance degrades as the dimension of the data grows, even with a good disk allocation scheme. This is because such schemes partition the entire data space equally regardless of the distribution of the data objects, whereas most data in a high-dimensional space lie near the surface of the space. For that reason, we propose a new declustering algorithm based on a partitioning scheme that partitions the data space from the surface inward. Several experimental results show that, with this unbalanced partitioning scheme, we can remarkably reduce the number of data blocks touched by a query as the dimension of the data and the query size grow. We also propose disk allocation schemes based on the layout of the resulting data blocks after partitioning. To show the performance of the proposed algorithm, we performed several experiments with data of different dimensions over a wide range of disk counts. Our proposed disk allocation method performs within 10 additional disk accesses of a strictly optimal allocation scheme. We compared our algorithm with the Kronecker-sequence-based declustering algorithm, which is reported to be the best among the grid-partition and mapping-function-based declustering algorithms, and improve declustering performance by up to 14 times as the dimension of the data grows.
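
A loose sketch of the two ingredients, surface-first slicing and cyclic disk allocation, assuming points lie in the unit cube; the paper's slicing geometry and allocation scheme are more elaborate:

```python
import numpy as np

def surface_slices(points, n_slices):
    """Order points by distance to the nearest face of the unit cube (in
    high dimensions most data lies near the surface) and cut them into
    n_slices equal-count groups, surface-most first."""
    depth = np.minimum(points, 1.0 - points).min(axis=1)
    return np.array_split(np.argsort(depth), n_slices)

def cyclic_allocate(slices, n_disks):
    """Assign blocks to disks round-robin in slice order, so blocks that
    are adjacent in the partitioning order can be read in parallel."""
    blocks = [int(b) for sl in slices for b in sl]
    return {b: i % n_disks for i, b in enumerate(blocks)}
```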

Reproducibility Assessment of K-Means Clustering and Applications (K-평균 군집화의 재현성 평가 및 응용)

  • 허명회;이용구
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.135-144 / 2004
  • We propose a reproducibility (validity) assessment procedure for K-means cluster analysis that randomly partitions the data set into three parts, of which two subsets are used for developing clustering rules and one subset for testing the consistency of those rules. As an alternative to the Rand index and the corrected Rand index, we also propose an entropy-based consistency measure between two clustering rules and apply it to determining the number of clusters in K-means clustering.
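
A minimal sketch of the three-way split procedure, using scikit-learn's KMeans and normalized mutual information as a stand-in for the paper's entropy-based consistency measure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def kmeans_reproducibility(X, k, seed=0):
    """Split the data into three parts, build a k-means rule on each of the
    first two, and measure how consistently the rules label the third part."""
    rng = np.random.default_rng(seed)
    a, b, c = np.array_split(rng.permutation(len(X)), 3)
    rule1 = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[a])
    rule2 = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[b])
    return normalized_mutual_info_score(rule1.predict(X[c]), rule2.predict(X[c]))

# Three well-separated blobs: consistency should peak near k = 3.
X = np.vstack([np.random.randn(100, 2) + off for off in ([0, 0], [5, 5], [0, 5])])
for k in range(2, 6):
    print(k, round(kmeans_reproducibility(X, k), 3))
```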

Branch-and-bound method for solving vertical partitioning problems in the design of the relational database (관계형 데이터 베이스 설계에서 분지한계법을 이용한 수직분할문제)

  • 윤병익;김재련
    • Journal of Korean Society of Industrial and Systems Engineering / v.19 no.37 / pp.241-249 / 1996
  • In this paper, a 0-1 integer programming model for the vertical partitioning problem that minimizes the number of disk accesses is formulated, and a branch-and-bound method is used to solve the binary vertical partitioning problem. In relational databases, the number of disk accesses depends on the amount of data transferred from disk to main memory for processing the transactions. Vertical partitioning of a relation can often decrease the number of disk accesses, since not all attributes in a tuple are required by each transaction. The algorithm is illustrated with numerical examples and is shown to be computationally efficient. Numerical experiments reveal that the proposed method is more effective in reducing access costs than existing algorithms.
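
A toy branch-and-bound sketch over binary (two-fragment) partitionings with a simplified disk-access cost model; the paper's 0-1 formulation, bounds, and cost function are not reproduced:

```python
def access_cost(assign, use, freq, width):
    """Cost of a (possibly partial) assignment: each transaction reads the
    full width of every fragment holding an attribute it uses, weighted by
    frequency. On partial assignments this is a valid lower bound, since
    assigning more attributes can only add width and fragment touches."""
    frag_width = [0, 0]
    for a, f in assign.items():
        frag_width[f] += width[a]
    return sum(freq[t] * sum(frag_width[f]
                             for f in {assign[a] for a in attrs if a in assign})
               for t, attrs in enumerate(use))

def branch_and_bound(n_attrs, use, freq, width):
    """Depth-first search over 0/1 attribute assignments, pruning any branch
    whose partial cost already reaches the best complete solution found."""
    best = [float("inf"), None]
    def dfs(a, assign):
        bound = access_cost(assign, use, freq, width)
        if bound >= best[0]:
            return                    # prune this branch
        if a == n_attrs:
            best[:] = [bound, dict(assign)]
            return
        for f in (0, 1):
            assign[a] = f
            dfs(a + 1, assign)
            del assign[a]
    dfs(0, {})
    return best

# 4 attributes; each transaction listed as the set of attributes it uses.
use = [{0, 1}, {2, 3}, {0, 1}, {2, 3}]
print(branch_and_bound(4, use, freq=[10, 10, 5, 5], width=[4, 4, 4, 4]))
```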
