• Title/Summary/Keyword: Parallel disk I/O

Search Result 29, Processing Time 0.027 seconds

An Efficient Multidimensional Index Structure for Parallel Environments

  • Bok Koung-Soo;Song Seok-Il;Yoo Jae-Soo
    • International Journal of Contents
    • /
    • v.1 no.1
    • /
    • pp.50-58
    • /
    • 2005
  • Generally, multidimensional data such as image and spatial data require large amount of storage space. There is a limit to store and manage those large amounts of data in single workstation. If we manage the data on parallel computing environment which is being actively researched these days, we can get highly improved performance. In this paper, we propose a parallel multidimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is nP(processor)-nxmD(disk) architecture which is the hybrid type of nP-nD and 1P-nD. Its node structure in-creases fan-out and reduces the height of an index. Also, a range search algorithm that maximizes I/O parallelism is devised, and it is applied to k-nearest neighbor queries. Through various experiments, it is shown that the proposed method outperforms other parallel index structures.

  • PDF

Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases
    • /
    • v.31 no.6
    • /
    • pp.596-608
    • /
    • 2004
  • A lot of work has been done to reduce disk access time in I/O intensive systems, which store and handle massive amount of data, by distributing data across multiple disks and accessing them in parallel. Most of the previous work has focused on an efficient mapping from a grid cell to a disk number on the assumption that data space is regular grid-like partitioned. Although we can achieve good performance for low-dimensional data by grid-like partitioning, its performance becomes degenerate as grows the dimension of data even with a good disk allocation scheme. This comes from the fact that they partition entire data space equally regardless of distribution ratio of data objects. Most of the data in high-dimensional space exist around the surface of space. For that reason, we propose a new declustering algorithm based on the partitioning scheme which partition data space from the surface. With an unbalanced partitioning scheme, several experimental results show that we can remarkably reduce the number of data blocks touched by a query as grows the dimension of data and a query size. In this paper, we propose disk allocation schemes based on the layout of the resultant data blocks after partitioning. To show the performance of the proposed algorithm, we have performed several experiments with different dimensional data and for a wide range of number of disks. Our proposed disk allocation method gives a performance within 10 additive disk accesses compared with strictly optimal allocation scheme. We compared our algorithm with Kronecker sequence based declustering algorithm, which is reported to be the best among the grid partition and mapping function based declustering algorithms. We can improve declustering performance up to 14 times as grows dimension of data.

Design of an Efficient Parallel High-Dimensional Index Structure (효율적인 병렬 고차원 색인구조 설계)

  • Park, Chun-Seo;Song, Seok-Il;Sin, Jae-Ryong;Yu, Jae-Su
    • Journal of KIISE:Databases
    • /
    • v.29 no.1
    • /
    • pp.58-71
    • /
    • 2002
  • Generally, multi-dimensional data such as image and spatial data require large amount of storage space. There is a limit to store and manage those large amount of data in single workstation. If we manage the data on parallel computing environment which is being actively researched these days, we can get highly improved performance. In this paper, we propose a parallel high-dimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is nP(processor)-n$\times$mD(disk) architecture which is the hybrid type of nP-nD and lP-nD. Its node structure increases fan-out and reduces the height of a index tree. Also, A range search algorithm that maximizes I/O parallelism is devised, and it is applied to K-nearest neighbor queries. Through various experiments, it is shown that the proposed method outperforms other parallel index structures.

Prefetching Policy based on File Acess Pattern and Cache Area (파일 접근 패턴과 캐쉬 영역을 고려한 선반입 기법)

  • Lim, Jae-Deok;Hwang-Bo, Jun-Hyeong;Koh, Kwang-Sik;Seo, Dae-Hwa
    • The KIPS Transactions:PartA
    • /
    • v.8A no.4
    • /
    • pp.447-454
    • /
    • 2001
  • Various caching and prefetching algorithms have been investigated to identify and effective method for improving the performance of I/O devices. A prefetching algorithm decreases the processing time of a system by reducing the number of disk accesses when an I/O is needed. This paper proposes an AMBA prefetching method that is an extended version of the OBA prefetching method. The AMBA prefetching method will prefetching blocks continuously as long as disk bandwidth is enough. In this method, though there were excessive data request rate, we would expect efficient prefetching. And in the AMBA prefetching method, to prevent the cache pollution, it limits the number of data blocks to be prefetched within the cache area. It can be implemented in a user-level File System based on a Linux Operating System. In particular, the proposed prefetching policy improves the system performance by about 30∼40% for large files that are accessed sequentially.

  • PDF

Enhanced NOW-Sort on a PC Cluster with a Low-Speed Network (저속 네트웍 PC 클러스터상에서 NOW-Sort의 성능향상)

  • Kim, Ji-Hyoung;Kim, Dong-Seung
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.10
    • /
    • pp.550-560
    • /
    • 2002
  • External sort on cluster computers requires not only fast internal sorting computation but also careful scheduling of disk input and output and interprocessor communication through networks. This is because the overall time for the execution is determined by reflecting the times for all the jobs involved, and the portion for interprocessor communication and disk I/O operations is significant. In this paper, we improve the sorting performance (sorting throughput) on a cluster of PCs with a low-speed network by developing a new algorithm that enables even distribution of load among processors, and optimizes the disk read and write operations with other computation/communication activities during the sort. Experimental results support the effectiveness of the algorithm. We observe the algorithm reduces the sort time by 45% compared to the previous NOW-sort[1], and provides more scalability in the expansion of the computing nodes of the cluster as well.

De-duplication of Parity Disk in SSD-Based RAID System (SSD 기반의 RAID 시스템에서 패리티 디스크의 중복 제거)

  • Yang, Yu-Seok;Lee, Seung-Kyu;Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.1
    • /
    • pp.105-113
    • /
    • 2013
  • RAID systems have been widely used by connecting several disks in parallel structure. to resolve the delay and bottleneck of data I/O. Recently, SSD based RAID systems are emerging since SSDs have better I/O performance than HDD. However, endurance and power consumption problems due to frequent write operation in SSD based RAID system should be resolved. In this paper, we propose a de-duplication method of parity disk in SSD based RAID system with expensive update cost. The proposed method segments chunk of parity data into small pieces and removes duplicate data, therefore, it can reduce wear-leveling and power consumption by decreasing write operation for duplicated parity data. Experimental results show that bit update rate of the proposed method is 16% in total disk, 31% in parity disk less than that of existing method in RAID-6 system using EVENODD erasure code, and the power consumption of the proposed method is 30% less than that of existing method. Besides the proposed method is 12% in total disk, 32% in parity disk less than that of existing method in RAID-5 system, and the power consumption of the proposed method is 36% less than that of existing method.

Parallel Processing of Multiple Queries in a Declustered Spatial Database (디클러스터된 공간 데이터베이스에서 다중 질의의 병렬 처리)

  • Seo, Yeong-Deok;Park, Yeong-Min;Jeon, Bong-Gi;Hong, Bong-Hui
    • Journal of KIISE:Databases
    • /
    • v.29 no.1
    • /
    • pp.44-57
    • /
    • 2002
  • Multiple spatial queries are defined as two or more spatial range queries to be executed at the same time. The primary processing of internet-based map services is to simultaneously execute multiple spatial queries. To improve the throughput of multiple queries, the time of disk I/O in processing spatial queries significantly should be reduced. The declustering scheme of a spatial dataset of the MIMD architecture cannot decrease the disk I/O time because of random seeks for processing multiple queries. This thesis presents query scheduling strategies to ease the problem of inter-query random seeks. Query scheduling is achieved by dynamically re-ordering the priority of the queued spatial queries. The re-ordering of multiple queries is based on the inter-query spatial relationship and the latency of query processing. The performance test shows that the time of multiple query processing with query scheduling can be significantly reduced by easing inter-query random seeks as a consequence of enhanced hit ratio of disk cache.

Parallel Spatial Join Method Using Efficient Spatial Relation Partition In Distributed Spatial Database Systems (분산 공간 DBMS에서의 효율적인 공간 릴레이션 분할 기법을 이용한 병렬 공간 죠인 기법)

  • Ko, Ju-Il;Lee, Hwan-Jae;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society
    • /
    • v.4 no.1 s.7
    • /
    • pp.39-46
    • /
    • 2002
  • In distributed spatial database systems, users nay issue a query that joins two relations stored at different sites. The sheer volume and complexity of spatial data bring out expensive CPU and I/O costs during the spatial join processing. This paper shows a new spatial join method which joins two spatial relation in a parallel way. Firstly, the initial join operation is divided into two distinct ones by partitioning one of two participating relations based on the region. This two join operations are assigned to each sites and executed simultaneously. Finally, each intermediate result sets from the two join operations are merged to an ultimate result set. This method reduces the number of spatial objects participating in the spatial operations. It also reduces the scope and the number of scanning spatial indices. And it does not materialize the temporary results by implementing the join algebra operators using the iterator. The performance test shows that this join method can lead to efficient use in terms of buffer and disk by narrowing down the joining region and decreasing the number of spatial objects.

  • PDF

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.1_2
    • /
    • pp.129-139
    • /
    • 2003
  • The PC cluster architecture is considered as a cost-effective alternative to the existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by having the data appropriately distributed to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are distributed as evenly as possible to all the PCs. If the terms in the inverted index file can be classified to closely related clusters, the parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is the development of methods for automatically clustering the terms based on the likelihood of the terms' co-occurrence in the same query. Also, in this paper, we propose a method for duplicate distribution of inverted index records among the PCs to achieve fault-tolerance as well as dynamic load balancing. Experiments with a large corpus revealed the efficiency and effectiveness of our method.

The Design and Implementation of the Distributed Shared Disk for Efficient Parallel I/O (효율적인 병렬 입출력을 지원하기 위한 분산공유디스트의 설계 및 구현)

  • 송창호;남영진;박찬익
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1998.10a
    • /
    • pp.718-720
    • /
    • 1998
  • 병렬파일시스템을 분산 환경에서 구현하고자 할 때 하부기능들을 관리 및 유지하기 위해서는 복잡한 내부 동작이 필요하다. 저 수준의 하드웨어 관리기능들을 고수준의 파일 서비스 기능들과 분리함으로써 병렬파일시스템 구현의 복잡도를 감소시킬수 있다. 이를 위해 본 논문에서는 분산환경상에서 물리적으로 분산되어 있는 디스크들을 하나의 거대한 논리적인 가상 디스크로 보여주는 분산공유디스크개념을 제안한다. 분산 공유디스크는 병렬 파일 시스템을 지원하기 위한 저수준의 인터페이스를 제공함으로써 병렬파일시스템에서 필용로 하는 하부기능들을 잠재적으로 제공할 수 있다. 또한 클러스터 기반 시스템에서 분산공유디스크의 프로토타입을 구현하여 그의 동작을 실험하였다.

  • PDF