• Title/Summary/Keyword: Merge sort

External Merge Sorting in Tajo with Variable Server Configuration (매개변수 환경설정에 따른 타조의 외부합병정렬 성능 연구)

  • Lee, Jongbaeg;Kang, Woon-hak;Lee, Sang-won
    • Journal of KIISE / v.43 no.7 / pp.820-826 / 2016
  • There is a growing need for big data processing that extracts valuable information from large amounts of data. The Hadoop system employs the MapReduce framework to process big data. However, MapReduce has limitations such as inflexible and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop techniques, was developed by a Korean development group. External merge sort is one of the most heavily used algorithms in Tajo for query processing. Its performance in Tajo is influenced by two parameters: sort buffer size and fanout. In this paper, we analyze the performance of external merge sort in Tajo with various sort buffer sizes and fanouts. In addition, we identify two major causes of the performance differences: CPU cache misses, which increase as the sort buffer size grows, and the number of merge passes, which is determined by the fanout.
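To make the role of fanout concrete, the following is a minimal sketch of how the number of merge passes can be estimated for a textbook external merge sort. It is not Tajo's implementation: the names `initial_runs`, `merge_passes`, `data_size`, `sort_buffer_size` and `fanout` are hypothetical, and the model assumes that each initial sorted run fills exactly one sort buffer and that every pass merges up to `fanout` runs at a time.

```python
def initial_runs(data_size: int, sort_buffer_size: int) -> int:
    """Number of sorted runs produced when the input is sorted one buffer at a time.

    Assumption: both sizes use the same unit (for example, bytes or tuples).
    """
    if data_size < 0 or sort_buffer_size < 1:
        raise ValueError("data_size must be >= 0 and sort_buffer_size >= 1")
    # Integer ceiling division avoids floating-point rounding.
    return -(-data_size // sort_buffer_size)


def merge_passes(num_runs: int, fanout: int) -> int:
    """Number of passes needed to merge num_runs runs into one, fanout runs at a time."""
    if num_runs < 0 or fanout < 2:
        raise ValueError("num_runs must be >= 0 and fanout >= 2")
    passes = 0
    while num_runs > 1:
        # Each pass replaces every group of up to `fanout` runs with one merged run.
        num_runs = -(-num_runs // fanout)
        passes += 1
    return passes


if __name__ == "__main__":
    runs = initial_runs(data_size=1000, sort_buffer_size=128)
    for f in (2, 4, 8):
        print(f"fanout={f}: {merge_passes(runs, f)} passes over {runs} runs")
```

In this simplified model a larger sort buffer yields fewer initial runs and a larger fanout yields fewer passes. The cache-miss effect reported in the abstract is not modelled here.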

Analysis and Comparison of Sorting Algorithms (Insertion, Merge, and Heap) Using Java

  • Khaznah, Alhajri;Wala, Alsinan;Sahar, Almuhaishi;Fatimah, Alhmood;Narjis, AlJumaia;Azza., A.A
    • International Journal of Computer Science & Network Security / v.22 no.12 / pp.197-204 / 2022
  • Sorting is an important operation in many real-world applications. Several sorting algorithms are currently in use for searching and other operations. Sorting algorithms rearrange the elements of an array or list according to a comparison operator on the elements, which establishes the new order of elements in the data structure. This report analyzes and compares, both theoretically and experimentally, the time complexity and running time of the insertion, merge, and heap sort algorithms. The algorithms are implemented in Java using the NetBeans tool. The results show that on already sorted elements, insertion sort has a faster running time than the merge and heap sort algorithms. For a large number of elements, merge sort is the better choice. In terms of the number of comparisons, insertion sort performs the most.
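The comparison-count behaviour described above can be reproduced with a small instrumented sketch. The paper's implementation is in Java; the version below is written in Python purely for illustration, and the function names are hypothetical. It counts element comparisons for insertion sort and top-down merge sort so that sorted and reverse-sorted inputs can be compared.

```python
def insertion_sort_comparisons(items):
    """Return (sorted copy, number of element comparisons) for insertion sort."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:      # key is already in place relative to a[j]
                break
            a[j + 1] = a[j]      # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return a, comparisons


def merge_sort_comparisons(items):
    """Return (sorted copy, number of element comparisons) for top-down merge sort."""
    a = list(items)
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, c_left = merge_sort_comparisons(a[:mid])
    right, c_right = merge_sort_comparisons(a[mid:])
    merged, comparisons = [], c_left + c_right
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:  # "<=" keeps the merge stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


if __name__ == "__main__":
    for name, data in (("sorted", list(range(8))), ("reversed", list(range(8, 0, -1)))):
        _, ins = insertion_sort_comparisons(data)
        _, mrg = merge_sort_comparisons(data)
        print(f"{name}: insertion={ins} comparisons, merge={mrg} comparisons")
```

On sorted input, insertion sort needs only one comparison per element after the first, while on reverse-sorted input its count grows quadratically. This is consistent with the abstract's observations, although the sketch measures comparisons rather than running time.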

A study of Time Management System in Data Base (데이터베이스에서의 시간 시스템에 관한 연구)

  • 최진탁
    • Journal of Korean Society of Industrial and Systems Engineering / v.21 no.48 / pp.185-192 / 1998
  • This paper proposes a new algorithm that efficiently performs joins in temporal databases. The main idea is to sort the smaller relation and partition the larger relation, which reduces the cost of sorting the larger relation. To show the usefulness of the algorithm, its cost is analyzed in terms of the number of accesses to secondary storage and compared with that of the Sort-Merge algorithm. Through these comparisons, we present and verify the conditions under which the proposed algorithm always outperforms the Sort-Merge algorithm. The comparisons show that the proposed algorithm achieves a gain of 10 to 30% under those conditions.

Parallel FFT and Quick-Merge Sort on the Reflective Memory Networked Computers and a Cluster of Work-stations

  • Lee, Changhun;Kwon, Wook-Hyun
    • 제어로봇시스템학회:학술대회논문집 / 2002.10a / pp.94.1-94 / 2002
  • This paper is concerned with parallel FFT and Quick-Merge Sort. They are implemented on computers interconnected by VMIC 5579 reflective memory and on a cluster of workstations (PCs) interconnected via Fast Ethernet. The Message Passing Interface (MPI) parallel library is used for communication in the cluster of workstations. An improved parallel FFT is also presented to decrease execution time when the number of hosts is small. Distributed shared memory (DSM), VMIC 5579 reflective memory (RM), a cluster of workstations (COW) and the MPI parallel library are described.

Connectivity of X-Hypercubes and Its Applications (X-Hypercubes의 연결성과 그 응용)

  • Gwon, Gyeong-Hui
    • The Transactions of the Korea Information Processing Society / v.1 no.1 / pp.92-98 / 1994
  • The hypercube-like interconnection network, X-hypercubes, has the same number of nodes and edges as conventional hypercubes. By slightly changing the way nodes are interconnected, however, X-hypercubes reduce the diameter by almost half. Thus the communication delay in X-hypercubes can be expected to be much lower than that in hypercubes. This paper gives a new definition of X-hypercubes that establishes a clear-cut condition for the connection between two nodes. As application examples of the new definition, this paper presents simple embeddings of hypercubes in X-hypercubes and vice versa. This means that any program written for hypercubes can be transported onto X-hypercubes, and vice versa, with minimal overhead. This paper also presents a bitonic merge sort for X-hypercubes by simulating that for hypercubes.

A Fast Sorting Strategy Based on a Two-way Merge Sort for Balancing the Capacitor Voltages in Modular Multilevel Converters

  • Zhao, Fangzhou;Xiao, Guochun;Liu, Min;Yang, Daoshu
    • Journal of Power Electronics / v.17 no.2 / pp.346-357 / 2017
  • The Modular Multilevel Converter (MMC) is particularly attractive for medium and high power applications such as High-Voltage Direct Current (HVDC) systems. In order to reach a high voltage, the number of cascaded submodules (SMs) is generally very large. Thus, in the applications with hundreds or even thousands of SMs such as MMC-HVDCs, the sorting algorithm of the conventional voltage balancing strategy is extremely slow. This complicates the controller design and increases the hardware cost tremendously. This paper presents a Two-Way Merge Sort (TWMS) strategy based on the prediction of the capacitor voltages under ideal conditions. It also proposes an innovative Insertion Sort Correction for the TWMS (ISC-TWMS) to solve issues in practical engineering under non-ideal conditions. The proposed sorting methods are combined with the features of the MMC-HVDC control strategy, which significantly accelerates the sorting process and reduces the implementation efforts. In comparison with the commonly used quicksort algorithm, it saves at least two-thirds of the sorting execution time in one arm with 100 SMs, and saves more with a higher number of SMs. A 501-level MMC-HVDC simulation model in PSCAD/EMTDC has been built to verify the validity of the proposed strategies. The fast speed and high efficiency of the algorithms are demonstrated by experiments with a DSP controller (TMS320F28335).

Finding the Worst-case Instances of Some Sorting Algorithms Using Genetic Algorithms (유전 알고리즘을 이용한 정렬 알고리즘의 최악의 인스턴스 탐색)

  • Jeon, So-Yeong;Kim, Yong-Hyuk
    • Proceedings of the Korean Information Science Society Conference / 2010.06b / pp.1-5 / 2010
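  • Taking the number of element comparisons performed by a sorting algorithm as the criterion, we define a permutation that requires many comparisons as a worst-case instance and use a genetic algorithm to search for such instances. We experimented with the well-known quick sort, merge sort, heap sort, insertion sort, shell sort, and advanced quick sort. For merge sort and insertion sort, the instances found were very close to the worst-case instances. For quick sort, finding the worst-case instance became harder as the input size increased. For the remaining sorts, we cannot theoretically guarantee that the instances found are worst-case instances, but their comparison counts were far higher than the average comparison count obtained by sorting 1,000 random permutations. The attempt in this paper to search for worst-case instances is significant in that it generates test data for verifying algorithm performance.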

A New Method for Efficient in-Place Merging

  • Kim, Pok-Son;Arne Kutzner
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.392-394 / 2003
  • There is a well-known simple, stable standard merge algorithm that runs in linear time but at the price of double the space. This extra space consumption has often been noted as a drawback of the standard merge sort algorithm, which has a merge process as its central operation. In-place merging is a way to overcome this drawback and so has a long tradition of study in theoretical computer science. We present an in-place merging algorithm that rearranges the elements to be merged by rotation, a special form of block interchanging. Our algorithm is novel in its technique for determining the rotation areas. Further, it has a short and transparent definition. We present our algorithm and prove that in the worst case it needs no more than twice as many comparisons as the standard merge algorithm. Experimental work has shown that our algorithm is efficient and so might be of high practical interest.
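As an illustration of merging by rotation, the sketch below shows a textbook recursive rotation merge. It is not the algorithm of Kim and Kutzner, whose method for choosing the rotation areas is not given in the abstract. The function names `rotate` and `merge_in_place` are hypothetical, and the cut points are chosen by simple binary search around the midpoint of the longer run.

```python
from bisect import bisect_left, bisect_right


def rotate(a, lo, mid, hi):
    """Exchange the adjacent blocks a[lo:mid] and a[mid:hi] in place via three reversals."""
    def reverse(i, j):
        j -= 1
        while i < j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    reverse(lo, mid)
    reverse(mid, hi)
    reverse(lo, hi)


def merge_in_place(a, lo, mid, hi):
    """Stably merge the sorted runs a[lo:mid] and a[mid:hi] without a buffer."""
    if lo >= mid or mid >= hi:
        return                                   # one run is empty
    if hi - lo == 2:
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]
        return
    if mid - lo >= hi - mid:
        # Split the longer left run in half; find where its pivot falls in the right run.
        cut1 = lo + (mid - lo) // 2
        cut2 = bisect_left(a, a[cut1], mid, hi)
    else:
        # Split the longer right run in half; find where its pivot falls in the left run.
        cut2 = mid + (hi - mid) // 2
        cut1 = bisect_right(a, a[cut2], lo, mid)
    # Move the small right-hand elements in front of the large left-hand ones.
    rotate(a, cut1, mid, cut2)
    new_mid = cut1 + (cut2 - mid)
    merge_in_place(a, lo, cut1, new_mid)
    merge_in_place(a, new_mid, cut2, hi)


if __name__ == "__main__":
    data = [1, 4, 7, 9, 2, 3, 8, 10, 11]         # two sorted runs: [0:4] and [4:9]
    merge_in_place(data, 0, 4, len(data))
    print(data)
```

The rotation uses only element swaps, so no auxiliary array is needed. The price is extra element moves compared with the standard buffered merge.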

Performance Comparison of Join Operations Parallelization by using GPGPU (GPGPU 기반 조인 연산 병렬화 성능 비교)

  • Lee, Jong-Sub;Lee, Sang-Back;Lee, Kyu-Chul
    • Database Research / v.34 no.3 / pp.28-44 / 2018
  • In a database system, the most expensive of the relational operations is the join. Generally, CPU-based join operations use parallel processing with anywhere from 1 core to at most 16 cores, which does not significantly improve performance. On the other hand, GPGPU (General-Purpose computing on Graphics Processing Units) allows parallel processing across thousands of processing units, greatly reducing the time required to perform join operations. Parallelization using GPGPU relies on NVIDIA's CUDA SDK. In this paper, we implement parallelized join operations using GPGPU and compare their performance. The join operations used are Nested Loop Join (NLJ), Sort Merge Join (SMJ) and Hash Join (HJ), and the GPGPU equipment used is the TITAN Xp, GTX 1080 Ti and GTX 1080. We measure and compare the performance of CPU-based and GPGPU-based join operations. We also compare this performance with that of a previous study on GPGPU-based join operations. The experimental results show that GPGPU-based performance is 6 to 328 times faster than CPU-based performance.
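For readers unfamiliar with Sort Merge Join, the following is a minimal single-threaded sketch of the sequential logic. It is an assumption-laden illustration in Python, not the paper's CUDA implementation: the function name `sort_merge_join` and its parameters are hypothetical, rows are plain tuples, and the join is an equi-join on keys extracted by caller-supplied functions.

```python
def sort_merge_join(left, right, key_left, key_right):
    """Equi-join two row lists: sort both on the join key, then merge in one pass."""
    left = sorted(left, key=key_left)
    right = sorted(right, key=key_right)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1                      # left key too small: advance left
        elif kl > kr:
            j += 1                      # right key too small: advance right
        else:
            # Find the extent of the equal-key group on each side.
            i_end = i
            while i_end < len(left) and key_left(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Emit the cross product of the two groups.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    result.append((l_row, r_row))
            i, j = i_end, j_end
    return result


if __name__ == "__main__":
    orders = [(2, "b"), (1, "a"), (4, "d"), (2, "c")]
    items = [(3, "z"), (2, "x"), (4, "w"), (2, "y")]
    for pair in sort_merge_join(orders, items, lambda r: r[0], lambda r: r[0]):
        print(pair)
```

After sorting, the merge phase touches each input row once apart from duplicate-key groups, which is why the sort step typically dominates the cost.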

Processing Large Data Using File System On ETL (ETL상에서 파일 시스템을 이용한 대용량 데이터 처리 기법)

  • Jung, Yun-Chun
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.127-131 / 2008
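  • As relational DBMSs have become widespread, the use of index-based relational databases in building large operational systems has increased. As a result, the role of sorting has shrunk considerably, and large settlement jobs have come to be processed mainly within the database itself. However, for the large volumes of data used in such settlement jobs, ETL (Extract Transformation Loading) work has begun to show worse performance than when a file system is used. In this paper, for the case where large volumes of data residing in a DBMS are processed during ETL work, we propose a method that uses flat files on the file system to improve processing speed while also resolving the resource load problem. More specifically, we present ways to apply the various functions used in a DBMS, such as sort, join, merge, summary, and assorted user functions, to flat files. We also show experimentally that the proposed technique improves processing speed and resource utilization during ETL work.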
