External Merge Sorting in Tajo with Variable Server Configuration (매개변수 환경설정에 따른 타조의 외부합병정렬 성능 연구)

  • Lee, Jongbaeg;Kang, Woon-hak;Lee, Sang-won
    • Journal of KIISE / v.43 no.7 / pp.820-826 / 2016
  • Demand is growing for big data processing that extracts valuable information from large volumes of data. The Hadoop system employs the MapReduce framework to process big data, but MapReduce has limitations such as inflexible and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop systems, was developed by a Korean development group. External merge sort is one of the most heavily used algorithms in Tajo's query processing, and its performance is influenced by two parameters: sort buffer size and fanout. In this paper, we analyze the performance of external merge sort in Tajo under various sort buffer sizes and fanouts. We identify two major causes of the performance differences: CPU cache misses, which increase as the sort buffer size grows, and the number of merge passes, which is determined by the fanout.
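
To make the roles of the two parameters concrete, the following is a minimal in-memory sketch of external merge sort, not Tajo's implementation. The names `external_merge_sort`, `sort_buffer_rows`, and `fanout` are hypothetical and chosen for illustration; the buffer is measured in rows rather than bytes, and runs are kept as Python lists instead of spill files.

```python
import heapq


def external_merge_sort(data, sort_buffer_rows, fanout):
    """Sort `data` and report how many merge passes were needed.

    sort_buffer_rows: rows sorted in memory per initial run (assumed unit: rows).
    fanout: maximum number of runs merged together in one merge step.
    """
    if sort_buffer_rows < 1:
        raise ValueError("sort_buffer_rows must be at least 1")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")

    # Run generation: each buffer-sized chunk is sorted independently.
    runs = [
        sorted(data[i:i + sort_buffer_rows])
        for i in range(0, len(data), sort_buffer_rows)
    ]

    # Merge phase: each pass merges groups of up to `fanout` runs,
    # so the number of runs shrinks by roughly a factor of `fanout` per pass.
    passes = 0
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + fanout]))
            for i in range(0, len(runs), fanout)
        ]
        passes += 1

    return (runs[0] if runs else []), passes
```

In this sketch a larger fanout reduces the number of merge passes, while a larger sort buffer reduces the number of initial runs. The cache-miss effect described in the abstract is a hardware behavior that this simplified model does not capture.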

Multiple Rotating Priority Queue Scheduler to Meet Variable Delay Requirement in Real-Time Communication (실시간 통신에서 가변 지연을 만족하기 위한 Multiple Rotating Priority Queue Scheduler)

  • Hur, Kwon;Kim, Myung-Jun
    • The Transactions of the Korea Information Processing Society / v.7 no.8 / pp.2543-2554 / 2000
  • Packet schedulers for real-time communication must provide bounded delay and use network resources such as bandwidth and buffers efficiently. To meet these requirements, a large number of packet scheduling methods have been proposed. Among them, EDF (Earliest Deadline First) scheduling is optimal for a bounded-delay service. A disadvantage of EDF scheduling is that queued packets must be kept sorted by deadline, which requires a search operation whenever a new packet arrives at the scheduler. An RPQ (Rotating Priority Queue) scheduler avoids this operation and can closely approximate the schedulability of an EDF scheduler, but it requires large buffers. To overcome the buffer size problem of the RPQ scheduler, this paper proposes a new scheduler named MRPQ (Multiple Rotating Priority Queue). An MRPQ scheduler consists of several layers, each with a set of queues. Within a layer, the queues are configured using a new strategy named block queue. An MRPQ scheduler needs nearly half the buffer size required by an RPQ scheduler and achieves schedulability as good as that of an RPQ scheduler.
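
The deadline-ordering cost that motivates RPQ and MRPQ can be illustrated with a minimal EDF queue. This is a sketch under stated assumptions, not the paper's scheduler: the class `EDFScheduler` and its method names are hypothetical, and a binary heap stands in for the sorted queue, so each arrival pays a logarithmic insertion cost to keep packets ordered by deadline.

```python
import heapq
import itertools


class EDFScheduler:
    """Earliest Deadline First queue backed by a binary heap (illustrative only)."""

    def __init__(self):
        self._heap = []
        # Sequence numbers break ties so equal deadlines are served in arrival order.
        self._seq = itertools.count()

    def enqueue(self, packet, arrival_time, delay_bound):
        # Deadline = arrival time plus the delay bound of the packet's connection.
        deadline = arrival_time + delay_bound
        # The ordered insertion here is the per-arrival work EDF requires.
        heapq.heappush(self._heap, (deadline, next(self._seq), packet))

    def dequeue(self):
        # Transmit the packet with the earliest deadline, if any is queued.
        if not self._heap:
            return None
        deadline, _, packet = heapq.heappop(self._heap)
        return packet, deadline
```

A short usage example: enqueueing packets with different delay bounds and dequeueing them returns the packets in deadline order, regardless of arrival order.

```python
s = EDFScheduler()
s.enqueue("a", arrival_time=0, delay_bound=10)
s.enqueue("b", arrival_time=1, delay_bound=3)
first = s.dequeue()  # packet "b", whose deadline (4) is earlier than that of "a" (10)
```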