Search | Korea Science

Shortest-Frame-First Scheduling Algorithm of Threads On Multithreaded Models (다중스레드 모델에서 최단 프레임 우선 스레드 스케줄링 알고리즘)

Sim, Woo-Ho;Yoo, Weon-Hee;Yang, Chang-Mo
- Journal of KIISE: Software and Applications / v.27 no.5 / pp.575-582 / 2000
Because the FIFO thread scheduling used in existing multithreaded models does not consider locality in programs, it can degrade execution performance through frequent context-switching overhead and delayed execution of relatively short frames. Quantum unit scheduling improves performance slightly, but it still suffers from lower processor utilization and longer delays because it depends heavily on the priority of the quantum units. In this paper, we propose a shortest-frame-first (SFF) thread scheduling algorithm. Our algorithm selects and schedules the frame that is expected to take the shortest execution time, using thread size and synchronization information analyzed at compile time; this lets us estimate the relative execution time of each frame at compile time. With SFF thread scheduling on multithreaded models, we can expect faster execution, better processor utilization, increased throughput, and shorter waiting time compared to FIFO scheduling.
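
As a minimal sketch of the selection rule in this abstract, the following Python compares FIFO ordering with shortest-frame-first ordering. The `Frame` type, the `est_time` field (standing in for the compile-time estimate), the assumption that all frames are ready at time zero, and the sample values are illustrative and not taken from the paper.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Frame:
    est_time: int                       # assumed compile-time estimate of relative execution time
    name: str = field(compare=False)    # label only; not used for ordering


def fifo_order(frames):
    """Run frames in arrival order."""
    return list(frames)


def sff_order(frames):
    """Repeatedly pick the ready frame with the smallest estimated execution time."""
    heap = list(frames)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


def average_wait(order):
    """Average time a frame waits before it starts (no preemption, all ready at t=0)."""
    clock, total = 0, 0
    for frame in order:
        total += clock
        clock += frame.est_time
    return total / len(order)


frames = [Frame(8, "A"), Frame(2, "B"), Frame(5, "C")]
fifo_wait = average_wait(fifo_order(frames))   # waits 0, 8, 10
sff_wait = average_wait(sff_order(frames))     # waits 0, 2, 7
```

With these sample estimates the average wait falls from 6.0 to 3.0 time units, which is the kind of waiting-time reduction the abstract anticipates.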

SimTBS: Simulator For GPGPU Thread Block Scheduling (SimTBS: GPGPU 스레드블록 스케줄링 시뮬레이터)

Cho, Kyung-Woon;Bahn, Hyokyung
- The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.4 / pp.87-92 / 2020
Although a GPGPU (General-Purpose GPU) can maximize performance by parallelizing a task across tens of thousands of threads, those threads are internally grouped into thread blocks, each of which is the base unit for processing and resource allocation. A thread block scheduler is a specialized hardware component that allocates thread blocks to GPGPU processing hardware in a round-robin manner. However, round-robin is a sequential allocation policy and is not optimized for GPGPU resource utilization. In this paper, we propose a thread block scheduler model that can analyze and quantify the performance of various thread block scheduling policies. Experimental results from the simulator implementing our model show that the legacy hardware thread block scheduling does not behave well when the workload becomes heavy.
https://doi.org/10.7236/JIIBC.2020.20.4.87
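
The round-robin allocation described in this abstract can be sketched roughly as follows. This is not the SimTBS model itself: the single scalar resource per block, the `capacity` per processing unit, and the function name are assumptions made for illustration.

```python
from itertools import cycle


def round_robin_assign(block_costs, num_sms, capacity):
    """Assign thread blocks to processing units (SMs) in strict rotation.

    block_costs: resource units each block needs (assumed single scalar resource).
    Returns (placement, pending): placement[i] lists the block indices on SM i,
    pending lists blocks that found no SM with enough free resources.
    """
    free = [capacity] * num_sms
    placement = [[] for _ in range(num_sms)]
    pending = []
    sm_iter = cycle(range(num_sms))
    for idx, cost in enumerate(block_costs):
        # Try each SM once, starting from the next one in the rotation.
        for _ in range(num_sms):
            sm = next(sm_iter)
            if free[sm] >= cost:
                free[sm] -= cost
                placement[sm].append(idx)
                break
        else:
            pending.append(idx)
    return placement, pending


placement, pending = round_robin_assign([4, 4, 2, 2, 4], num_sms=2, capacity=8)
```

In this example the last block is left pending even though four resource units remain free in total, because rotation spreads the two small blocks across both units. That is one way a sequential policy can underuse resources.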

The Node Scheduling of Multi-Threaded Process for CC-NUMA System (CC-NUMA 시스템을 위한 다중 스레드 프로세스의 노드 스케줄링 설계 및 구현)

Kim, Jeong-Nyeo;Kim, Hae-Jin;Lee, Cheol-Hoon
- The Transactions of the Korea Information Processing Society / v.7 no.2 / pp.488-496 / 2000
This paper describes the design and implementation of node scheduling for MX Server, a CC-NUMA system. COMSIX, the operating system of MX Server, is designed for the CC-NUMA architecture. MX Server consists of up to 8 nodes, connected by an SCI ring. The node scheduling scheme considers data locality to improve the performance of the Oracle8i DBMS on the CC-NUMA architecture. For a DBMS such as Oracle8i, a multi-threaded process may be run tied to a particular disk. We have developed a CG binding function that binds a multi-threaded process to a node. Because we do not currently have a CC-NUMA platform available, we developed the node scheduling scheme for multi-threaded processes on a PC test-bed instead of MX Server and tested it completely.

The Enhanced Thread Partitioning of Conditional Expressions of Non-Strict Programs (Non-Strict 프로그램 조건식의 향상된 스레드 분할)

Jo, Sun-Moon;Yang, Chang-Mo;Yoo, Weon-Hee
- Proceedings of the Korea Information Processing Society Conference / 2000.04a / pp.277-280 / 2000
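When compiling functional programs for a multithreaded parallel machine, thread partitioning is the task of identifying the parts of a program whose execution order is known at compile time, and which can therefore be scheduled statically, and gathering them into threads. In a conditional expression, operations execute in the order predicate -> true branch or predicate -> false branch, so the execution order cannot be determined at compile time. Existing partitioning algorithms therefore divide a conditional expression into three basic blocks (the predicate, the true-branch expression, and the false-branch expression) and apply local partitioning to each. This restriction can be relaxed: if the definition of a thread is modified slightly to allow branching within a thread, a better partitioning can be obtained. Branching within a thread does not violate the basic principles of thread partitioning (it does not reduce parallelism, increase the number of synchronizations, or cause deadlock); rather, it can increase thread length and reduce the number of synchronizations. In this paper, we propose a method that improves thread partitioning by merging the three basic blocks of a conditional expression into one or two basic blocks.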

Typed Separation Set Partitioning for Thread Partitioning of Non-strict functional Programs (비평가인자 함수 프로그램의 스레드 분할 향상을 위한 자료형 분리 집합 분할알고리즘)

Yang, Chang-Mo;Joo, Hyung-Seok;Yoo, Weon-Hee
- The Transactions of the Korea Information Processing Society / v.5 no.8 / pp.2127-2136 / 1998
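Because of their non-strict semantics, non-strict functional languages require fine-grained dynamic scheduling, which makes efficient execution on conventional von Neumann parallel machines difficult, so the process of merging this fine-grained work into scheduling units is important. This process is called thread partitioning. In this paper, we propose a thread partitioning algorithm, called typed separation set partitioning, that partitions non-strict functional programs into threads. The algorithm performs thread partitioning by exploiting the fact that no potential dependence can exist between an input name and an output name whose types cannot be compared. With this method, threads that existing partitioning methods fail to merge can be merged, and larger threads can be generated than with existing partitioning algorithms.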

Thread Block Scheduling for Multi-Workload Environments in GPGPU (다중 워크로드 환경을 위한 GPGPU 스레드 블록 스케줄링)

Park, Soyeon;Cho, Kyung-Woon;Bahn, Hyokyung
- The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.71-76 / 2022
Round-robin is widely used for scheduling large-scale parallel workloads on the computing units of a GPGPU. Round-robin is easy to implement by sequentially allocating tasks to each computing unit, but it does not balance load well between computing units in multi-workload environments such as the cloud. In this paper, we propose a new thread block scheduling policy to resolve this situation. The proposed policy manages thread blocks generated by various GPGPU workloads in multiple queues based on their computation loads. It tries to maximize the resource utilization of each computing unit by selecting a thread block from the queue that can maximally utilize the remaining resources, thereby inducing load balance between computing units. Through simulation experiments under various load environments, we show that the proposed policy improves GPGPU performance by 24.8% on average compared to round-robin.
https://doi.org/10.7236/JIIBC.2022.22.2.71
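
One way to picture the multi-queue selection in this abstract is the sketch below. The abstract does not give the queue layout or the fitting rule, so the per-queue scalar load, the "largest load that still fits" rule, and the names `pick_block` and `fill_unit` are all assumptions.

```python
from collections import deque


def pick_block(queues, remaining):
    """Pick a block from the queue whose load best fills the remaining capacity.

    queues: dict mapping a per-block load (hypothetical positive scalar cost)
            to a deque of waiting block ids.
    Returns (block_id, cost), or None if no queued block fits.
    """
    fitting = [cost for cost, q in queues.items() if q and cost <= remaining]
    if not fitting:
        return None
    best = max(fitting)                 # largest load that still fits
    return queues[best].popleft(), best


def fill_unit(queues, capacity):
    """Fill one computing unit until no queued block fits any more."""
    remaining, placed = capacity, []
    while (choice := pick_block(queues, remaining)) is not None:
        block_id, cost = choice
        placed.append(block_id)
        remaining -= cost
    return placed, remaining


queues = {4: deque(["a", "b"]), 2: deque(["c", "d"]), 3: deque(["e"])}
placed, left_over = fill_unit(queues, capacity=7)
```

Here the unit first takes a load-4 block and then the load-3 block, leaving no capacity unused, whereas taking blocks in plain arrival order would not necessarily fill it.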

Implementation of an LLF Scheduler for the Hard Real-time OS, RT-eCos3.0 (경성 실시간 운영체제 RT-eCos3.0을 위한 LLF 스케줄러의 구현)

Yoo, Hwee-Jae;Kim, Jung-Guk
- Proceedings of the Korean Information Science Society Conference / 2011.06b / pp.395-397 / 2011
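RT-eCos3.0 is an ultra-lightweight hard real-time embedded operating system developed on the open-source eCos3.0 to support execution of TMO (Time-triggered Message-triggered Object), a representative distributed real-time object model. Until now, RT-eCos3.0 has supported EDF and FIFO schedulers, which do not require a thread's worst-case execution time as input. In this paper, we implement an LLF (Least Laxity First) scheduler inside the clock interrupt handler. The scheduler takes the worst-case execution time as input when TMO time-triggered and message-triggered threads are registered and, based on it, uses the time remaining until the deadline relative to the execution time. We also allow each thread to select its own scheduling policy.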
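
The least-laxity-first rule can be sketched as below, using the standard definition of laxity (deadline minus current time minus remaining execution time). This is a plain Python illustration, not RT-eCos3.0 code; the `Thread` fields, the tick-based loop, and the tie-breaking rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    deadline: int     # absolute deadline, in clock ticks
    remaining: int    # remaining worst-case execution time, in ticks


def laxity(thread, now):
    """Slack left before the thread must run continuously to meet its deadline."""
    return thread.deadline - now - thread.remaining


def llf_pick(ready, now):
    """Return the ready thread with the least laxity (ties: first in the list)."""
    return min(ready, key=lambda t: laxity(t, now))


def simulate(threads, ticks):
    """Re-evaluate laxity on every tick, as a clock-interrupt handler might."""
    trace = []
    for now in range(ticks):
        ready = [t for t in threads if t.remaining > 0]
        if not ready:
            break
        current = llf_pick(ready, now)
        current.remaining -= 1
        trace.append(current.name)
    return trace


threads = [Thread("T1", deadline=8, remaining=3), Thread("T2", deadline=6, remaining=4)]
trace = simulate(threads, ticks=10)
```

Because a waiting thread's laxity shrinks on every tick while a running thread's laxity stays constant, the scheduler can switch between threads as their laxities cross, which is why the worst-case execution time must be known in advance.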

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)

Thuan, Do Cong;Choi, Yong;Kim, Jong Myon;Kim, Cheol Hong
- KIPS Transactions on Computer and Communication Systems / v.6 no.5 / pp.219-230 / 2017
General-Purpose Graphics Processing Units (GPGPUs) use a massively parallel architecture and apply multithreading to exploit parallelism. With programming models such as CUDA and OpenCL, GPGPUs are becoming the leading platform for exploiting the plentiful thread-level parallelism of parallel applications. Unfortunately, modern GPGPUs cannot efficiently utilize their available hardware resources for numerous general-purpose applications. One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long-latency instructions, which results in lost opportunities to improve performance. This paper studies the effects of the hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that can alleviate the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency, and then schedules the warps with long latency before the warps with short latency. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that dynamically reduces the number of streaming multiprocessors to which thread blocks are assigned when contention at the memory and interconnection network is high. Based on our experiments on a 15-streaming-multiprocessor GPGPU platform, the proposed warp scheduling policy provides an average IPC improvement of 7.5% over the baseline round-robin warp scheduling policy. This paper also shows that GPGPU performance can be improved by approximately 8.9% on average when the two proposed techniques are combined.
https://doi.org/10.3745/KTCCS.2017.6.5.219
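
The two-group ordering in this abstract might look roughly like the following. The abstract does not say how warp latency is measured or where the boundary between the groups lies, so `est_latency` and `threshold` are hypothetical inputs.

```python
def order_warps(warps, threshold):
    """Order a thread block's warps: long-latency group first, then short-latency.

    warps: list of (warp_id, est_latency) pairs. est_latency and threshold are
    hypothetical inputs; the source does not specify how latency is obtained.
    Order within each group is preserved.
    """
    long_group = [w for w in warps if w[1] >= threshold]
    short_group = [w for w in warps if w[1] < threshold]
    return [warp_id for warp_id, _ in long_group + short_group]


issue_order = order_warps([(0, 10), (1, 200), (2, 15), (3, 300)], threshold=100)
```

Issuing the long-latency warps first gives their outstanding operations more time to complete while the short-latency warps run.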

Thread Block Scheduling for GPGPU based on Fine-Grained Resource Utilization (상세 자원 이용률에 기반한 병렬 가속기용 스레드 블록 스케줄링)

Bahn, Hyokyung;Cho, Kyungwoon
- The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.49-54 / 2022
With the recent widespread adoption of general-purpose GPUs (GPGPUs) in cloud systems, maximizing resource utilization through multitasking on GPGPUs has become an important issue. In this article, we show that resource allocation based on classifying workloads as compute-bound or memory-bound is not sufficient with respect to resource utilization, and we present a new thread block scheduling policy for GPGPUs that makes use of the fine-grained resource utilization of each workload. Unlike previous approaches, the proposed policy reduces scheduling overhead by separating profiling from scheduling, and maximizes resource utilization by co-locating workloads with different bottleneck resources. Through simulations under various virtual machine scenarios, we show that the proposed policy improves GPGPU throughput by 130.6% on average and by up to 161.4%.
https://doi.org/10.7236/JIIBC.2022.22.5.49

A Benchmark Suite for Data Race Detection Technique in GPGPU Programs (GPGPU 프로그램의 자료경합 탐지기법을 위한 벤치마크 모음)

Lee, Keonpyo;Choi, Eu-Teum;Jun, Yong-Kee
- Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.7-8 / 2019
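A data race is a concurrency error that can occur when two or more threads access the same shared memory without proper synchronization and at least one of the accesses is a write. Data races cause nondeterministic results that the programmer did not intend, and in programs that demand high reliability, such as aircraft software, they can cause fatal errors that lead to human and material losses. Data race detection techniques are used to detect and fix such problems in advance. However, GPGPU programs have a more complex execution structure than concurrent CPU programs, with many variables such as thread and memory hierarchies, scheduling, and synchronization mechanisms. As a result, validating a data race detection technique on real-world programs takes considerable effort to account for these variables in experiments. This paper presents a benchmark suite that consists of synthetic programs for four patterns representing data races in real-world programs, and that lets the user specify the thread and memory hierarchy, thread structure, memory usage, and synchronization method at run time.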
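
The definition of a data race given in this abstract can be turned into a small check over a log of memory accesses. This is a sketch and not one of the benchmark's patterns or detectors: the `Access` record is hypothetical, and "proper synchronization" is approximated by requiring a common lock, a simplification that ignores barriers and other ordering mechanisms.

```python
from itertools import combinations
from typing import FrozenSet, NamedTuple


class Access(NamedTuple):
    thread: int
    address: int
    is_write: bool
    locks: FrozenSet[str]   # locks held during the access (simplified synchronization model)


def find_races(accesses):
    """Return pairs of accesses that meet the data race conditions:
    different threads, same address, at least one write, no common lock."""
    races = []
    for a, b in combinations(accesses, 2):
        if (a.thread != b.thread
                and a.address == b.address
                and (a.is_write or b.is_write)
                and not (a.locks & b.locks)):
            races.append((a, b))
    return races


log = [
    Access(0, 0x10, True, frozenset()),        # unsynchronized write
    Access(1, 0x10, False, frozenset()),       # unsynchronized read of the same address
    Access(1, 0x20, True, frozenset({"m"})),   # write protected by lock "m"
    Access(2, 0x20, True, frozenset({"m"})),   # write protected by the same lock
]
races = find_races(log)
```

Only the first two accesses are reported, since the writes to the second address share a lock.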

Search Result 18, Processing Time 0.027 seconds
