An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor

Design of a Memory Operation System Architecture for an SIMT-Based Stream Processor with Reduced Memory Latency Penalty

  • Received : 2014.09.02
  • Accepted : 2014.09.23
  • Published : 2014.09.30

Abstract

In this paper, we propose a memory operation system architecture that reduces the memory latency penalty in a stream processor based on the SIMT architecture. The proposed architecture applies a non-blocking cache to reduce the cache miss penalty incurred by a blocking cache. We verified that the proposed architecture improves the performance of the stream processor by comparing its processing performance across various algorithms. We measured the performance improvement as a function of the ratio of memory instructions in each algorithm. As a result, we confirmed that the performance of the stream processor improves by a minimum of 8.2% and a maximum of 46.5%.

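This paper proposes a Memory Operation System Architecture for an SIMT Architecture based Stream Processor that reduces the Memory Latency Penalty. The proposed structure applies a Non-Blocking Cache Architecture to reduce the Cache Miss Penalty that occurs in the conventional Blocking Cache Architecture. The performance gain of a Stream Processor using the proposed Memory Operation System Architecture was verified by comparing the processing speed of various algorithms. The experiments measured the improvement according to the ratio of Memory instructions in each algorithm and confirmed that Stream Processor performance improves by at least 8.2% and at most 46.5%.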
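
The relationship described in the abstract, where the gain from a non-blocking cache grows with the share of memory instructions, can be illustrated with a minimal first-order cycle-count sketch. This is not the paper's simulator or hardware design. The function names, the `overlap` parameter (the fraction of each miss penalty assumed to be hidden by continuing to issue other work), the definition of improvement as a reduction in cycle count, and every numeric value are hypothetical assumptions chosen for illustration. The model does not attempt to reproduce the 8.2% and 46.5% figures.

```python
def blocking_cycles(num_instr, mem_ratio, miss_rate, miss_penalty):
    """Cycle estimate when every cache miss stalls the pipeline for the full penalty."""
    misses = num_instr * mem_ratio * miss_rate
    return num_instr + misses * miss_penalty


def non_blocking_cycles(num_instr, mem_ratio, miss_rate, miss_penalty, overlap):
    """Cycle estimate when a fraction `overlap` of each miss penalty is hidden.

    `overlap` is an assumed parameter in [0, 1]: 0 behaves like a blocking
    cache, 1 hides the miss penalty completely.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    misses = num_instr * mem_ratio * miss_rate
    return num_instr + misses * miss_penalty * (1.0 - overlap)


def improvement_percent(num_instr, mem_ratio, miss_rate, miss_penalty, overlap):
    """Reduction in cycle count (percent) of the non-blocking model vs. the blocking one."""
    base = blocking_cycles(num_instr, mem_ratio, miss_rate, miss_penalty)
    improved = non_blocking_cycles(num_instr, mem_ratio, miss_rate, miss_penalty, overlap)
    return (base - improved) / base * 100.0


if __name__ == "__main__":
    # Hypothetical workload parameters, not taken from the paper.
    for ratio in (0.1, 0.2, 0.4):
        gain = improvement_percent(
            num_instr=1000, mem_ratio=ratio, miss_rate=0.1, miss_penalty=20, overlap=0.5
        )
        print(f"memory instruction ratio {ratio:.1f}: {gain:.1f}% fewer cycles")
```

Under these assumptions the estimated gain rises monotonically with the memory instruction ratio, which is the qualitative trend the abstract reports. A real evaluation would need a cycle-accurate model of the cache and the thread scheduler.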
