Fast and Accurate Performance Estimation of Bus Matrix for Multi-Processor System-on-Chip (MPSoC)

A Fast and Accurate Performance Estimation Technique for Bus Matrix Architectures in Multi-Processor System-on-Chip (MPSoC)

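  • Sungchan Kim (School of Electrical Engineering and Computer Science, Seoul National University) ;
  • Soonhoi Ha (School of Electrical Engineering and Computer Science, Seoul National University)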
  • Published : 2008.12.15

Abstract

This paper presents a performance estimation technique based on queuing analysis for on-chip bus matrix architectures of Multi-Processor System-on-Chips (MPSoCs). Previous approaches rely on time-consuming simulation and therefore cannot explore the vast design space under increasing time-to-market pressure. The proposed technique gives accurate estimates while running orders of magnitude faster than cycle-accurate simulation. To model a practical memory subsystem, we consider (1) service times that follow a general distribution instead of the exponential distribution and (2) multiple outstanding transactions, which are used to achieve high performance. Experimental results show that the proposed analysis achieves an accuracy of 94% on average and a much shorter runtime (at least $10^5$ times faster) than simulation on various examples: synthetic traces and a real-time application, a 4-channel DVR.

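This paper proposes a performance estimation technique, based on queuing theory, for bus matrix based communication architectures of Multi-Processor System-on-Chips (MPSoCs). Because a bus matrix based communication architecture has many design parameters, optimizing its performance requires exploring a vast design space, but the simulation-based methods in wide use today take so long that it is difficult to meet increasingly tight time-to-market constraints. To address this problem, we developed a technique that estimates performance accurately and much faster than simulation. The proposed analysis takes into account multiple outstanding transactions, a bus protocol feature used in high-performance bus matrices. In addition, unlike previous studies that modeled the memory system unrealistically with an exponential distribution, it uses a general distribution to model the memory system realistically. To verify its accuracy and efficiency, we applied the technique to randomly generated bus transactions and to a 4-channel DVR example; compared to cycle-accurate simulation, it was more than $10^5$ times faster while achieving an average accuracy of 94% or higher.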
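To illustrate why the choice of service time distribution matters, the sketch below evaluates the textbook Pollaczek-Khinchine mean waiting time of a single M/G/1 queue, W_q = rho * E[S] * (1 + c^2) / (2 * (1 - rho)), where rho is the utilization and c^2 is the squared coefficient of variation of the service time. This is not the model proposed in the paper, which is not given in the abstract; it assumes Poisson arrivals, a single server, and no multiple outstanding transactions. The function name and the numbers in the usage lines are hypothetical.

```python
def mg1_mean_wait(arrival_rate, mean_service, service_scv):
    """Mean time a request waits in queue in an M/G/1 system.

    arrival_rate : requests per cycle (Poisson arrivals assumed)
    mean_service : mean service time E[S] in cycles
    service_scv  : squared coefficient of variation of the service time
                   (0 = deterministic, 1 = exponential)
    """
    rho = arrival_rate * mean_service  # server utilization
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    # Pollaczek-Khinchine formula, written with E[S^2] = E[S]^2 * (1 + c^2)
    return rho * mean_service * (1.0 + service_scv) / (2.0 * (1.0 - rho))


# Hypothetical memory port: 0.05 requests/cycle, 10-cycle mean service time.
wait_exponential = mg1_mean_wait(0.05, 10.0, 1.0)    # exponential service
wait_deterministic = mg1_mean_wait(0.05, 10.0, 0.0)  # fixed-latency service
```

At the same load, the exponential assumption predicts twice the queueing delay of a fixed-latency memory, which suggests how an exponential model can misestimate a memory subsystem whose service times are closer to constant.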
