Scalable Application Mapping for SIMD Reconfigurable Architecture

  • Received : 2015.04.14
  • Accepted : 2015.08.23
  • Published : 2015.12.30

Abstract

Coarse-Grained Reconfigurable Architecture (CGRA) is a promising platform that offers fast turnaround time as well as very high energy efficiency for multimedia applications. One problem with CGRAs, however, is application mapping, which currently does not scale well with geometrically increasing numbers of cores. To mitigate this scalability problem, this paper discusses how to apply the SIMD (Single Instruction Multiple Data) paradigm to CGRAs. While the idea of SIMD is not new, SIMD can complicate the mapping problem: it adds iteration mapping as a further dimension to the already complex problem of operation and data mapping. These three sub-problems are interdependent, and the memory bank conflicts they produce can significantly affect performance. Based on a new architecture called SIMD reconfigurable architecture, which allows SIMD execution at multiple levels of granularity, we present how to minimize bank conflicts by considering all three sub-problems together, for various reconfigurable architecture (RA) organizations. We also present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of mapping problems.
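To illustrate why iteration mapping interacts with bank conflicts, the following minimal sketch counts stall cycles for two ways of assigning loop iterations to SIMD lanes. It is not the paper's algorithm. It assumes word-interleaved banks (bank = address modulo number of banks), one array access per iteration, and one access per lane per step; all function names and parameters are hypothetical.

```python
from collections import Counter

def stall_cycles(addresses, num_banks):
    """Extra cycles needed when several lanes hit the same bank in one step.

    Assumption: banks are word-interleaved, so bank = address % num_banks,
    and each bank serves one access per cycle.
    """
    per_bank = Counter(addr % num_banks for addr in addresses)
    return max(per_bank.values()) - 1

def interleaved(step, lane, num_lanes, num_iters):
    # Lane l runs iterations l, l + num_lanes, l + 2 * num_lanes, ...
    return step * num_lanes + lane

def blocked(step, lane, num_lanes, num_iters):
    # Lane l runs one contiguous block of num_iters // num_lanes iterations.
    return lane * (num_iters // num_lanes) + step

def total_stalls(mapping, num_iters, num_lanes, num_banks, stride=1):
    """Sum stall cycles over all steps for a loop that reads a[stride * i]."""
    total = 0
    for step in range(num_iters // num_lanes):
        addrs = [stride * mapping(step, lane, num_lanes, num_iters)
                 for lane in range(num_lanes)]
        total += stall_cycles(addrs, num_banks)
    return total

if __name__ == "__main__":
    # 16 iterations, 4 lanes, 4 banks, unit stride.
    print(total_stalls(interleaved, 16, 4, 4))  # 0: lanes hit distinct banks
    print(total_stalls(blocked, 16, 4, 4))      # 12: all lanes hit one bank
    # A block length of 5 shifts each lane to a different bank.
    print(total_stalls(blocked, 20, 4, 4))      # 0
```

In this toy model the same loop is conflict-free under one iteration mapping and fully serialized under another, and changing the block length (a simple stand-in for changing the data layout) removes the conflicts again. This is the kind of interdependence between iteration mapping and data mapping that the abstract describes.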
