Code Size Reduction Through Efficient Use of Multiple Load/Store Instructions

Code Size Reduction Through Efficient Use of Multiple Memory Access Instructions

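  • 안민욱 (School of Electrical Engineering and Computer Science, Seoul National University);
  • 조두산 (School of Electrical Engineering and Computer Science, Seoul National University);
  • 백윤흥 (School of Electrical Engineering and Computer Science, Seoul National University);
  • 조정훈 (School of Electronic, Electrical, and Computer Engineering, Kyungpook National University)
  • Published: 2005.08.01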

Abstract

Code size reduction is becoming increasingly important for compilers targeting embedded processors, because these processors are often severely constrained in storage, so smaller code can have a significant positive impact on their performance. Code size reduction techniques differ in their motivations and application contexts, and many rely on special hardware features of the target processor. In this work, we propose a novel technique that fully utilizes multiple load/store (MLS) instructions, a class of hardware instructions designed to reduce code size by minimizing the number of memory operations in the code. Many microprocessors support MLS instructions, yet no existing compiler fully exploits their potential benefit; compilers use them only in a few limited cases. This is mainly because optimizing memory accesses with MLS instructions in the general case is an NP-hard problem that requires complex assignment of registers and memory offsets to the variables in a stack frame. Our technique uses a couple of heuristics to handle this problem efficiently within a polynomial time bound.

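Using MLS (multiple load/store) instructions, which read or write several memory blocks with a single instruction, minimizes the number of memory instructions in the code and thereby reduces code size. Because of this advantage, many microprocessors support these instructions, but the compilers developed to date do not exploit them effectively and use them only for limited purposes. Existing compilers fail to support MLS instructions efficiently because the problem that must be solved to use them effectively in general is NP-hard. It is a combined problem of finding optimal memory offsets for the variables in a stack frame and of register allocation. This paper proposes a technique that applies heuristics to solve this problem within a polynomial time bound.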
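
To make the dependence on register and offset assignment concrete, the following is a minimal sketch and not the heuristics proposed in the paper. It assumes an MLS instruction in the style of ARM's LDM, which can replace several loads only when the accessed words are consecutive in memory and register numbers increase with the addresses. The names `WORD`, `can_merge`, and `count_memory_instructions` are hypothetical, and the greedy packing of adjacent loads is an illustrative simplification.

```python
WORD = 4  # assumed word size in bytes


def can_merge(group):
    """Return True if the (register, offset) loads could form one MLS
    instruction: consecutive word offsets, registers rising with address."""
    by_addr = sorted(group, key=lambda access: access[1])
    for (r0, o0), (r1, o1) in zip(by_addr, by_addr[1:]):
        if o1 - o0 != WORD or r1 <= r0:
            return False
    return True


def count_memory_instructions(loads):
    """Greedily pack adjacent loads into mergeable groups and return
    the number of memory instructions that remain."""
    groups = []
    for access in loads:
        if groups and can_merge(groups[-1] + [access]):
            groups[-1].append(access)
        else:
            groups.append([access])
    return len(groups)


# Four variables loaded into r0..r3 from a stack frame.
# Layout A scatters the variables, so no two loads can be merged.
layout_a = [(0, 0), (1, 8), (2, 4), (3, 12)]
# Layout B places them at consecutive offsets matching the register order.
layout_b = [(0, 0), (1, 4), (2, 8), (3, 12)]

print(count_memory_instructions(layout_a))  # 4
print(count_memory_instructions(layout_b))  # 1
```

The same four loads cost four instructions under one stack layout and a single instruction under the other, which is why offset assignment and register allocation have to be considered together.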
