An Efficient Bit Stream Instruction-set for Network Packet Processing Applications

네트워크 패킷 처리를 위한 효율적인 비트 스트림 명령어 세트

  • Yoon, Yeo-Phil (Electrical and Electronic Engineering, Yonsei University) ;
  • Lee, Yong-Surk (Electrical and Electronic Engineering, Yonsei University) ;
  • Lee, Jung-Hee (Electronics and Telecommunications Research Institute)
  • 윤여필 (연세대학교 전기전자공학과) ;
  • 이용석 (연세대학교 전기전자공학과) ;
  • 이정희 (한국전자통신연구원)
  • Published : 2008.10.25

Abstract

This paper proposes a new instruction set that improves the packet processing capability of a network processor. The proposed instructions accelerate packet-header combining operations, which makes packet processing more efficient. In addition, a dedicated hardware structure for executing the overlay instruction was designed to minimize the cost of the added hardware. To this end, the base architecture of the network processor was described in LISA, and the overlay block was optimized on the basis of a barrel shifter. The block was synthesized to compare area and operation delay. The instructions were then mapped to C-level macro functions through the compiler's Compiler Known Function (CKF) mechanism, and the performance gain was confirmed by comparing the execution cycles and execution time of application programs. Experiments were carried out with CoWare's Processor Designer and Compiler Designer. Synthesis with Synopsys targeting TSMC $0.25{\mu}m$ technology showed a 20.7% reduction in operation delay, and the proposed instruction set improved performance by 30.8% in total execution cycles.
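
The abstract does not give the operands or encoding of the overlay instruction, so the following is only a rough software model of the kind of header-combining work such an instruction is meant to speed up. The function names `overlay_sw` and `ipv4_word0`, the argument order, and the semantics (write a `width`-bit field from `src` into `dst` at bit `pos`) are assumptions made for illustration and are not taken from the paper. The field layout is the first 32-bit word of a standard IPv4 header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of a bit-field "overlay" (merge) operation.
 * Name, argument order and semantics are illustrative assumptions:
 * replace bits [pos, pos + width) of dst with the low `width` bits of src. */
static uint32_t overlay_sw(uint32_t dst, uint32_t src, unsigned pos, unsigned width)
{
    /* Preconditions: the field must fit inside a 32-bit word. */
    assert(width >= 1 && width <= 32 && pos <= 32 - width);

    /* `width` low-order ones; width == 32 is special-cased because
     * shifting a 32-bit value by 32 is undefined behaviour in C. */
    uint32_t field = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    uint32_t mask  = field << pos;

    /* Clear the destination field, then OR in the shifted source bits. */
    return (dst & ~mask) | ((src << pos) & mask);
}

/* Builds the first 32-bit word of an IPv4 header:
 * version (4 bits) | IHL (4 bits) | TOS (8 bits) | total length (16 bits). */
static uint32_t ipv4_word0(unsigned version, unsigned ihl,
                           unsigned tos, unsigned total_len)
{
    uint32_t w = 0;
    w = overlay_sw(w, version,   28, 4);
    w = overlay_sw(w, ihl,       24, 4);
    w = overlay_sw(w, tos,       16, 8);
    w = overlay_sw(w, total_len,  0, 16);
    return w;
}
```

In plain C, each field costs a mask construction, a shift, an AND and an OR. A dedicated instruction built around a barrel shifter could plausibly fold that sequence into one operation, which is the kind of saving the abstract reports in execution cycles.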

