• Title/Summary/Keyword: EEMBC

Search Result 8, Processing Time 0.021 seconds

Implementation of running an EEMBC Benchmark on Polaris-1 Board (Polaris-1 보드 상에서 EEMBC 벤치마크 동작 구현)

  • Bak, Giseong;Lee, Hokyoon;Kim, Seon Wook
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2010.04a
    • /
    • pp.86-88
    • /
    • 2010
  • MPSoC 는 저렴한 하드웨어 비용으로 신속하게 데이터를 처리할 수 있어 고성능 멀티미디어 프로그램, 이동통신기기, 텔레매틱스, 모바일 엔터테인먼트 기기에 맞는 솔루션을 제공하고 있다. 본 논문은 이러한 MPSoC 연구의 일환으로 ADChips 의 EISC 프로세서와 Zaram 의 DSP 를 이용하여 개발된 Polaris-1 보드에서 EEMBC 벤치마크 프로그램을 EISC 프로세서인 Empress 에서 동작할 수 있도록 하는 구현에 대한 연구를 소개한다. 본 논문에서 제시한 하나의 프로세서에 작업을 할당하는 방법을 확장함으로써, MPSoC 의 멀티코어를 사용하기 위한 프로그램을 개발 할 수 있을 것이다. 또한, 앞으로 지속적으로 연구될 Polaris-1 보드의 연구기반을 마련하였다고 볼 수 있다.

Study of Parallel Network Processor using Global Cache (글로벌 캐시를 이용한 네트워크 병렬 프로세서 구조 연구)

  • Park, Jae-Won;Chung, Won-Young;Kim, Hyun-Pil;Lee, Jung-Hee;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.80-85
    • /
    • 2011
  • The mount of network traffic from the Internet is increasing because of the use of Broadband Convergence Networks(BcN). Network traffic is also increasing because of the development of application, especially multimedia traffic from IPTV, VOD, and online games. This multimedia traffic not only has a huge payload but also should be considered a threat in real time. For this reason, this study examines the ways that routers distribute the bandwidth in accordance to traffic properties. To classify the property of the traffic, it is essential to analyze the application layer. However, the general network processor architecture serially processes the L2-4 and L7 layer. We propose a novel parallel network processor architecture with a global cache that processes L2-4 and L7 in parallel. To verify the proposed architecture, we simulated both of the architecture with SystemC. EEMBC and SNORT was used to measure L2-4 and L7 processing time. When multimedia traffic was entered into the network processor in the same flow, the proposed architecture showed about 85% higher performance than general architecture.

Performance Analysis of Embedded Applications (임베디드 응용 프로그램 성능 분석)

  • 김선욱;오재근;한영선;최홍욱;김철우
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1355-1358
    • /
    • 2003
  • This paper presents performance analysis of the embedded application, called EEMBC consisting of 5 categories and total 34 applications. We measured various performance metrics, such as code sizes, TLP(Thread Level Parallelism) using OpenMP API and ILP(Instruction Level Parallelism) on ARM-modeled SimpleScalar in detail. We show that the embedded applications have the similar characteristics as integer applications to deliver low ILP and TLP in our environment. They have many small loops, which result in large instruction overhead in TLP and loop control overhead in ILP.

  • PDF

GCC based Compiler Construction for Compact DSP32

  • Cho, Myeong-Jin;Lee, Ho-Kyoon;Huong, Giang Nguyen Thi;Kim, Seon-Wook;Han, Young-Sun;Um, Jung-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.43-45
    • /
    • 2011
  • Very Long Instruction Word (VLIW) executes multiple instructions in parallel. In order to exploit higher performance, i.e., higher parallelism, VLIW compiler groups as many instructions into one word as possible. In this paper, we show how to construct a VLIW C compiler based on GCC for CDSP32 (Compact Digital Signal Processor 32-bit) which is an embedded DSP processor to issue two instructions in one VLIW. Also, we evaluated the compiler on EEMBC benchmark; the experiment result showed that the total number of dynamic instructions of the VLIW compiler was reduced by 18% on average over without VLIW instruction scheduling.

DFA based Instruction Scheduling for Empress Processor (Empress 프로세서를 위한 DFA 기반 명령어 스케줄링)

  • Jung, Dong-Ha;Lee, Ho-Kyoon;Kim, Seon-Wook;Kim, Kwan-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.11-13
    • /
    • 2011
  • 최근 스마트 폰, 타블렛 PC 등 고성능 모바일 기기에 대한 시장수요가 증가함에 따라 임베디드 프로세서에 대한 성능 최적화가 활발히 이루어지고 있다. 고성능 멀티미디어 시스템을 대상으로 설계된 Empress 프로세서는 다수의 기능 유닛을 포함하고 있어 명령어 스케줄링을 통해 성능 향상을 기대할 수 있으나 기존의 컴파일러는 이를 지원하지 않고 있다. 본 연구에서는 GNU C 컴파일러를 이용하여 Empress 프로세서를 위한 DFA 기반의 명령어 스케줄링 최적화를 구현하였다. 그 결과 EEMBC 벤치마크를 이용한 성능 분석에서 실행시간 기준 평균 8%의 향상이 있음을 확인하였다.

Performance Comparison between LLVM and GCC Compilers for the AE32000 Embedded Processor

  • Park, Chanhyun;Han, Miseon;Lee, Hokyoon;Cho, Myeongjin;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.2
    • /
    • pp.96-102
    • /
    • 2014
  • The embedded processor market has grown rapidly and consistently with the appearance of mobile devices. In an embedded system, the power consumption and execution time are important factors affecting the performance. The system performance is determined by both hardware and software. Although the hardware architecture is high-end, the software runs slowly due to the low quality of codes. This study compared the performance of two major compilers, LLVM and GCC on a32-bit EISC embedded processor. The dynamic instructions and static code sizes were evaluated from these compilers with the EEMBC benchmarks.LLVM generally performed better in the ALU intensive benchmarks, whereas GCC produced a better register allocation and jump optimization. The dynamic instruction count and static code of GCCwere on average 8% and 7% lower than those of LLVM, respectively.

Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

  • Oh, Jaeg-Eun;Hwang, Seok-Joong;Nguyen, Huong Giang;Kim, A-Reum;Kim, Seon-Wook;Kim, Chul-Woo;Kim, Jong-Kook
    • ETRI Journal
    • /
    • v.30 no.4
    • /
    • pp.576-586
    • /
    • 2008
  • In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than a typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and can save more power and chip area than the SMT/CMP approach without significant performance degradation. For the architecture verification, we extend a commercial 32-bit embedded core AE32000C and synthesize it on Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP in EEMBC benchmarks which are automatically parallelized by the Intel compiler.

  • PDF

GCC2Verilog Compiler Toolset for Complete Translation of C Programming Language into Verilog HDL

  • Huong, Giang Nguyen Thi;Kim, Seon-Wook
    • ETRI Journal
    • /
    • v.33 no.5
    • /
    • pp.731-740
    • /
    • 2011
  • Reconfigurable computing using a field-programmable gate-array (FPGA) device has become a promising solution in system design because of its power efficiency and design flexibility. To bring the benefit of FPGA to many application programmers, there has been intensive research about automatic translation from high-level programming languages (HLL) such as C and C++ into hardware. However, the large gap of syntaxes and semantics between hardware and software programming makes the translation challenging. In this paper, we introduce a new approach for the translation by using the widely used GCC compiler. By simply adding a hardware description language (HDL) backend to the existing state-of- the-art compiler, we could minimize an effort to implement the translator while supporting full features of HLL in the HLL-to-HDL translation and providing high performance. Our translator, called GCC2Verilog, was implemented as the GCC's cross compiler targeting at FPGAs instead of microprocessor architectures. Our experiment shows that we could achieve a speedup of up to 34 times and 17 times on average with 4-port memory over PICO microprocessor execution in selected EEMBC benchmarks.