• Title/Summary/Keyword: Embedded Processors

Shadow Register Scheme for Media Processing in Embedded Processors (내장형 프로세서에서의 미디어 처리를 위한 Shadow Register 기법)

  • 안성수;김현규;이성재;오형철
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.547-549 / 2004
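  • When media data is processed on embedded processors that use a relatively small number of registers, processor performance can drop significantly because of register shortage. This paper proposes a shadow register scheme to solve this problem. In comparative experiments with a prototype processor, the proposed scheme could be implemented with about 16.7% additional hardware, reduced execution time by about 16-28%, and reduced program size by about 3.3-5%. The experimental results in this paper were obtained under an ideal memory model, so larger gains are expected in realistic environments.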

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices / v.22 no.7 / pp.303-309 / 2016
  • Recently, an increasing number of embedded processors, such as ARM Mali, have begun to support GPGPU programming frameworks such as OpenCL. As a result, GPGPU technologies that have been used in PC and server environments are beginning to be applied to embedded systems. However, many embedded systems have architectural characteristics that differ from those of traditional PCs, and low power consumption and real-time performance are also important metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare its performance against various baselines. Experimental results show that the parallel GPU AES implementation can reduce the response time to about 1/150 and the energy consumption to approximately 1/290 of those of the OpenMP implementation when 1000KB of input data is applied. Furthermore, an additional 100% performance improvement of the parallel AES algorithm was achieved by exploiting the characteristics of embedded GPUs, such as removing data copies between GPU and host memory. Our results also demonstrate that larger performance improvements can be achieved with larger input data.

MDA(Model Driven Architecture) based Design for Multitasking of Heterogeneous Embedded System (이종 임베디드 시스템의 멀티태스킹을 위한 MDA(Model Driven Architecture) 기반의 설계)

  • Son, Hyun-Seung;Kim, Woo-Yeol;Kim, R. Young-Chul
    • The KIPS Transactions:PartD / v.15D no.3 / pp.355-360 / 2008
  • A complex embedded system that supports multitasking requires an RTOS (real-time operating system). In a heterogeneous development environment, each embedded system uses the OS and processor best suited to it. This paper proposes using UML profiles of the OS API and processor configuration, instead of cross-compiling, to develop heterogeneous embedded systems. This reduces development time and cost by automatically generating source code from the profile information of each embedded system. We generate and port code after modeling two heterogeneous real-time operating systems (brickOS and uC/OS-II) and two processors (Hitachi H8 and Intel PXA255) with our proposed profile for heterogeneous embedded systems.

An Efficient Adaptive Polarimetric Processor with an Embedded CFAR

  • Park, Hyung-Rae;Kwag, Young-Kil;Wang, Hong
    • ETRI Journal / v.25 no.3 / pp.171-178 / 2003
  • To improve the detection performance of surveillance radars with polarization diversity, we developed an adaptive polarimetric processor and compared it with other polarimetric processors. We derived our adaptive polarimetric processor, called the polarization discontinuity detector (PDD), from the generalized likelihood ratio (GLR) test principle for the unspecified target component. We derived closed-form expressions of its probabilities of detection and false alarm, and compared its performance to that of the adaptive polarization canceller (APC) and Kelly's GLR processor. The PDD had a performance similar to Kelly's GLR in Gaussian clutter, and both the PDD and Kelly's GLR, which have embedded constant false alarm rates (CFARs), outperformed the APC, especially when the target polarization state was close to the clutter's polarization state. The important difference is that the PDD is much simpler than Kelly's GLR for hardware/software implementation, because the PDD does not require a costly two-parameter filter bank to cover the unknown target polarization state as Kelly's GLR does.

Illuminance Dynamic Range Expansion using Gamma & Multi-Point Knee for Smart Phone Camera (감마 및 다중 포인터 니를 이용한 스마트폰 카메라의 광 다이나믹 영역 확장)

  • Choi, Duk-Kyu;Han, Chan-Ho
    • IEMEK Journal of Embedded Systems and Applications / v.8 no.1 / pp.43-50 / 2013
  • The dynamic range of most smart phone cameras is severely limited and is usually narrower than the dynamic range of most scenes. We therefore propose an illuminance dynamic range expansion method using a multi-point knee for smart phone cameras. Like a logarithmic function, the proposed method compresses the image sensor output signal. In addition, the proposed method is merged into the gamma correction, which is an essential circuit in any camera. To verify the effectiveness of the multi-point knee, we configured a control and quality evaluation system for a smart phone camera module. Experimental results show that information lost to cut-off and saturation is effectively reconstructed in both darker and brighter areas. Finally, this method has the advantage that it can be implemented without any hardware changes to conventional smart phones.
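
A multi-point knee of the kind described in this abstract can be sketched as a piecewise-linear transfer curve applied to the normalized sensor output, followed by gamma correction. The knee points, the gamma value of 2.2, and the function names below are illustrative assumptions, not values or identifiers from the paper, and composing the knee with gamma in software is only one possible reading of "merged into the gamma".

```python
import bisect

# Hypothetical knee points (input, output), normalized to [0, 1].
# Slopes decrease from 2.0 to 1.0 to 0.5, giving a log-like compression.
KNEE_POINTS = [(0.0, 0.0), (0.25, 0.5), (0.5, 0.75), (1.0, 1.0)]


def knee_compress(x, points=KNEE_POINTS):
    """Piecewise-linear compression of a normalized sensor value x."""
    x = min(max(x, 0.0), 1.0)  # clamp to the valid input range
    xs = [p[0] for p in points]
    # Index of the segment end point; the last segment also covers x == 1.0.
    i = min(bisect.bisect_right(xs, x), len(points) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def knee_then_gamma(x, gamma=2.2):
    """Apply the knee curve, then a conventional gamma curve (assumed 2.2)."""
    return knee_compress(x) ** (1.0 / gamma)
```

With these assumed points, dark inputs are stretched (0.125 maps to 0.25) while highlights are compressed (0.75 maps to 0.875), which is the behavior the abstract attributes to the knee.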

Implementation and Verification of a Multi-Core Processor including Multimedia Specific Instructions (멀티미디어 전용 명령어를 내장한 멀티코어 프로세서 구현 및 검증)

  • Seo, Jun-Sang;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.8 no.1 / pp.17-24 / 2013
  • In this paper, we present a multi-core processor that includes multimedia-specific instructions to process multimedia data efficiently in the mobile environment. The multimedia-specific instructions exploit subword-level parallelism (SLP), while the multi-core processor exploits data-level parallelism (DLP). Combining the two forms of parallelism improves the performance of multimedia processing applications. The proposed multi-core processor is implemented and tested using the Xilinx ISE 10.1 tool and a SoCMaster3 testbed system that includes a Virtex-4 FPGA. Experimental results using a fire detection algorithm show that the multimedia-specific instructions outperform baseline instructions on the same multi-core architecture in terms of performance (1.2x better), energy efficiency (1.37x better), and area efficiency (1.23x better).
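
Subword-level parallelism means one instruction operates on several narrow values packed into a single register word. The sketch below emulates a packed 8-bit add on a 32-bit word in software; it is a generic illustration of SLP, not the instruction set of the paper, and the name `packed_add_u8` is hypothetical.

```python
LANE_LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
LANE_MSB = 0x80808080   # top bit of each 8-bit lane


def packed_add_u8(a, b):
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits guarantees no carry crosses a lane boundary.
    low = (a & LANE_LOW7) + (b & LANE_LOW7)
    # Restore the top bit of each lane with XOR (addition without carry-out).
    return (low ^ ((a ^ b) & LANE_MSB)) & 0xFFFFFFFF


# Lanes 0x01+0x01, 0xFF+0x01, 0x10+0x01, 0xF0+0x10 are added in one operation.
result = packed_add_u8(0x01FF10F0, 0x01010110)
```

A hardware SLP instruction performs the same four additions in a single cycle, which is where the speedup over scalar byte-by-byte code comes from.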

Implementation of Energy-Efficient Multimedia Embedded System using PXA270 processor (PXA270 프로세서를 사용한 저전력 멀티미디어 임베디드 시스템의 구현)

  • Kim, Sang-Duck;Lee, Hoo-Sung;Park, Seong-Su
    • Proceedings of the IEEK Conference / 2005.11a / pp.945-948 / 2005
  • In the area of wireless and handheld platforms, performance, power, and cost are key metrics for product success. This is driving increasing levels of on-chip integration in state-of-the-art application processors. The purpose of this project is to design and optimize an energy-efficient embedded system that properly plays video and audio in real time. The media player must be capable of decoding real-time streaming video and audio with the least possible energy consumption for a variety of clips at different resolutions. We implemented this Linux-based multimedia player on Intel's PXA27x platform.

Voltage Scaling for Reduced Energy Consumption in Real-Time Systems Using Variable Voltage Processor (가변 전압 프로세서를 사용하는 실시간 시스템에서 소비 전력감소를 위한 전압조절)

  • Lee, Yong-Jun;Kim, Yong-Seok
    • Proceedings of the KIEE Conference / 2004.11c / pp.438-440 / 2004
  • Energy consumption has become an increasingly important consideration in designing real-time embedded systems. In this paper, we propose a voltage scaling method to reduce energy consumption in fixed-priority real-time systems using variable voltage processors. The hyperperiod of the tasks is divided into domains. The most suitable voltage for each domain is determined off-line and stored in a table. During task execution, the processor voltage is adjusted according to the information in the table. A simulation result shows that the proposed method can reduce power consumption by 80% compared with no power management. The difference from the optimal EDF-based method is only 5%.
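
The table-driven scheme in this abstract can be sketched as a lookup keyed by the current position within the hyperperiod. The sketch assumes the hyperperiod is the least common multiple of the task periods and that each domain is identified by its start time; the table contents, task periods, and function names are hypothetical, and the paper's actual rule for choosing domains and voltages is not reproduced here.

```python
import bisect
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of the task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


# Hypothetical off-line table: (domain start time, supply voltage in volts).
VOLTAGE_TABLE = [(0, 1.8), (6, 1.2), (10, 1.5)]


def voltage_at(t, periods, table=VOLTAGE_TABLE):
    """Look up the precomputed voltage for time t (run-time step)."""
    phase = t % hyperperiod(periods)  # the schedule repeats every hyperperiod
    starts = [start for start, _ in table]
    # The last domain whose start time is <= phase covers the current instant.
    return table[bisect.bisect_right(starts, phase) - 1][1]
```

For example, with assumed task periods of 4 and 6 the hyperperiod is 12, so `voltage_at(23, [4, 6])` falls at phase 11 and returns the voltage of the third domain.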

Performance Evaluation and Analysis for Discrete Wavelet Transform on Many-Core Processors (매니코어 프로세서 상에서 이산 웨이블릿 변환을 위한 성능 평가 및 분석)

  • Park, Yong-Hun;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications / v.7 no.5 / pp.277-284 / 2012
  • To support the use of the discrete wavelet transform (DWT) on portable devices, this paper implements a 2-level DWT on a reference many-core processor architecture and determines the optimal many-core configuration. To explore the optimal configuration, we evaluate the impact of the data-per-processing-element ratio, defined as the amount of data mapped directly to each processing element (PE), on system performance, energy efficiency, and area efficiency. We used five PE configurations (PEs=16, 64, 256, 1,024, and 4,096) implemented in 130nm CMOS technology with a 720MHz clock frequency. Experimental results indicate that maximum energy and area efficiencies were achieved at PEs=1,024. However, the system area must be limited to 140 mm² and the power should not exceed 3 watts in order to implement a 2-level DWT on portable devices. Under these restrictions, the most reasonable energy and area efficiencies were achieved at PEs=256.
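
The data-per-processing-element ratio in this abstract is simple arithmetic: the input data divided evenly across the PEs. The sketch below computes it for the five PE counts listed, assuming a hypothetical 256x256 input image, since the abstract does not state the image size.

```python
# PE counts evaluated in the abstract.
PE_CONFIGS = [16, 64, 256, 1024, 4096]


def data_per_pe(width, height, num_pes):
    """Number of pixels mapped to each PE, assuming an even split."""
    total = width * height
    if total % num_pes:
        raise ValueError("image does not divide evenly across PEs")
    return total // num_pes


# Hypothetical 256x256 image; the real input size is not given in the abstract.
ratios = {pes: data_per_pe(256, 256, pes) for pes in PE_CONFIGS}
```

Under this assumption each PE handles between 4,096 pixels (16 PEs) and 16 pixels (4,096 PEs), which shows how quickly per-PE work shrinks as the array grows.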

A Study on Optimization of Hardware Complexity of a FFT Processor for IEEE 802.11n WLAN (IEEE 802.11n WLAN을 위한 FFT 프로세서의 하드웨어 복잡도 최적화에 대한 연구)

  • Choi, Rakhun;Park, Jungjun;Lim, Taemin;Lee, Jinyong;Kim, Younglok
    • IEMEK Journal of Embedded Systems and Applications / v.6 no.4 / pp.243-248 / 2011
  • An FFT/IFFT processor is the key component of orthogonal frequency division multiplexing (OFDM) systems such as the IEEE 802.11n wireless local area network (WLAN). Many radix algorithms exist, differing in the structure of the butterfly used as the FFT sub-module, and each has its own advantages and disadvantages in hardware complexity. Here, mixed-radix algorithms for 64- and 128-point FFT/IFFT processors are proposed, which reduce hardware complexity by combining radix-$2^3$ and radix-4 algorithms. The proposed algorithms finish the calculation within 3.2 $\mu$s in order to meet the IEEE 802.11n standard requirements, and they have lower hardware complexity than conventional algorithms.