Search | Korea Science

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

Lee, Minhak;Kang, Woochul
- KIISE Transactions on Computing Practices / v.22 no.7 / pp.303-309 / 2016
Recently, an increasing number of embedded processors, such as ARM Mali, have begun to support GPGPU programming frameworks such as OpenCL. As a result, GPGPU technologies that have been used in PC and server environments are beginning to be applied to embedded systems. However, many embedded systems have architectural characteristics different from those of traditional PCs, and low power consumption and real-time performance are also important performance metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare its performance against various baselines. Experimental results show that, with 1000 KB of input data, the parallel GPU AES implementation reduces the response time to about 1/150 and the energy consumption to approximately 1/290 of those of an OpenMP implementation. Furthermore, an additional 100% performance improvement (i.e., a further doubling of performance) of the parallel AES algorithm was achieved by exploiting characteristics of embedded GPUs, such as eliminating data copies between GPU and host memory. Our results also demonstrate that larger input data yields greater performance improvement.
https://doi.org/10.5626/KTCP.2016.22.7.303
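Because AES always encrypts fixed 16-byte blocks, a GPU version typically assigns one block per OpenCL work-item. The abstract does not state the cipher mode, so the host-side sketch below assumes a mode in which blocks are independent (ECB or CTR; CBC encryption is inherently sequential). The helper names and the 96-bit-nonce counter layout are hypothetical illustrations, not the authors' kernel.

```python
from typing import Tuple

AES_BLOCK_BYTES = 16  # AES operates on 128-bit (16-byte) blocks


def num_work_items(input_bytes: int) -> int:
    """Work-items needed if each one encrypts one AES block.

    Assumes the input is padded up to a whole number of blocks.
    """
    return -(-input_bytes // AES_BLOCK_BYTES)  # ceiling division


def block_range(global_id: int) -> Tuple[int, int]:
    """Byte range [start, end) handled by work-item `global_id`."""
    start = global_id * AES_BLOCK_BYTES
    return start, start + AES_BLOCK_BYTES


def ctr_counter_block(nonce: bytes, global_id: int) -> bytes:
    """Counter block for CTR mode (hypothetical layout).

    Assumed layout: 96-bit nonce || 32-bit big-endian block index. Each
    work-item can build its own counter from its global id, which is what
    makes the blocks independent.
    """
    if len(nonce) != 12:
        raise ValueError("this sketch assumes a 12-byte nonce")
    return nonce + global_id.to_bytes(4, "big")
```

Taking 1 KB as 1024 bytes, the 1000 KB input from the experiments splits into 64,000 independent blocks, so larger inputs expose more parallel work to the GPU.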

Extended BSD Socket API Supporting Kernel-level RTP (커널 레벨 RTP를 지원하는 확장 BSD 소켓 API)

Choi Mun-Seon;Kim Kyung-San;Kim Sung-Jo
- Journal of KIISE:Computer Systems and Theory / v.33 no.6 / pp.326-336 / 2006
Due to the evolution of wired and wireless communication technologies and the Internet, multimedia services such as Internet broadcasting and VOD have recently become prevalent. RTP was designed by the IETF to be suitable for transmitting real-time multimedia data over the Internet. While a variety of applications use RTP implementations provided as libraries, embeddedRTP is an RTP-based kernel-level protocol that resolves the performance issues of such library-based implementations. This paper proposes the ExtendedERTP protocol, based on the existing embeddedRTP. The new protocol resolves issues such as packet processing overhead and buffer requirements, and integrates its APIs with the BSD socket APIs that are widely used in network applications. This paper demonstrates that this integration makes it possible to transmit real-time multimedia data through the familiar BSD socket interface with only nominal extra overhead. This paper also proposes a scheme that improves packet processing time by 15 to 20% and another that reduces the memory required for packet processing to about 3.5% of that required by embeddedRTP.
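Part of the per-packet work that embeddedRTP and ExtendedERTP move into the kernel is building and parsing the RTP header. As a user-space illustration only (not the paper's kernel code or its socket API), the sketch below packs and parses the 12-byte fixed RTP header defined in RFC 3550, section 5.1, assuming no padding, no header extension, and no CSRC entries. The function names are hypothetical.

```python
import struct

RTP_VERSION = 2
# Fixed header: V/P/X/CC byte, M/PT byte, sequence (16), timestamp (32), SSRC (32)
_FIXED_HEADER = struct.Struct("!BBHII")  # network byte order, 12 bytes


def pack_rtp_header(payload_type: int, seq: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Pack an RTP fixed header with P=0, X=0 and CC=0."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return _FIXED_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                              timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, seq, timestamp, ssrc = _FIXED_HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

With a kernel-level implementation behind a BSD-style socket, an application would hand over only the payload and let the protocol layer fill in fields like these.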

Hardware Design for JBIG2 Huffman Coder (JBIG2 허프만 부호화기의 하드웨어 설계)

Park, Kyung-Jun;Ko, Hyung-Hwa
- Journal of Korea Multimedia Society / v.12 no.2 / pp.200-208 / 2009
JBIG2, the next-generation standard for binary image compression, must be implemented as hardware modules for a JBIG2 fax to be realized in embedded equipment. This paper proposes a hardware module for a high-speed JBIG2 Huffman coder. The JBIG2 Huffman coder selectively uses 15 Huffman tables. Because the Huffman coder is designed to use minimal data and to use memory efficiently, high-speed processing is possible. The designed Huffman coder is ported to a Virtex-4 FPGA and cooperates with software modules on an embedded development board using a MicroBlaze core. The designed IP was successfully verified through simulation-based functional tests and hardware-software co-operation tests. Experimental results show that, thanks to the hardware design's efficient memory usage, processing is 10 times faster than a software-only implementation on the embedded system.
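One common way to keep a Huffman table compact is to store only the code length of each symbol and derive the codes canonically. The abstract does not describe the storage format of the 15 JBIG2 tables, so the sketch below is an assumption: it uses the canonical assignment procedure from DEFLATE (RFC 1951, section 3.2.2) to show how full codes can be regenerated from lengths alone. Function names are hypothetical.

```python
from typing import Dict, List, Sequence


def canonical_codes(lengths: Sequence[int]) -> Dict[int, str]:
    """Assign canonical prefix codes from per-symbol code lengths.

    Symbols with length 0 get no code. Shorter codes come first; within
    one length, codes increase with the symbol index.
    """
    max_len = max(lengths, default=0)
    # Count how many symbols use each code length (length 0 is ignored).
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # Smallest code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Hand out consecutive codes within each length.
    codes: Dict[int, str] = {}
    for symbol, n in enumerate(lengths):
        if n:
            codes[symbol] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes


def encode(symbols: List[int], codes: Dict[int, str]) -> str:
    """Concatenate the code words for a symbol sequence as a bit string."""
    return "".join(codes[s] for s in symbols)
```

For lengths `[3, 3, 3, 3, 3, 2, 4, 4]` (the RFC 1951 example), symbol 5 receives `00` and symbol 6 receives `1110`. In hardware, the same idea means only a short length list per table has to be stored.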

Program Execution Speed Improvement using Executable Compression Method on Embedded Systems (임베디드 시스템에서 실행 가능 압축 기법을 이용한 프로그램 초기 실행 속도 향상)

Jeon, Chang-Kyu;Lew, Kyeung-Seek;Kim, Yong-Deak
- Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.1 / pp.23-28 / 2012
The performance of secondary storage has improved very slowly compared with that of main memory and processors. To execute an application, its data must be loaded from secondary storage into memory, and this loading step is a bottleneck. In this paper, we propose an executable compression method that speeds up the initial loading time of applications, and we evaluate its performance. To do so, we implemented two applications: a compressor for executable binary files and, on the embedded system, a decoder for compressed executable files. We ran speed tests on six test binary files. In one case performance decreased, but the others improved, with an average improvement of almost 29% in initial loading time. The achievable compression level depended on the characteristics of each file, and the performance gain depended on the compressed file size and the decompression time. Therefore, a compression algorithm optimized for executable binary files is needed.
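The trade-off described above (a gain that depends on compressed size and decompression time) can be written as a simple cost model: loading the plain binary costs $S / B$, while loading the compressed one costs $S_c / B + t_{dec}$, where $S$ and $S_c$ are the original and compressed sizes, $B$ is the storage read bandwidth, and $t_{dec}$ is the decompression time. The sketch below encodes that model; `zlib` is only a stand-in, since the abstract does not name the compression algorithm, and all numbers in the test are made up.

```python
import zlib
from typing import Tuple


def load_times(size_bytes: int, compressed_bytes: int,
               read_bytes_per_s: float, decompress_s: float) -> Tuple[float, float]:
    """Return (plain, compressed) load times in seconds under a simple model.

    plain:      read the whole binary from storage
    compressed: read the smaller file, then decompress it in memory
    """
    plain = size_bytes / read_bytes_per_s
    compressed = compressed_bytes / read_bytes_per_s + decompress_s
    return plain, compressed


def pack_executable(binary: bytes) -> bytes:
    """Compress an executable image (zlib level 9 as a stand-in)."""
    return zlib.compress(binary, 9)


def unpack_executable(blob: bytes) -> bytes:
    """Decompress the image at load time, before execution."""
    return zlib.decompress(blob)
```

The model also explains a slowdown: when a file barely compresses, the saved read time can be smaller than $t_{dec}$.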

Design and Verification of Pipelined Face Detection Hardware (파이프라인 구조의 얼굴 검출 하드웨어 설계 및 검증)

Kim, Shin-Ho;Jeong, Yong-Jin
- Journal of Korea Multimedia Society / v.15 no.10 / pp.1247-1256 / 2012
Many filter-based image processing algorithms require a huge amount of computation and memory access, which makes real-time performance hard to attain, especially in embedded applications. In this paper, we propose a pipelined hardware structure for a filter-based face detection algorithm to show that real-time performance can be achieved through hardware design. In our design, the whole computation is divided into three pipeline stages: resizing the image (Resize), transforming the image (ICT), and finding candidate areas (Find Candidate). Each stage is optimized by exploiting the parallelism of the computation to reduce the number of cycles and by using line memory to minimize memory accesses. The resulting hardware uses 507 KB of internal SRAM and occupies 9,039 LUTs when synthesized and configured on a Xilinx Virtex5LX330 FPGA. It can operate at a maximum clock of 165 MHz, giving a performance of 108 frames/s while detecting up to 20 faces.
https://doi.org/10.9717/kmms.2012.15.10.1247
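Line memory lets a filter stage read each pixel from external memory once: only the last few image rows are kept on chip, and every window is formed from those buffered rows. The software model below assumes a 3x3 window for illustration (the abstract does not give the ICT window size); in hardware, the `deque` would correspond to on-chip SRAM line buffers. The function name is hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, List

Row = List[int]


def windows_3x3(rows: Iterable[Row]) -> Iterator[List[Row]]:
    """Yield every 3x3 neighbourhood of an image streamed row by row.

    Only the three most recent rows are buffered (the line memory), so
    each input row is consumed exactly once.
    """
    lines: deque = deque(maxlen=3)  # line memory holding the last 3 rows
    for row in rows:
        lines.append(row)
        if len(lines) < 3:
            continue  # not enough rows buffered yet
        top, mid, bot = lines
        for x in range(1, len(mid) - 1):
            yield [top[x - 1:x + 2], mid[x - 1:x + 2], bot[x - 1:x + 2]]
```

As a rough budget at the reported operating point, 165 MHz / 108 frames/s leaves about 1.53 million clock cycles per frame for all three pipeline stages.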

Code Size Reduction Through Efficient use of Multiple Load/store Instructions (복수의 메모리 접근 명령어의 효율적인 이용을 통한 코드 크기의 감소)

Ahn Minwook;Cho Doosan;Paek Yunheung;Cho Jeonghun
- Journal of KIISE:Software and Applications / v.32 no.8 / pp.819-833 / 2005
Code size reduction is becoming ever more important for compilers targeting embedded processors, because these processors are often severely limited by storage constraints, and reduced code size can therefore have a significant positive impact on their performance. Code size reduction techniques have different motivations and a variety of application contexts, and they exploit special hardware features of their target processors. In this work, we propose a novel technique that fully utilizes a set of hardware instructions called multiple load/store (MLS) instructions, which are specially designed to reduce code size by minimizing the number of memory operations in the code. Many microprocessors support MLS instructions, but no existing compiler fully exploits their potential benefit; compilers use them only in some limited cases. This is mainly because optimizing memory accesses with MLS instructions in the general case is an NP-hard problem that requires complex assignment of registers and memory offsets to the variables in a stack frame. Our technique uses a couple of heuristics to handle this problem efficiently in polynomial time.
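An MLS instruction can replace several single loads or stores only when the accessed variables occupy consecutive stack slots, which is why the stack offset assignment matters. The sketch below does not reproduce the authors' heuristics; it only evaluates a given layout by greedily grouping accessed offsets into runs of adjacent 4-byte words (a 32-bit target and a 16-register limit are assumptions). It ignores the register-ordering constraint that instructions such as ARM `LDM`/`STM` also impose.

```python
from typing import List

WORD = 4  # bytes per stack slot (32-bit target assumed)


def mls_groups(offsets: List[int], max_regs: int = 16) -> List[List[int]]:
    """Split accessed stack offsets into runs of consecutive words.

    Each run could, in principle, be served by one multiple load/store
    instruction; runs are capped at `max_regs` registers.
    """
    groups: List[List[int]] = []
    for off in sorted(set(offsets)):
        if groups and off == groups[-1][-1] + WORD and len(groups[-1]) < max_regs:
            groups[-1].append(off)  # extends the current contiguous run
        else:
            groups.append([off])    # starts a new run
    return groups


def memory_instruction_count(offsets: List[int], max_regs: int = 16) -> int:
    """Memory instructions needed if every run becomes one MLS instruction."""
    return len(mls_groups(offsets, max_regs))
```

For offsets `[8, 0, 4, 16, 20, 32]`, six single accesses shrink to three instructions. The hard part the paper addresses is choosing offsets (and registers) so that such runs appear in the first place.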

A Study on Design and Implementation of Speech Recognition System Using ART2 Algorithm

Kim, Joeng Hoon;Kim, Dong Han;Jang, Won Il;Lee, Sang Bae
- International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.2 / pp.149-154 / 2004
In this research, we chose speech recognition as the method for controlling an electric wheelchair using only speech, and used DTW (Dynamic Time Warping), which is speaker-dependent and has a relatively high recognition rate among speech recognition methods. However, real-time operation requires a small memory footprint and fast processing. We therefore introduced VQ (Vector Quantization), which is widely used as a compression algorithm in speaker-independent recognition, to achieve fast recognition and small memory usage. However, we found that the recognition rate decreased after applying VQ. To improve the recognition rate, we applied the ART2 (Adaptive Resonance Theory 2) algorithm as a post-processing step and obtained an improvement of about 5% in recognition rate. To use ART2, an error range must be applied: it is applied when the difference obtained by subtracting the first DTW distance from the second is 20 or more. Applying ART2 in this way, we obtained both fast processing and a high recognition rate. Moreover, because the system is a moving object, it must be implemented as an embedded system. We therefore selected the TMS320C32 chip, which can perform a large number of calculations relatively fast. Because speech data must be stored, we used 128 KB of RAM and 64 KB of ROM to hold the large amount of data. For speech input, we used a 16-bit stereo audio codec, whose high resolution provides relatively accurate data.
https://doi.org/10.5391/IJFIS.2004.4.2.149
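The recognizer ranks word templates by DTW distance and then checks the gap between the first and second distances against a threshold of 20. The sketch below implements textbook DTW with an absolute-difference cost on 1-D feature sequences (a simplification; real front ends use feature vectors) and reports that gap. How ART2 uses the result is not described in the abstract, so the sketch stops at the threshold test. The names `recognize` and `threshold` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, or match from the previous cells.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def recognize(utterance: Sequence[float],
              templates: Dict[str, Sequence[float]],
              threshold: float = 20.0) -> Tuple[str, float, bool]:
    """Return (best word, second minus first distance, gap >= threshold)."""
    ranked = sorted((dtw_distance(utterance, t), w) for w, t in templates.items())
    first_d, best_word = ranked[0]
    second_d = ranked[1][0] if len(ranked) > 1 else math.inf
    gap = second_d - first_d
    return best_word, gap, gap >= threshold
```

The quadratic DTW table is the cost that VQ was introduced to cut on the TMS320C32; the sketch keeps the full table for clarity rather than memory efficiency.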

Development of an Instant On System Using Storage Class Memory (스토리지 클래스 메모리를 활용한 즉각 구동 시스템의 개발)

Moon, Young-Je;Doh, In-Hwan;Park, Jung-Soo;Noh, Sam-H.
- Journal of KIISE:Computing Practices and Letters / v.16 no.2 / pp.207-211 / 2010
Storage Class Memory (SCM) is both non-volatile and randomly byte-addressable. The advent of SCM can bring about novel and innovative features that are not possible in conventional computing systems. This paper proposes a new system design that turns a system on and off instantly. To do this, we replace main memory with SCM so that the otherwise volatile system state is retained while the system is turned off. We implement our prototype in an embedded environment and measure its system on/off time.

Improvement of Impact Resistance of Composite Structures using Shape Memory Alloys (형상기억합금을 이용한 복합재료 구조물의 저속충격특성 향상)

Kim, Eun-Ho;Rim, Mi-Sun;Lee, In;Kim, Hyung-Won
- Proceedings of the Korean Society of Propulsion Engineers Conference / 2009.11a / pp.453-456 / 2009
The impact resistance of shape memory alloy hybrid composite (SMAHC) plates was experimentally investigated. Shape memory alloys (SMAs) have large failure strain and failure stress and can absorb large strain energies through phase transformation. SMA wires were embedded in composite plates to improve their weak impact resistance. Tensile tests of SMA wires were performed at various temperatures to investigate their thermo-mechanical properties. Low-velocity impact tests were performed on several types of composite plates containing SMA, Al, or Fe. Embedding SMA wires was the most effective way to improve the impact resistance of the composite plates. The effect of SMA position on impact resistance was also investigated.

Design of a Sense Amplifier Minimizing bit Line Disturbance for a Flash Memory (비트라인 간섭을 최소화한 플래시 메모리용 센스 앰프 설계)

Kim, Byong-Rok;So, Kyoung-Rok;You, Young-Gab;Kim, Sung-Sik
- Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.6 / pp.1-8 / 2000
This paper presents the design of a sense amplifier for flash memory that minimizes the bit line disturbance caused by a common bit line. When external devices access an internal flash memory, the use of a common bit line causes a disturbance problem in output modes. This phenomenon results from hot carriers between the floating gates and bit lines due to the thin oxide. To minimize bit line disturbance, a lower bit line voltage is required, along with a sense amplifier that can detect stored data at that lower bit line voltage. The proposed circuit operates at a lower bit line voltage, and we fabricated an embedded flash memory MCU using 0.6 µm technology.
