Search | Korea Science

Floating Point Unit Design for the IEEE754-2008 (IEEE754-2008을 위한 고속 부동소수점 연산기 설계)

Hwang, Jin-Ha;Kim, Hyun-Pil;Park, Sang-Su;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.48 no.10
- /
- pp.82-90
- /
- 2011
Because of the development of Smart phone devices, the demands of high performance FPU(Floating-point Unit) becomes increasing. Therefore, we propose the high-speed single-/double-precision FPU design that includes an elementary add/sub unit and improved multiplier and compare and convert units. The most commonly used add/sub unit is optimized by the parallel rounding unit. The matrix operation is used in complex calculation something like a graphic calculation. We designed the Multiply-Add Fused(MAF) instead of multiplier to calculate the matrix more quickly. The branch instruction that is decided by the compare operation is very frequently used in various programs. We bypassed the result of the compare operation before all the pipeline processes ended to decrease the total execution time. And we included additional convert operations that are added in IEEE754-2008 standard. To verify our RTL designs, we chose four hundred thousand test vectors by weighted random method and simulated each unit. The FPU that was synthesized by Samsung's 45-nm low-power process satisfied the 600-MHz operation frequency. And we confirm a reduction in area by comparing the improved FPU with the existing FPU.
PDF KSCI

Scalable RSA public-key cryptography processor based on CIOS Montgomery modular multiplication Algorithm (CIOS 몽고메리 모듈러 곱셈 알고리즘 기반 Scalable RSA 공개키 암호 프로세서)

Cho, Wook-Lae;Shin, Kyung-Wook
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.22 no.1
- /
- pp.100-108
- /
- 2018
This paper describes a design of scalable RSA public-key cryptography processor supporting four key lengths of 512/1,024/2,048/3,072 bits. The modular multiplier that is a core arithmetic block for RSA crypto-system was designed with 32-bit datapath, which is based on the CIOS (Coarsely Integrated Operand Scanning) Montgomery modular multiplication algorithm. The modular exponentiation was implemented by using L-R binary exponentiation algorithm. The scalable RSA crypto-processor was verified by FPGA implementation using Virtex-5 device, and it takes 456,051/3,496347/26,011,947/88,112,770 clock cycles for RSA computation for the key lengths of 512/1,024/2,048/3,072 bits. The RSA crypto-processor synthesized with a $0.18{\mu}m$ CMOS cell library occupies 10,672 gate equivalent (GE) and a memory bank of $6{\times}3,072$ bits. The estimated maximum clock frequency is 147 MHz, and the RSA decryption takes 3.1/23.8/177/599.4 msec for key lengths of 512/1,024/2,048/3,072 bits.
https://doi.org/10.6109/jkiice.2018.22.1.100 인용 PDF KSCI

Implementation of the Digital Current Control System for an Induction Motor Using FPGA (FPGA를 이용한 유도 전동기의 디지털 전류 제어 시스템 구현)

Yang, Oh
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.35C no.11
- /
- pp.21-30
- /
- 1998
In this paper, a digital current control system using a FPGA(Field Programmable Gate Array) was implemented, and the system was applied to an induction motor widely used as an industrial driving machine. The FPGA designed by VHDL(VHSIC Hardware Description Language) consists of a PWM(Pulse Width Modulation) generation block, a PWM protection block, a speed measuring block, a watch dog timer block, an interrupt control block, a decoder logic block, a wait control block and digital input and output blocks respectively. Dedicated clock inputs on the FPGA were used for high-speed execution, and an up-down counter and a latch block were designed in parallel, in order that the triangle wave could be operated at 40 MHz clock. When triangle wave is compared with many registers respectively, gate delay occurs from excessive fan-outs. To reduce the delay, two triangle wave registers were implemented in parallel. Amplitude and frequency of the triangle wave, and dead time of PWM could be changed by software. This FPGA was synthesized by pASIC 2SpDE and Synplify-Lite synthesis tool of Quick Logic company. The final simulation for worst cases was successfully performed under a Verilog HDL simulation environment. And the FPGA programmed for an 84 pin PLCC package was applied to digital current control system for 3-phase induction motor. The digital current control system of the 3 phase induction motor was configured using the DSP(TMS320C31-40 MHz), FPGA, A/D converter and Hall CT etc., and experimental results showed the effectiveness of the digital current control system.
PDF

A VLSI Design of High Performance H.264 CAVLC Decoder Using Pipeline Stage Optimization (파이프라인 최적화를 통한 고성능 H.264 CAVLC 복호기의 VLSI 설계)

Lee, Byung-Yup;Ryoo, Kwang-Ki
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.46 no.12
- /
- pp.50-57
- /
- 2009
This paper proposes a VLSI architecture of CAVLC hardware decoder which is a tool eliminating statistical redundancy in H.264/AVC video compression. The previous CAVLC hardware decoder used four stages to decode five code symbols. The previous CAVLC hardware architectures decreased decoding performance because there was an unnecessary idle cycle in between state transitions. Likewise, the computation of valid bit length includes an unnecessary idle cycle. This paper proposes hardware architecture to eliminate the idle cycle efficiently. Two methods are applied to the architecture. One is a method which eliminates an unnecessary things of buffers storing decoded codes and then makes efficient pipeline architecture. The other one is a shifter control to simplify operations and controls in the process of calculating valid bit length. The experimental result shows that the proposed architecture needs only 89 cycle in average for one macroblock decoding. This architecture improves the performance by about 29% than previous designs. The synthesis result shows that the design achieves the maximum operating frequency at 140Mhz and the hardware cost is about 11.5K under a 0.18um CMOS process. Comparing with the previous design, it can achieve low-power operation because this design is implemented with high throughputs and low gate count.
PDF KSCI

LDPC Decoder for WiMAX/WLAN using Improved Normalized Min-Sum Algorithm (개선된 정규화 최소합 알고리듬을 적용한 WiMAX/WLAN용 LDPC 복호기)

Seo, Jin-Ho;Shin, Kyung-Wook
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.18 no.4
- /
- pp.876-884
- /
- 2014
A hardware design of LDPC decoder which is based on the improved normalized min-sum(INMS) decoding algorithm is described in this paper. The designed LDPC decoder supports 19 block lengths(576~2304) and 6 code rates(1/2, 2/3A, 2/3B, 3/4A, 3/4B, 5/6) of IEEE 802.16e mobile WiMAX standard and 3 block lengths(648, 1296, 1944) and 4 code rates(1/2, 2/3, 3/4, 5/6) of IEEE 802.11n WLAN standard. The decoding function unit(DFU) which is a main arithmetic block is implemented using sign-magnitude(SM) arithmetic and INMS decoding algorithm to optimize hardware complexity and decoding performance. The LDPC decoder synthesized using a 0.18-${\mu}m$ CMOS cell library with 100 MHz clock has 284,409 gates and RAM of 62,976 bits, and it is verified by FPGA implementation. The estimated performance depending on code rate and block length is about 82~218 Mbps at 100 MHz@1.8V.
https://doi.org/10.6109/jkiice.2014.18.4.876 인용 PDF KSCI

Architecture Design of High Performance H.264 CAVLC Encoder Using Optimized Searching Technique (최적화된 탐색기법을 이용한 고성능 H.264/AVC CAVLC 부호화기 구조 설계 기법)

Lee, Yang-Bok;Jung, Hong-Kyun;Kim, Chang-Ho;Myung, Je-Jin;Ryoo, Kwang-Ki
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2011.10a
- /
- pp.431-435
- /
- 2011
This paper presents optimized searching technique to improve the performance of H.264/AVC. The proposed CAVLC encoder uses forward and backward searching algorithm to compute the parameters. By zero-block skipping technique and pipelined scheduling, the proposed CAVLC encoder can obtain better performance. The experimental result shows that the proposed architecture needs only 66.6 cycles on average for each $16{\times}16$ macroblock encoding. The proposed architecture improves the performance by 13.8% than that of previous designs. The proposed CAVLC encoder was implemented using VerilogHDL and synthesized with Megnachip $0.18{\mu}m$ standard cell library. The synthesis result shows that the gate count is about 15.6K with 125Mhz clock frequency.
PDF

Development and Verification of SoC Platform based on OpenRISC Processor and WISHBONE Bus (OpenRISC 프로세서와 WISHBONE 버스 기반 SoC 플랫폼 개발 및 검증)

Bin, Young-Hoon;Ryoo, Kwang-Ki
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.46 no.1
- /
- pp.76-84
- /
- 2009
This paper proposes a SOC platform which is eligible for education and application SOC design. The platform, fully synthesizable and reconfigurable, includes the OpenRISC embedded processor, some basic peripherals such as GPIO, UART, debug interlace, VGA controller and WISHBONE interconnect. The platform uses a set of development environment such as compiler, assembler, debugger and RTOS that is built for HW/SW system debugging and software development. Designed SOC, IPs and Testbenches are described in the Verilog HDL and verified using commercial logic simulator, GNU SW development tool kits and the FPGA. Finally, a multimedia SOC derived from the SOC platform is implemented to ASIC using the Magnachip cell library based on 0.18um 1-poly 6-metal technology.
PDF KSCI

Digit-serial VLSI Architecture for Lifting-based Discrete Wavelet Transform (리프팅 기반 이산 웨이블렛 변환의 디지트 시리얼 VLSI 구조)

Ryu, Donghoon;Park, Taegeun
- Journal of the Institute of Electronics and Information Engineers
- /
- v.50 no.1
- /
- pp.157-165
- /
- 2013
In this paper, efficient digit-serial VLSI architecture for 1D (9,7) lifting-based discrete wavelet transform (DWT) filter has been proposed. The proposed architecture computes the DWT in digit basis, so that the required hardware is reduced. Also, the multiplication is replaced with the shift and add operation to minimize the hardware requirement. Bit allocation for input, output, and the internal data has been determined by analyzing the PSNR. We have carefully designed the data feedback latency not to degrade the performance in the recursive folded scheduling. The proposed digit-serial architecture requires small amount of hardware but achieve 100% of hardware utilization, so we try to optimize the tradeoffs between the hardware cost and the performance. The proposed architecture has been designed and verified by VerilogHDL and synthesized by Synopsys Design Compiler with a DongbuHitek $0.18{\mu}m$ STD cell library. The maximum operating frequency is 330MHz with 3,770 gates in equivalent two input NAND gates.
https://doi.org/10.5573/ieek.2013.50.1.157 인용 PDF KSCI

A Design of High Performance Motion Estimation Hardware for H.264/AVC (H.264/AVC를 위한 고성능 움직임 예측 하드웨어 설계)

Park, Seungyong;Ryoo, Kwangki
- Journal of the Institute of Electronics and Information Engineers
- /
- v.50 no.1
- /
- pp.124-130
- /
- 2013
In this paper, a new motion estimation algorithm with low-computational complexity is proposed to improve the performance of H.264/AVC. The proposed architecture uses the directions of the median motion vector which is computed by the motion vectors of the three neighbor macroblocks in Integer Motion Estimation. By using the directions of the vector, the proposed architecture has a single computational level instead of multi-computational levels in Integer Motion Estimation. The proposed motion estimation is synthesized using the TSMC 0.18um standard cell library. The synthesis result shows that the gate count is about 217.92K at 166MHz and it was improved about 69% compared with previous one.
https://doi.org/10.5573/ieek.2013.50.1.124 인용 PDF KSCI

A Cryptographic Processor Supporting ARIA/AES-based GCM Authenticated Encryption (ARIA/AES 기반 GCM 인증암호를 지원하는 암호 프로세서)

Sung, Byung-Yoon;Kim, Ki-Bbeum;Shin, Kyung-Wook
- Journal of IKEEE
- /
- v.22 no.2
- /
- pp.233-241
- /
- 2018
This paper describes a lightweight implementation of a cryptographic processor supporting GCM (Galois/Counter Mode) authenticated encryption (AE) that is based on the two block cipher algorithms of ARIA and AES. It also provides five modes of operation (ECB, CBC, OFB, CFB, CTR) for confidentiality as well as the key lengths of 128-bit and 256-bit. The ARIA and AES are integrated into a single hardware structure, which is based on their algorithm characteristics, and a $128{\times}12-b$ partially parallel GF (Galois field) multiplier is adopted to efficiently perform concurrent processing of CTR encryption and GHASH operation to achieve overall performance optimization. The hardware operation of the ARIA/AES-GCM AE processor was verified by FPGA implementation, and it occupied 60,800 gate equivalents (GEs) with a 180 nm CMOS cell library. The estimated throughput with the maximum clock frequency of 95 MHz are 1,105 Mbps and 810 Mbps in AES mode, 935 Mbps and 715 Mbps in ARIA mode, and 138~184 Mbps in GCM AE mode according to the key length.
https://doi.org/10.7471/ikeee.2018.22.2.233 인용 PDF KSCI

Search Result 427, Processing Time 0.029 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)