Search | Korea Science

Design of Reconfigurable Coprocessor for Multimedia Mobile Terminal (멀티미디어 무선 단말기를 위한 재구성 가능한 코프로세서의 설계)

Kim, Nam-Sub;Lee, Sang-Hun;Kum, Min-Ha;Kim, Jin-Sang;Cho, Won-Kyung
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.44 no.4
- /
- pp.63-72
- /
- 2007
In this paper, we propose a novel reconfigurable coprocessor for multimedia mobile terminals. Because most of multimedia operations require fast operations of large amount of data in the limited clock frequency, it is necessary to enhance the performance of the embedded processor that is widely used in current multimedia mobile terminals. Therefore, we proposed and have designed the coprocessor which had the ability of fast operations of multimedia data. The proposed coprocessor was not only reconfigurable, but also flexible and expandable. The proposed coprocessor has been designed by using VHDL and compared with previous reconfigurable coprocessors and a commercial embedded processor in architecture and speed. As a result of the architectural comparison, the proposed coprocessor had better structure in terms of hardware size and flexibility. Also, the simulation results of DCT application showed that the proposed coprocessor was 26 times faster than a commercial ARM processor and 11 times faster than the ARM processor with fast DCT core.
PDF KSCI

Energy-Efficient Signal Processing Using FPGAs (FPGA 상에서 에너지 효율이 높은 병렬 신호처리 기법)

Jang Ju-wook;Hwang Yunil;Scrofano Ronald;Prasanna Viktor K.
- The KIPS Transactions:PartA
- /
- v.12A no.4 s.94
- /
- pp.305-312
- /
- 2005
In this paper, we present algorithm-level techniques for energy-efficient design at the algorithm level using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernel applications: fast Fourier transform(FFT) and matrix multiplication. We evaluate the performance, in terms of both latency and energy efficiency, of FPGAs in performing these tasks. Using a Xilinx Virtex-II as the target FPGA, we compare the performance of our designs to those from the Xilinx library as well as to conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and the Texas Instruments TMS320C6415. Our evaluations are done both through estimation based on energy and latency equations on high-level and through low-level simulation. For FFT, our designs dissipated an average of $50\%$ less energy than the design from the Xilinx library and $56\%$ less than the DSP. Our designs showed an EAT factor of 10 times improvement over the embedded processor. These results provide a concrete evidence to substantiate the idea that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that PFGAs can achieve this performance while still dissipating less energy than the other two types of devices.
https://doi.org/10.3745/KIPSTA.2005.12A.4.305 인용 PDF KSCI

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

김성두;정용진
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.12 no.2
- /
- pp.21-34
- /
- 2002
In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.
https://doi.org/10.13089/JKIISC.2002.12.2.21 인용 PDF KSCI HTML

Low-Gate-Count 32-Bit 2/3-Stage Pipelined Processor Design (소면적 32-bit 2/3단 파이프라인 프로세서 설계)

Lee, Kwang-Min;Park, Sungkyung
- Journal of the Institute of Electronics and Information Engineers
- /
- v.53 no.4
- /
- pp.59-67
- /
- 2016
With the enhancement of built-in communication capabilities in various meters and wearable devices, which implies Internet of things (IoT), the demand of small-area embedded processors has increased. In this paper, we introduce a small-area 32-bit pipelined processor, Juno, which is available in the field of IoT. Juno is an EISC (Extendable Instruction Set Computer) machine and has a 2/3-stage pipeline structure to reduce the data dependency of the pipeline. It has a simple pipeline controller which only controls the program counter (PC) and two pipeline registers. It offers $32{\times}32=64$ multiplication, 64/32=32 division, $32{\times}32+64=64$ MAC (multiply and accumulate) operations together with 32*32=64 Galois field multiplication operation for encryption processing in wireless communications. It provides selective inclusion of these algebraic logic blocks if necessary in order to reduce the area of the overall processor. In this case, the gate count of our integer core amounts to 12k~22k and has a performance of 0.57 DMIPS/MHz and 1.024 Coremark/MHz.
https://doi.org/10.5573/ieie.2016.53.4.059 인용 PDF KSCI

A Study on Design of High-Speed Parallel Multiplier over GF(2^m) using VCG (VCG를 사용한 GF(2^m)상의 고속병렬 승산기 설계에 관한 연구)

Seong, Hyeon-Kyeong
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.14 no.3
- /
- pp.628-636
- /
- 2010
In this paper, we present a new type high speed parallel multiplier for performing the multiplication of two polynomials using standard basis in the finite fields GF($2^m$). Prior to construct the multiplier circuits, we design the basic cell of vector code generator(VCG) to perform the parallel multiplication of a multiplicand polynomial with a irreducible polynomial and design the partial product result cell(PPC) to generate the result of bit-parallel multiplication with one coefficient of a multiplicative polynomial with VCG circuits. The presented multiplier performs high speed parallel multiplication to connect PPC with VCG. The basic cell of VCG and PPC consists of one AND gate and one XOR gate respectively. Extending this process, we show the design of the generalized circuits for degree m and a simple example of constructing the multiplier circuit over finite fields GF($2^4$). Also, the presented multiplier is simulated by PSpice. The multiplier presented in this paper uses the VCGs and PPCS repeatedly, and is easy to extend the multiplication of two polynomials in the finite fields with very large degree m, and is suitable to VLSL.
https://doi.org/10.6109/jkiice.2010.14.3.628 인용 PDF KSCI

Design of Bit Manipulation Accelerator fo Communication DSP (통신용 DSP를 위한 비트 조작 연산 가속기의 설계)

Jeong Sug H.;Sunwoo Myung H.
- Journal of the Institute of Electronics Engineers of Korea TC
- /
- v.42 no.8 s.338
- /
- pp.11-16
- /
- 2005
This paper proposes a bit manipulation accelerator (BMA) having application specific instructions, which efficiently supports scrambling, convolutional encoding, puncturing, and interleaving. Conventional DSPs cannot effectively perform bit manipulation functions since かey have multiply accumulate (MAC) oriented data paths and word-based functions. However, the proposed accelerator can efficiently process bit manipulation functions using parallel shift and Exclusive-OR (XOR) operations and bit jnsertion/extraction operations on multiple data. The proposed BMA has been modeled by VHDL and synthesized using the SEC $0.18\mu m$ standard cell library and the gate count of the BMA is only about 1,700 gates. Performance comparisons show that the number of clock cycles can be reduced about $40\%\sim80\%$ for scrambling, convolutional encoding and interleaving compared with existing DSPs.
PDF KSCI

Design of QPSK Demodulator Using CMOS BPSK Receiver and Reflection-Type Phase Shifter (CMOS 기반 BPSK 수신기와 반사형 위상 천이기를 이용한 QPSK 복조기 설계)

Moon, Seong-Mo;Park, Dong-Hoon;Yu, Jong-Won;Lee, Moon-Que
- The Journal of Korean Institute of Electromagnetic Engineering and Science
- /
- v.20 no.8
- /
- pp.770-776
- /
- 2009
We propose and demonstrate an I/Q demodulator using four-port BPSK demodulator base on additive mixing and reflection-type phase shifter using hybrid technique. Previously, the conventional I/Q demodulator base on multiplicative or additive mixing method divides I/Q signal path from mixer to parallel-to-serial converter. In this paper, we propose new I/Q demodulator without dividing I/Q baseband signal path. The proposed schematic requires half size in implementation and half power consumption in baseband path compared with the conventional receiver. Also, the proposed receiver eliminates parallel-to-serial converter after data decoding. The proposed circuit has been successfully demodulated a QPSK signal with the L-band carrier frequency and 20 Mbps data rate.
https://doi.org/10.5515/KJKIEES.2009.20.8.770 인용 PDF KSCI

Hybrid FFT processor design using Parallel PD adder circuit (병렬 PD가산회로를 이용한 Hybrid FFT 연산기 설계)

김성대;최전균;안점영;송홍복
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2000.10a
- /
- pp.499-503
- /
- 2000
The use of Multiple-Valued FFT(Fast fourier Transform) is extended from binary to multiple-valued logic(MVL) circuits. A multiple-valued FFT circuit can be implemented using current-mode CMOS techniques, reducing the transitor, wires count between devices to half compared to that of a binary implementation. For adder processing in FFT, We give the number representation using such redundant digit sets are called redundant positive-digit number representation and a Redundant set uses the carry-propagation-free addition method. As the designed Multiple-valued FFT internally using PD(positive digit) adder with the digit set 0,1,2,3 has attractive features on speed, regularity of the structure and reduced complexities of active elements and interconnections. for the mutiplier processing, we give Multiple-valued LUT(Look up table)to facilitate simple mathmatical operations on the stored digits. Finally, Multiple-valued 8point FFT operation is used as an example in this paper to illuatrates how a multiple-valued FFT can be beneficial.
PDF

Low-complexity Timing Synchronization System for IEEE802.11a Wireless LANs (IEEE802.11a 무선 랜 적용을 위한 시간동기 시스템 제안)

하태현;이성주;김재석
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.28 no.11B
- /
- pp.965-971
- /
- 2003
This paper suggests a low-complexity frame timing synchronization system for IEEE802.11a wireless LAN systems. The proposed timing synchronization scheme has been implemented by correlating the received OFDM preamble with quantized coefficients composed of ｛0, ${\pm}$2$^{0}$ , ${\pm}$2$^1$‥‥‥ ${\pm}$2$^{i}$ ), where i is an integer number. The 2$^{i}$ -valued coefficients enable the multipliers in the correlation system to be simplified to i-bit shifters. So we can design the correlation system using shifters instead of multipliers. We estimate the performance of the proposed scheme in comparison with conventional systems under the AWGN and Rayleigh fading channels. In this paper we show that the complexity can be reduced by 90% while still maintaining a performance comparable to that of the conventional system.
PDF KSCI

IEEE-754 Floating-Point Divider for Embedded Processors (내장형 프로세서를 위한 IEEE-754 고성능 부동소수점 나눗셈기의 설계)

Jeong, Jae-Won;Hong, In-Pyo;Jeong, Woo-Kyong;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.39 no.7
- /
- pp.66-73
- /
- 2002
As floating-point operations become widely used in various applications such as computer graphics and high-definition DSP, the needs for fast division become increased. However, conventional floating-point dividers occupy a large hardware area, and bring bottle-becks to the entire floating-point operations. In this paper, a high-performance and small-area floating-point divider, which is suitable for embedded processors, is designed using he series expansion algorithm. The algorithm is selected to utilize two MAC(Multiply-ACcumulate) units for quadratic convergence to the correct quotient. The two MAC units for SIMD-DSP features are shared and the additional area for the division only is very small. The proposed divider supports all rounding modes defined by IEEE 754 standard, and error estimations are performed for appropriate precision.
PDF KSCI

Search Result 535, Processing Time 0.022 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)