• 제목/요약/키워드: Low-power Hardware Design

Search Result 201, Processing Time 0.021 seconds

AES-128/192/256 Rijndael Cryptoprocessor with On-the-fly Key Scheduler (On-the-fly 키 스케줄러를 갖는 AED-128/192/256 Rijndael 암호 프로세서)

  • Ahn, Ha-Kee;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.11
    • /
    • pp.33-43
    • /
    • 2002
  • This paper describes a design of cryptographic processor that implements the AES (Advanced Encryption Standard) block cipher algorithm "Rijndael". To achieve high throughput rate, a sub-pipeline stage is inserted into a round transformation block, resulting that two consecutive round functions are simultaneously operated. For area-efficient and low-power implementation, the round transformation block is designed to share the hardware resources for encryption and decryption. An efficient on-the-fly key scheduler is devised to supports the three master-key lengths of 128-b/192-b/256-b, and it generates round keys in the first sub-pipeline stage of each round processing. The Verilog-HDL model of the cryptoprocessor was verified using Xilinx FPGA board and test system. The core synthesized using 0.35-${\mu}m$ CMOS cell library consists of about 25,000 gates. Simulation results show that it has a throughput of about 520-Mbits/sec with 220-MHz clock frequency at 2.5-V supply.

Design of FIR filter using direct memory access for voice signal processing module in implantable middle ear hearing devices (이식형 인공중이용 음성신호 처리 모듈을 위한 직접 메모리 억세스 기반의 FIR 필터 설계)

  • Kim, Jong-Min;Park, Il-Yong;Yoon, Young-Ho;Kim, Min-Kyu;Lim, Hyung-Gyu;Han, Ji-Hun;Kim, Myoung-Nam;Cho, Jin-Ho
    • Journal of Sensor Science and Technology
    • /
    • v.15 no.4
    • /
    • pp.223-230
    • /
    • 2006
  • An FIR filter for digital voice signal processing has been designed and implemented using a microcontroller in implantable middle ear hearing devices (IMEHDs). The designed digital voice signal processing filter which has fast and accurate filtering operation and controllable filter characteristics has been implemented using a hardware multiplier and a direct memory access (DMA) in the low power microcontroller, MSP430F169. It has been confirmed that each of the implemented 6-orders Remez FIR filters with 1 channel and 2 channels can be applied to the voice signal processing module of IMEHDs based on the evaluation results of the filtering performance experiment.

Design of an Efficient AES-ARIA Processor using Resource Sharing Technique (자원 공유기법을 이용한 AES-ARIA 연산기의 효율적인 설계)

  • Koo, Bon-Seok;Ryu, Gwon-Ho;Chang, Tae-Joo;Lee, Sang-Jin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.6A
    • /
    • pp.39-49
    • /
    • 2008
  • AEA and ARIA are next generation standard block cipher of US and Korea, respectively, and these algorithms are used in various fields including smart cards, electronic passport, and etc. This paper addresses the first efficient unified hardware architecture of AES and ARIA, and shows the implementation results with 0.25um CMOS library. We designed shared S-boxes based on composite filed arithmetic for both algorithms, and also extracted common terms of the permutation matrices of both algorithms. With the $0.25-{\mu}m$ CMOS technology, our processor occupies 19,056 gate counts which is 32% decreased size from discrete implementations, and it uses 11 clock cycles and 16 cycles for AES and ARIA encryption, which shows 720 and 1,047 Mbps, respectively.

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.4
    • /
    • pp.94-104
    • /
    • 2008
  • In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.

Digital Logic Extraction from QCA Designs (QCA 설계에서 디지털 논리 자동 추출)

  • Oh, Youn-Bo;Kim, Kyo-Sun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.1
    • /
    • pp.107-116
    • /
    • 2009
  • Quantum-dot Cellular Automata (QCA) is one of the most promising next generation nanoelectronic devices which will inherit the throne of CMOS which is the domineering implementation technology for large scale low power digital systems. In late 1990s, the basic operations of the QCA cell were already demonstrated on a hardware implementation. Also, design tools and simulators were developed. Nevertheless, its design technology is not quite ready for ultra large scale designs. This paper proposes a new approach which enables the QCA designs to inherit the verification methodologies and tools of CMOS designs, as well. First, a set of disciplinary rules strictly restrict the cell arrangement not to deviate from the predefined structures but to guarantee the deterministic digital behaviors is proposed. After the gate and interconnect structures of. the QCA design are identified, the signal integrity requirements including the input path balancing of majority gates, and the prevention of the noise amplification are checked. And then the digital logic is extracted and stored in the OpenAccess common engineering database which provides a connection to a large pool of CMOS design verification tools. Towards validating the proposed approach, we designed a 2-bit adder, a bit-serial adder, and an ALU bit-slice. For each design, the digital logic is extracted, translated into the Verilog net list, and then simulated using a commercial software.

A Fast Inversion for Low-Complexity System over GF(2 $^{m}$) (경량화 시스템에 적합한 유한체 $GF(2^m)$에서의 고속 역원기)

  • Kim, So-Sun;Chang, Nam-Su;Kim, Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.9 s.339
    • /
    • pp.51-60
    • /
    • 2005
  • The design of efficient cryptosystems is mainly appointed by the efficiency of the underlying finite field arithmetic. Especially, among the basic arithmetic over finite field, the rnultiplicative inversion is the most time consuming operation. In this paper, a fast inversion algerian in finite field $GF(2^m)$ with the standard basis representation is proposed. It is based on the Extended binary gcd algorithm (EBGA). The proposed algorithm executes about $18.8\%\;or\;45.9\%$ less iterations than EBGA or Montgomery inverse algorithm (MIA), respectively. In practical applications where the dimension of the field is large or may vary, systolic array sDucture becomes area-complexity and time-complexity costly or even impractical in previous algorithms. It is not suitable for low-weight and low-power systems, i.e., smartcard, the mobile phone. In this paper, we propose a new hardware architecture to apply an area-efficient and a synchronized inverter on low-complexity systems. It requires the number of addition and reduction operation less than previous architectures for computing the inverses in $GF(2^m)$ furthermore, the proposed inversion is applied over either prime or binary extension fields, more specially $GF(2^m)$ and GF(P) .

Bit-serial Discrete Wavelet Transform Filter Design (비트 시리얼 이산 웨이블렛 변환 필터 설계)

  • Park Tae geun;Kim Ju young;Noh Jun rye
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.4A
    • /
    • pp.336-344
    • /
    • 2005
  • Discrete Wavelet Transform(DWT) is the oncoming generation of compression technique that has been selected for MPEG4 and JEPG2000, because it has no blocking effects and efficiently determines frequency property of temporary time. In this paper, we propose an efficient bit-serial architecture for the low-power and low-complexity DWT filter, employing two-channel QMF(Qudracture Mirror Filter) PR(Perfect Reconstruction) lattice filter. The filter consists of four lattices(filter length=8) and we determine the quantization bit for the coefficients by the fixed-length PSNR(peak-signal-to-noise ratio) analysis and propose the architecture of the bit-serial multiplier with the fixed coefficient. The CSD encoding for the coefficients is adopted to minimize the number of non-zero bits, thus reduces the hardware complexity. The proposed folded 1D DWT architecture processes the other resolution levels during idle periods by decimations and its efficient scheduling is proposed. The proposed architecture requires only flip-flops and full-adders. The proposed architecture has been designed and verified by VerilogHDL and synthesized by Synopsys Design Compiler with a Hynix 0.35$\mu$m STD cell library. The maximum operating frequency is 200MHz and the throughput is 175Mbps with 16 clock latencies.

Two Design Techniques of Embedded Systems Based on Ad-Hoc Network for Wireless Image Observation (애드 혹 네트워크 기반의 무선 영상 관측용 임베디드 시스템의 두 가지 설계 기법들)

  • LEE, Yong Up;Song, Chang-Yeoung;Park, Jeong-Uk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.39A no.5
    • /
    • pp.271-279
    • /
    • 2014
  • In this paper, the two design techniques of the embedded system which provides a wireless image observation with temporary ad-hoc network are proposed and developed. The first method is based on the embedded system design technique for a nearly real-time wireless short observation application, having a specific remote monitoring node with a built-in image processing function, and having the maximum rate of 1 fps (frame per second) wireless image transmission capability of a $160{\times}128$size image. The second technique uses the embedded system for a general wireless long observation application, consisting of the main node, the remote monitoring node, and the system controller with built-in image processing function, and the capability of the wireless image transmission rate of 1/3 fps. The proposed system uses the wireless ad-hoc network which is widely accepted as a short range, low power, and bidirectional digital communication, the hardware are consisted of the general developed modules, a small digital camera, and a PC, and the embedded software based upon the Zigbee stack and the user interface software are developed and tested on the implemented module. The wireless environment analysis and the performance results are presented.

The viterbi decoder implementation with efficient structure for real-time Coded Orthogonal Frequency Division Multiplexing (실시간 COFDM시스템을 위한 효율적인 구조를 갖는 비터비 디코더 설계)

  • Hwang Jong-Hee;Lee Seung-Yerl;Kim Dong-Sun;Chung Duck-Jin
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.42 no.2 s.332
    • /
    • pp.61-74
    • /
    • 2005
  • Digital Multimedia Broadcasting(DMB) is a reliable multi-service system for reception by mobile and portable receivers. DMB system allows interference-free reception under the conditions of multipath propagation and transmission errors using COFDM modulation scheme, simultaneously, needs powerful channel error's correction ability. Viterbi Decoder for DMB receiver uses punctured convolutional code and needs lots of computations for real-time operation. So, it is desired to design a high speed and low-power hardware scheme for Viterbi decoder. This paper proposes a combined add-compare-select(ACS) and path metric normalization(PMN) unit for computation power. The proposed PMN architecture reduces the problem of the critical path by applying fixed value for selection algorithm due to the comparison tree which has a weak point from structure with the high-speed operation. The proposed ACS uses the decomposition and the pre-computation technique for reducing the complicated degree of the adder, the comparator and multiplexer. According to a simulation result, reduction of area $3.78\%$, power consumption $12.22\%$, maximum gate delay $23.80\%$ occurred from punctured viterbi decoder for DMB system.

Design of a Holter Monitoring System with Flash Memory Card (플레쉬 메모리 카드를 이용한 홀터 심전계의 설계)

  • 송근국;이경중
    • Journal of Biomedical Engineering Research
    • /
    • v.19 no.3
    • /
    • pp.251-260
    • /
    • 1998
  • The Holter monitoring system is a widely used noninvasive diagnostic tool for ambulatory patient who may be at risk from latent life-threatening cardiac abnormalities. In this paper, we design a high performance intelligent holter monitoring system which is characterized by the small-sized and the low-power consumption. The system hardware consists of one-chip microcontroller(68HC11E9), ECG preprocessing circuit, and flash memory card. ECG preprocessing circuit is made of ECG preamplifier with gain of 250, 500 and 1000, the bandpass filter with bandwidth of 0.05-100Hz, the auto-balancing circuit and the saturation-calibrating circuit to eliminate baseline wandering, ECG signal sampled at 240 samples/sec is converted to the digital signal. We use a linear recursive filter and preprocessing algorithm to detect the ECG parameters which are QRS complex, and Q-R-T points, ST-level, HR, QT interval. The long-term acquired ECG signals and diagnostic parameters are compressed by the MFan(Modified Fan) and the delta modulation method. To easily interface with the PC based analyzer program which is operated in DOS and Windows, the compressed data, that are compatible to FFS(flash file system) format, are stored at the flash memory card with SBF(symmetric block format).

  • PDF