• Title/Summary/Keyword: CSA Tree

Search Result 19, Processing Time 0.031 seconds

FPGA Implementation of High Speed RSA Cryptosystem Using Radix-4 Modified Booth Algorithm and CSA (Radix-4 Modified Booth 알고리즘과 CSA를 이용한 고속 RSA 암호시스템의 FPGA 구현)

  • 박진영;서영호;김동욱
    • Proceedings of the IEEK Conference
    • /
    • 2001.06a
    • /
    • pp.337-340
    • /
    • 2001
  • This paper presented a new structure of RSA cryptosystem using modified Montgomery algorithm and CSA(Carry Save Adder) tree. Montgomery algorithm was modified to a radix-4 modified Booth algorithm. By appling radix-4 modified Booth algorithm and CSA tree to modular multiplication, a clock cycle for modular multiplication has been reduced to (n+3)/2 and carry propagation has been removed from the cell structure of modular multiplier. That is, the connection efficiency of full adders is enhanced.

  • PDF

Algorithm for Arthmetic Optimization using Carry-Save Adders (캐리-세이브 가산기를 이용한 연산 최적화 알고리즘)

  • Eom, Jun-Hyeong;Kim, Tae-Hwan
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.12
    • /
    • pp.1539-1547
    • /
    • 1999
  • 캐리-세이브 가산기 (CSA)는 회로 설계 과정에서 빠른 연산 수행을 위해 가장 널리 이용되는 연산기 중의 하나이다. 그러나, 현재까지 산업체에서 CSA를 이용한 설계는 설계자의 경험에 따른 수작업에 의존하고 있고 그 결과 최적의 회로를 만들기 위해 매우 많은 시간과 노력이 소비되고 있다. 이에 따라 최근 CSA를 기초로 하는 회로 합성 자동화 기법에 대한 연구의 필요성이 대두되고 있는 상황에서, 본 논문은 연산 속도를 최적화하는 효율적인 CSA 할당 알고리즘을 제안한다. 우리는 CSA 할당 문제를 2단계로 접근한다: (1) 연산식의 멀티 비트 입력들만을 고려하여 최소 수행 속도 (optimal-delay)의 CSA 트리를 할당한다; (2) (1)에서 구한 CSA 트리의 수행 속도 증가가 최소화 (minimal increase of delay) 되는 방향으로 CSA들의 캐리 입력 포트들에 나머지 싱글 비트 입력들을 배정한다. 실제 실험에서 우리의 제안된 알고리즘을 적용하여 연산식들의 회로 속도를 회로 면적의 증가 없이 상당한 수준까지 줄일 수 있었다.Abstract Carry-save-adder (CSA) is one of the most widely used implementations for fast arithmetics in industry. However, optimizing arithmetic circuits using CSAs is mostly carried out by the designer manually based on his/her design experience, which is a very time-consuming and error-prone task. To overcome this limitation, in this paper we propose an effective synthesis algorithm for solving the problem of finding an allocation of CSAs with a minimal timing for an arithmetic expression. Specifically, we propose a two step approach: (1) allocating a delay-optimal CSA tree for the multi-bit inputs of the arithmetic expression and (2) determining the assignment of the single-bit inputs to carry inputs of the CSAs which leads to a minimal increase of delay of the CSA tree obtained in step (1). For a number of arithmetic expressions, we found that our approach is very effective, reducing the timing of the circuits significantly without increasing the circuit area.

Design of fast 16-bit multiplier with $0.35\mu m $ CMOS technology (fullcustom $0.35\mu m $ CMOS 공정을 이용한 16*16 bit 고속 승산기의 설계)

  • 박현규;신현철;김종진
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.229-232
    • /
    • 2000
  • 각종 범용 컴퓨터 및 디지탈 신호처리에서 중요한 역할을 하는 16비트 정수형, 2의 보수 형태의 곱셈연산을 수행하기 위한 고속 승산기구조를 설계하고 시뮬레이션 하였다. 부분곱을 합하는 부분은 일반적으로 전체 곱셈기 처리 지연시간의 절반정도를 차지하므로 이 부분의 설계방법이 곱셈기의 궁극적인 속도향상에 직접적인 영향을 미친다. 부분곱의 개수를 줄이기 위하여 Booth encoder를 사용하였고, partial product(부분곱)의 덧셈시간을 줄이기 위하여 4:2 CSA(can save adder)와 3:2 CSA로 CSA tree를 구성 하였으며, 최종결과는 carry look- ahead tree로 얻어진다. Hyundai CMOS 0.35$\mu\textrm{m}$ 1-poly 4-metal 공정으로 layout하여 설계하였으며, 곱셈시간은 2.7ns(tipical case)이하로 측정되었다.

  • PDF

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.4
    • /
    • pp.94-104
    • /
    • 2008
  • In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator which has the largest delay in MAC was removed and its function was included into CSA, the overall performance becomes to be elevated. The proposed CSA tree uses 1's complement-based radix-2 modified booth algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance for decreasing the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits not the output of the final adder for improving the performance by optimizing the efficiency of pipeline scheme. The proposed architecture was synthesized with $250{\mu}m,\;180{\mu}m,\;130{\mu}m$ and 90nm standard CMOS library after designing it. We analyzed the results such as hardware resource, delay, and pipeline which are based on the theoretical and experimental estimation. We used Sakurai's alpha power low for the delay modeling. The proposed MAC has the superior properties to the standard design in many ways and its performance is twice as much than the previous research in the similar clock frequency.

A 32${\times}$32-b Multiplier Using a New Method to Reduce a Compression Level of Partial Products (부분곱 압축단을 줄인 32${\times}$32 비트 곱셈기)

  • 홍상민;김병민;정인호;조태원
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.6
    • /
    • pp.447-458
    • /
    • 2003
  • A high speed multiplier is essential basic building block for digital signal processors today. Typically iterative algorithms in Signal processing applications are realized which need a large number of multiply, add and accumulate operations. This paper describes a macro block of a parallel structured multiplier which has adopted a 32$\times$32-b regularly structured tree (RST). To improve the speed of the tree part, modified partial product generation method has been devised at architecture level. This reduces the 4 levels of compression stage to 3 levels, and propagation delay in Wallace tree structure by utilizing 4-2 compressor as well. Furthermore, this enables tree part to be combined with four modular block to construct a CSA tree (carry save adder tree). Therefore, combined with four modular block to construct a CSA tree (carry save adder tree). Therefore, multiplier architecture can be regularly laid out with same modules composed of Booth selectors, compressors and Modified Partial Product Generators (MPPG). At the circuit level new Booth selector with less transistors and encoder are proposed. The reduction in the number of transistors in Booth selector has a greater impact on the total transistor count. The transistor count of designed selector is 9 using PTL(Pass Transistor Logic). This reduces the transistor count by 50% as compared with that of the conventional one. The designed multiplier in 0.25${\mu}{\textrm}{m}$ technology, 2.5V, 1-poly and 5-metal CMOS process is simulated by Hspice and Epic. Delay is 4.2㎱ and average power consumes 1.81㎽/MHz. This result is far better than conventional multiplier with equal or better than the best one published.

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

  • Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.56-65
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of the parallel decimal multiplier, a tree architecture that composed by multi-operand decimal CSAs and improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial product quickly. Since the input range of the recoder of CSA is limited, CSA can get the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have limited range in specific locations of the specific architecture can reduce the partial products efficiently. Also, final BCD result can be received faster by improving the logic of the decimal CLA. In order to evaluate the performance of the proposed partial product accumulation, synthesis is implemented by using Design Complier with 180 nm COMS technology library. Synthesis results show the delay of the proposed partial product accumulation is reduced by 15.6% and area is reduced by 16.2% comparing with which uses general method. Also, the total delay and area are still reduced despite the delay and area of the CLA are increased.

A Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.7
    • /
    • pp.53-61
    • /
    • 2004
  • In this paper, we proposed a new architecture of MAC(Multiplier-Accumulator) to operate high-speed multiplication-accumulation. We used the MBA(Modified radix-4 Booth Algorithm) which is based on the 1's complement number system, and CSA(Carry Save Adder) for addition of the partial products. During the addition of the partial product, the signed numbers with the 1's complement type after Booth encoding are converted in the 2's complement signed number in the CSA tree. Since 2-bit CLA(Carry Look-ahead Adder) was used in adding the lower bits of the partial product, the input bit width of the final adder and whole delay of the critical path were reduced. The proposed MAC was applied into the DWT(Discrete Wavelet Transform) filtering operation for JPEG2000, and it showed the possibility for the practical application. Finally we identified the improved performance according to the comparison with the previous architecture in the aspect of hardware resource and delay.

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.12
    • /
    • pp.20-35
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of the parallel decimal multiplier, a tree architecture that composed by multi-operand decimal CSAs and improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial product quickly. Since the input range of the recoder of CSA is limited, CSA can get the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have limited range in specific locations of the specific architecture can reduce the partial products efficiently. Also, final BCD result can be received faster by improving the logic of the decimal CLA. In order to evaluate the performance of the proposed partial product accumulation, synthesis is implemented by using Design Complier with 180 nm COMS technology library. Synthesis results show the delay of the proposed partial product accumulation is reduced by 15.6% and area is reduced by 16.2% comparing with which uses general method. Also, the total delay and area are still reduced despite the delay and area of the CLA are increased.

New High Speed Parallel Multiplier for Real Time Multimedia Systems (실시간 멀티미디어 시스템을 위한 새로운 고속 병렬곱셈기)

  • Cho, Byung-Lok;Lee, Mike-Myung-Ok
    • The KIPS Transactions:PartA
    • /
    • v.10A no.6
    • /
    • pp.671-676
    • /
    • 2003
  • In this paper, we proposed a new First Partial product Addition (FPA) architecture with new compressor (or parallel counter) to CSA tree built in the process of adding partial product for improving speed in the fast parallel multiplier to improve the speed of calculating partial product by about 20% compared with existing parallel counter using full Adder. The new circuit reduces the CLA bit finding final sum by N/2 using the novel FPA architecture. A 5.14nS of multiplication speed of the $16{\times}16$ multiplier is obtained using $0.25\mu\textrm{m}$ CMOS technology. The architecture of the multiplier is easily opted for pipeline design and demonstrates high speed performance.

Analysis of the Effect of Tree Roots on Soil Reinforcement Considering Its Spatial Distribution (뿌리의 공간분포를 고려한 수목 뿌리의 토양보강 효과에 대한 분석)

  • Kim, Dongyeob;Lee, Sang Ho;Im, Sangjun
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.14 no.4
    • /
    • pp.41-54
    • /
    • 2011
  • Tree roots can enhance soil shear strength and slope stability. However, there has been a limited study about root reinforcement of major tree species in Korea because of some experimental difficulties. Thus, this study was conducted to analyze the performance of Japanese larch (Larix kaempferi) and Korean pine (Pinus koraiensis) which are two common plantation species in Korea. Profile wall method was used to measure the spatial distribution of root system and its diameter within 15 soil walls of Japanese larch stand and 13 soil walls of Korean pine stand in Taehwa University Forest, Seoul National University, Korea. Root tensile properties of each species were assessed in the laboratory, and root reinforcements were estimated by Wu model. The study observed that the number and cross-sectional area (CSA) of root in both species could tend to decrease with soil depth. Especially, CSA were well-fitted to exponential functions of soil depth. Mean root area ratios (RAR) were 0.03% and 0.10% for Japanese larch and Korean pine, respectively. Estimated root reinforcement from Wu model were, on the average, 4.04 kPa for Japanese larch and 12.26 kPa for Korean pine. Overall, it was concluded that root reinforcement increased the factor of safety (Fs) of slope for small-scale landslide as the result of two-dimensional (2-D) infinite slope stability analysis considering vegetation effects.