• Title/Summary/Keyword: Low-power multiplication

Adaptive IIR filter designed for the separation of scintillation and rain attenuation phenomena

  • Sangaroon, O.;Chutchavong, V.;Anekpongpun, K.;Benjangkaprasert, C.;Sooraksa, P.;Moriya, Y.
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference / 2001.10a / pp.109.5-109 / 2001
  • The separation of scintillation from concurrent rain attenuation can be accomplished by filtering. Based on an analysis of satellite signal fading during rain, the scintillation and rain attenuation phenomena are examined and extracted from raw data using an adaptive IIR high-pass filter and an adaptive IIR low-pass filter. The adaptive IIR filters are designed using the Least Mean p-Power (LMP) error criterion algorithm, modified by the quantized-gradient technique. This algorithm reduces the number of multiplications by an amount equal to the length of the input data. It is proved here that the convergence speed, variance, and bias are independent of the value of p. For this application, p=1 is chosen. The procedure of application ...
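
As a rough illustration of why p=1 saves multiplications, the sketch below adapts a filter under the $E[|e|^p]$ criterion. It is only a sketch under stated assumptions: it uses an FIR structure instead of the paper's IIR filters, and the function names, the test system `h_true`, and the step size `mu` are all hypothetical. For p=1 the error enters the update only through its sign, so no multiplication by the error magnitude is needed.

```python
import random

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def lmp_identify(x, d, n_taps, mu, p=1):
    """Adapt FIR weights w so that the filter output tracks d (assumes p >= 1)."""
    w = [0.0] * n_taps
    for n in range(n_taps - 1, len(x)):
        window = [x[n - k] for k in range(n_taps)]   # x[n], x[n-1], ...
        y = sum(wk * xk for wk, xk in zip(w, window))
        e = d[n] - y
        # The stochastic gradient of |e|^p is -p*|e|^(p-1)*sign(e)*x.
        # For p = 1 the factor g is just sign(e): one of -1, 0, +1.
        g = p * abs(e) ** (p - 1) * sign(e)
        w = [wk + mu * g * xk for wk, xk in zip(w, window)]
    return w

random.seed(0)
h_true = [0.5, -0.3]                     # hypothetical unknown system
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = [0.0] + [h_true[0] * x[n] + h_true[1] * x[n - 1] for n in range(1, len(x))]
w = lmp_identify(x, d, n_taps=2, mu=0.005, p=1)
```

With these assumed settings, `w` should settle near `h_true`, with a residual wander on the order of `mu`.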

A design of 32-bit RISC core for PDA (PDA를 위한 32비트 RISC 코어의 설계)

  • 곽승호;최병윤;이문기
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.10 / pp.2136-2149 / 1997
  • This paper describes a RISC core designed for embedded and portable applications such as PDA or PCS. The RISC processor offers low power consumption and fast context switching. Processor performance is improved by conditional instruction execution, a block data transfer instruction, and a multiplication instruction. The architecture is based on RISC principles. The processor adopts a 3-stage instruction execution pipeline and achieves single-cycle execution using a 2-phase 40 MHz clock. This results in high instruction throughput and real-time interrupt response. The chip is implemented in $0.6\,\mu\mathrm{m}$ triple-metal CMOS technology and consists of about 88K transistors. The estimated power dissipation is 179 mW.

New Memristor-Based Crossbar Array Architecture with 50-% Area Reduction and 48-% Power Saving for Matrix-Vector Multiplication of Analog Neuromorphic Computing

  • Truong, Son Ngoc;Min, Kyeong-Sik
    • JSTS: Journal of Semiconductor Technology and Science / v.14 no.3 / pp.356-363 / 2014
  • In this paper, we propose a new memristor-based crossbar array architecture in which a single memristor array and a constant-term circuit represent both the plus-polarity and minus-polarity matrices. This differs from the previous crossbar array architecture, which uses two memristor arrays to represent the plus-polarity and minus-polarity connection matrices, respectively. The proposed crossbar architecture is tested and verified to have the same performance as the previous crossbar architecture in character recognition applications. In areal density, however, the proposed architecture is twice as good as the previous one, because only a single memristor array is used instead of two crossbar arrays. Moreover, the power consumption of the proposed architecture can be 48% lower than that of the previous one, because the number of memristors is halved. Given its high areal density and high energy efficiency, the proposed crossbar array architecture is well suited to analog neuromorphic computing applications that demand high areal density and low energy consumption.
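
The idea of replacing two arrays with one array plus a constant term can be sketched numerically. The abstract does not give the exact mapping, so the offset encoding below (weights in [-1, 1] shifted to non-negative values in [0, 1]) is an assumption, and all function names are hypothetical. The plain sums stand in for the analog current summation of a crossbar.

```python
def matvec(m, x):
    """Plain matrix-vector product; stands in for the analog crossbar summation."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def two_array(w, x):
    """Baseline: separate non-negative arrays for positive and negative weights."""
    g_plus = [[max(wij, 0.0) for wij in row] for row in w]
    g_minus = [[max(-wij, 0.0) for wij in row] for row in w]
    return [p - q for p, q in zip(matvec(g_plus, x), matvec(g_minus, x))]

def single_array(w, x):
    """One non-negative array plus a constant term shared by all outputs."""
    # Assumed encoding: map each weight from [-1, 1] to [0, 1].
    g = [[(wij + 1.0) / 2.0 for wij in row] for row in w]
    const = sum(x)                    # the constant term depends only on the input
    return [2.0 * yi - const for yi in matvec(g, x)]

w = [[1.0, -1.0, 0.5], [-0.25, 0.0, 1.0]]
x = [1.0, 0.0, 1.0]
y_single = single_array(w, x)
y_double = two_array(w, x)
```

Both functions compute the same signed product $Wx$, but `single_array` stores one value per weight instead of two, which is the source of the area saving in this simplified picture.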

Design and Analysis of a $AB^2$ Systolic Arrays for Division/Inversion in $GF(2^m)$ ($GF(2^m)$상에서 나눗셈/역원 연산을 위한 $AB^2$ 시스톨릭 어레이 설계 및 분석)

  • 김남연;고대곤;유기영
    • Journal of KIISE: Computer Systems and Theory / v.30 no.1 / pp.50-58 / 2003
  • Among finite field arithmetic operations, the $AB^2$ operation is known as an efficient basic operation for public key cryptosystems over $GF(2^m)$. Division/inversion is computed by performing repeated $AB^2$ multiplications. This paper presents two new $AB^2$ algorithms and their systolic realizations in the finite field $GF(2^m)$. The proposed algorithms are based on the MSB-first scheme using the standard basis representation, and the proposed systolic architectures for $AB^2$ multiplication have low hardware complexity and small latency compared to conventional approaches. Additionally, since the proposed architectures offer simplicity, regularity, modularity, and pipelinability, they are well suited to VLSI implementation and can easily be applied to inversion architectures. Furthermore, these architectures can serve as the basic architecture of a crypto-processor.
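
How inversion reduces to repeated $AB^2$ operations can be shown with a small software sketch. This is not the paper's systolic algorithm: it is a bit-serial model of the arithmetic, and the field size `M = 4`, the irreducible polynomial `POLY`, and the function names are assumptions chosen for illustration. It relies on the identity $A^{-1} = A^{2^m-2}$ for nonzero $A$ in $GF(2^m)$.

```python
M = 4
POLY = 0b10011  # x^4 + x + 1, an irreducible polynomial chosen for illustration

def gf_mul(a, b):
    """MSB-first shift-and-add multiplication in GF(2^M), standard basis."""
    r = 0
    for i in range(M - 1, -1, -1):
        r <<= 1                      # multiply the running result by x
        if r >> M:                   # reduce when the degree reaches M
            r ^= POLY
        if (b >> i) & 1:             # add a when bit i of b is set
            r ^= a
    return r

def ab2(a, b):
    """Compute A * B^2 in GF(2^M)."""
    return gf_mul(a, gf_mul(b, b))

def gf_inv(a):
    """Invert a nonzero element as A^(2^M - 2), using only AB^2 operations."""
    r = 1
    for _ in range(M - 1):
        r = ab2(a, r)                # r equals A^(2^k - 1) after k steps
    return ab2(1, r)                 # one more squaring gives A^(2^M - 2)

def gf_div(a, b):
    """Divide a by a nonzero b."""
    return gf_mul(a, gf_inv(b))
```

Inversion here costs M AB^2 steps, which is why a fast AB^2 unit is a natural building block for division.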

A Research on Effective Wi-Fi Easy Connect Protocol Improvement Method Applicable to Wired and Wireless Environments (유·무선 환경에 적용 가능한 효율적인 Wi-Fi Easy Connect 프로토콜 개선방안 연구)

  • Ho-jei Yu;Chan-hee Kim;Sung-sik Im;Seo-yeon Kim;Dong-woo Kim;Soo-hyun Oh
    • Convergence Security Journal / v.23 no.1 / pp.45-54 / 2023
  • Recently, with the development of the Internet of Things, research on protocols that can easily connect devices without a UI to a network has been conducted steadily. To this end, the Wi-Fi Alliance announced Wi-Fi Easy Connect, which can connect a device to a network using a QR code. However, since Wi-Fi Easy Connect requires a large amount of computation for security, it is difficult to apply to low-power, miniaturized IoT devices. In addition, for scalability Wi-Fi Easy Connect is designed to also operate in a wired environment, but problems such as duplicate encryption occur because it does not take into account a secure environment such as TLS. Therefore, in this paper, we analyze the Wi-Fi Easy Connect protocol and propose a protocol that can operate efficiently in a TLS environment. It was confirmed that the proposed protocol satisfies the existing security requirements while reducing the computationally expensive ECC scalar multiplication operations by about 67%.

New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm (Radix-2 MBA 기반 병렬 MAC의 VLSI 구조)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.4 / pp.94-104 / 2008
  • In this paper, we propose a new multiplier-and-accumulator (MAC) architecture for high-speed multiplication and accumulation. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry-save adder (CSA). Since the accumulator, which has the largest delay in a MAC, is removed and its function is merged into the CSA, overall performance improves. The proposed CSA tree uses a 1's complement-based radix-2 modified Booth algorithm (MBA) and has a modified array for sign extension in order to increase the bit density of the operands. The CSA propagates the carries through the least significant bits of the partial products and generates the least significant bits in advance, decreasing the number of input bits of the final adder. Also, the proposed MAC accumulates the intermediate results as sum and carry bits instead of as the output of the final adder, improving performance by optimizing the efficiency of the pipeline scheme. After design, the proposed architecture was synthesized with 250 nm, 180 nm, 130 nm, and 90 nm standard CMOS libraries. We analyzed results such as hardware resources, delay, and pipelining, based on theoretical and experimental estimation. We used Sakurai's alpha-power law for the delay modeling. The proposed MAC is superior to the standard design in many ways, and its performance is twice that of previous research at a similar clock frequency.
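
The benefit of accumulating in sum-and-carry form can be illustrated with a short sketch. It is a simplified model under stated assumptions: operands are non-negative integers, the `*` operator stands in for the Booth-encoded partial-product tree, and the function names are hypothetical. The point it shows is that a 3:2 carry-save step has no carry propagation, so the slow carry-propagate addition runs only once at the end.

```python
def csa(a, b, c):
    """3:2 carry-save adder on non-negative ints: returns (sum_bits, carry_bits)."""
    s = a ^ b ^ c
    cy = ((a & b) | (b & c) | (a & c)) << 1
    return s, cy

def mac_carry_save(pairs):
    """Accumulate products as a (sum, carry) pair; propagate carries only once."""
    s, cy = 0, 0
    for x, y in pairs:
        s, cy = csa(s, cy, x * y)    # x * y stands in for the partial-product tree
    return s + cy                    # the only carry-propagating addition

pairs = [(3, 5), (7, 2), (11, 13), (4, 4)]
total = mac_carry_save(pairs)
```

The invariant is that `s + cy` always equals the running sum of products, so the final addition recovers the exact result.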

Energy-Efficient Signal Processing Using FPGAs (FPGA 상에서 에너지 효율이 높은 병렬 신호처리 기법)

  • Jang Ju-wook;Hwang Yunil;Scrofano Ronald;Prasanna Viktor K.
    • The KIPS Transactions: Part A / v.12A no.4 s.94 / pp.305-312 / 2005
  • In this paper, we present algorithm-level techniques for energy-efficient design using FPGAs. We then use these techniques to create energy-efficient designs for two signal processing kernels: the fast Fourier transform (FFT) and matrix multiplication. We evaluate the performance of FPGAs on these tasks in terms of both latency and energy efficiency. Using a Xilinx Virtex-II as the target FPGA, we compare the performance of our designs to those from the Xilinx library as well as to conventional algorithms run on the PowerPC core embedded in the Virtex-II Pro and on the Texas Instruments TMS320C6415. Our evaluations are done both through high-level estimation based on energy and latency equations and through low-level simulation. For FFT, our designs dissipated an average of $50\%$ less energy than the design from the Xilinx library and $56\%$ less than the DSP. Our designs showed a 10-fold improvement in the EAT factor over the embedded processor. These results provide concrete evidence that FPGAs can outperform DSPs and embedded processors in signal processing. Further, they show that FPGAs can achieve this performance while still dissipating less energy than the other two types of devices.

Design of a Low Power Reconfigurable DSP with Fine-Grained Clock Gating (정교한 클럭 게이팅을 이용한 저전력 재구성 가능한 DSP 설계)

  • Jung, Chan-Min;Lee, Young-Geun;Chung, Ki-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.82-92 / 2008
  • Digital signal processing (DSP) applications such as H.264, CDMA, and MP3 have recently become predominant tasks for modern high-performance portable devices. These applications are generally computation-intensive and therefore require quite complicated accelerator units to improve performance. Designing such specialized yet fixed DSP accelerators takes considerable effort. As a result, DSPs with multiple accelerators often have a very poor time-to-market and an unacceptable area overhead. To avoid the long time-to-market and high area overhead, dynamically reconfigurable DSP architectures have attracted a lot of attention lately. Dynamically reconfigurable DSPs typically employ a multi-functional DSP accelerator that executes multiple similar but distinct kinds of computations for DSP applications. With this type of dynamically reconfigurable DSP accelerator, the time-to-market is reduced significantly. However, integrating multiple functionalities into a single IP often results in excessive control and area overhead, so delay and power consumption often turn out to be excessive. In this thesis, to reduce the power consumption of dynamically reconfigurable IPs, we propose a novel fine-grained clock gating scheme, and to reduce their size, we propose a compact multiplier-less multiplication unit in which shifters and adders carry out constant multiplications.
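
Multiplier-less constant multiplication can be sketched as follows. The abstract does not say how the constants are decomposed, so the canonical signed-digit (CSD) recoding used here is an assumed, commonly used choice, and the function names are hypothetical. Each nonzero digit costs one shift and one add or subtract.

```python
def csd_digits(c):
    """Canonical signed-digit form of a positive int, least significant digit first."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)          # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def const_mul(x, c):
    """Multiply x by the constant c using only shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For example, `csd_digits(7)` yields the digits of 8 - 1, so multiplying by 7 takes one shift and one subtraction instead of the two additions that the plain binary form 4 + 2 + 1 would need.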

New High Speed Parallel Multiplier for Real Time Multimedia Systems (실시간 멀티미디어 시스템을 위한 새로운 고속 병렬곱셈기)

  • Cho, Byung-Lok;Lee, Mike-Myung-Ok
    • The KIPS Transactions: Part A / v.10A no.6 / pp.671-676 / 2003
  • In this paper, we propose a new First Partial product Addition (FPA) architecture with a new compressor (or parallel counter) for the CSA tree that adds the partial products in a fast parallel multiplier. It improves the speed of partial-product calculation by about 20% compared with an existing parallel counter built from full adders. Using the novel FPA architecture, the new circuit reduces by N/2 the number of CLA bits needed to compute the final sum. A multiplication time of 5.14 ns is obtained for the $16{\times}16$ multiplier using $0.25\,\mu\mathrm{m}$ CMOS technology. The multiplier architecture is easily adapted to pipelined design and demonstrates high-speed performance.

Design and Implementation of low-power short-length running convolution filter using filter banks (필터 뱅크를 사용한 저전력 short-length running convolution 필터 설계 및 구현)

  • Jang Young-Beom
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.4 / pp.625-634 / 2006
  • In this paper, an efficient and fast algorithm that reduces the amount of computation in FIR (finite impulse response) filtering is proposed. The proposed algorithm enables parallel processing of arbitrary size, and its structures are easily derived. Furthermore, it is shown that the number of multiplications per sample is remarkably reduced. For the theoretical improvement, the numbers of sub-filters are compared with those of the conventional algorithm. In addition, it is shown that the number of elements required for a hardwired implementation is reduced compared with the conventional algorithm.
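
How a short-length running convolution lowers the multiplication count can be seen in the classic 2-parallel fast FIR decomposition, sketched below. This is the textbook technique, not necessarily the filter-bank structure of the paper, and the function names are hypothetical. The filter and input are split into even and odd polyphase parts, and three half-length sub-filters replace the four that a direct 2-parallel implementation needs.

```python
def conv(h, x):
    """Direct linear convolution of two sequences."""
    y = [0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y

def fast_fir_2(h, x):
    """2-parallel fast FIR: three half-length sub-filters instead of four."""
    assert len(h) % 2 == 0 and len(x) % 2 == 0
    h0, h1 = h[0::2], h[1::2]        # even / odd polyphase components
    x0, x1 = x[0::2], x[1::2]
    a = conv(h0, x0)
    b = conv(h1, x1)
    c = conv([p + q for p, q in zip(h0, h1)], [p + q for p, q in zip(x0, x1)])
    y = [0] * (len(h) + len(x) - 1)
    for k in range(len(a)):
        y[2 * k] += a[k]                     # even outputs: H0*X0 ...
        y[2 * k + 2] += b[k]                 # ... plus H1*X1 delayed by one block
        y[2 * k + 1] += c[k] - a[k] - b[k]   # odd outputs: H0*X1 + H1*X0
    return y

h = [1, 2, 3, 4]
x = [1, 0, -1, 2, 3, 5]
y_fast = fast_fir_2(h, x)
```

For an N-tap filter this uses 3N/2 multiplications per pair of output samples instead of 2N, at the cost of extra additions before and after the sub-filters.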
