• Title/Summary/Keyword: bit-serial implementation

Implementation of All-Optical Serial-Parallel Data Converters Using Mach-Zehnder Interferometers and Applications (MZI를 이용한 전광 직렬-병렬 데이터 형식 변환기 구현과 활용 방안)

  • Lee, Sung Chul
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.7 no.2
    • /
    • pp.59-65
    • /
    • 2011
  • All-optical signal processing is expected to offer advantages in speed and power consumption over electronic signal processing. It has the potential to solve the bottleneck issues of ultra-high-speed communication network nodes. All-optical serial-to-parallel and parallel-to-serial data converters would make it possible to process the serial data of a high-speed optical packet easily, without optical-to-electronic-to-optical data conversion. In this paper, we explain the principle of simple and easily expandable all-optical serial-to-parallel and parallel-to-serial data converters based on Mach-Zehnder interferometers. We experimentally demonstrate these data converters at a serial data rate of 10 Gbit/s. They are useful all-optical devices for all-optical implementations of label decoding, self-routing, control of variable packets, bit-wise logical operations, and data format conversion.
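The data-format conversion described in this abstract can be illustrated with a purely logical software analogue. The sketch below does not model Mach-Zehnder interferometers or any optical effect; the function names and the fixed word width are assumptions made for illustration only.

```python
from typing import List


def serial_to_parallel(bits: List[int], width: int) -> List[List[int]]:
    """Group a serial bit stream into parallel words of `width` bits each."""
    if width <= 0 or len(bits) % width != 0:
        raise ValueError("stream length must be a positive multiple of width")
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def parallel_to_serial(words: List[List[int]]) -> List[int]:
    """Flatten parallel words back into one serial bit stream."""
    return [bit for word in words for bit in word]
```

Converting a stream to parallel words and back returns the original stream, which is the property a serial-to-parallel and parallel-to-serial converter pair is expected to have.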

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3543-3557
    • /
    • 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. Lossless compression based on the recursive least square (RLS), which eliminates the redundancy of hyperspectral images using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but its relatively high computational complexity limits its application in time-critical scenarios. To improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). Specifically, an optimized recursive least square based on the optimal number of prediction bands is first introduced. We then use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method exploits the low-level architecture of GPUs and has been implemented using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on a CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate as the serial version of the compressor.
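For readers unfamiliar with RLS, the following is a minimal sketch of the textbook recursive least squares update in plain Python. It is not the paper's optimized or GPU version. The names `rls_update` and `rls_fit`, the initial value `delta`, and the forgetting factor `lam` are assumptions; in a prediction-based compressor one would expect `x` to hold pixels from previous bands and `d` the current pixel, with the prediction error being the quantity that is encoded.

```python
def rls_update(w, P, x, d, lam=1.0):
    """One RLS step: returns updated weights, updated P, and the a priori error."""
    n = len(x)
    # P is kept symmetric, so x^T P equals (P x)^T and Px can be reused.
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    k = [v / denom for v in Px]                       # gain vector
    err = d - sum(w[i] * x[i] for i in range(n))      # prediction error
    w_new = [w[i] + k[i] * err for i in range(n)]
    P_new = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new, err


def rls_fit(samples, n, delta=1e6, lam=1.0):
    """Run RLS over (x, d) pairs starting from w = 0 and P = delta * I."""
    w = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x, d in samples:
        w, P, _ = rls_update(w, P, x, d, lam)
    return w
```

Each update touches every entry of the matrix `P`, which gives a rough idea of why the per-pixel cost is high and why a parallel implementation is attractive.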

Parallel Implementation of Distributed Sample Scrambler (분산표본혼화기의 병렬구현)

  • 정헌주;김재형;정성현;박승철
    • Proceedings of the IEEK Conference
    • /
    • 1998.06a
    • /
    • pp.62-65
    • /
    • 1998
  • This paper presents a method for, and an implementation of, the parallel distributed sample scrambler (DSS) in the cell-based ATM transmission environment. Serial processing requires a very high-speed clock because the processing clock of the serial DSS is equal to the data transmission speed. In this paper, we develop a method for converting the serial SRG (shift register generator) to an 8-bit parallel realization. This conversion raises a sample data processing problem that is characteristic of the DSS, so a theory of correction time movement is presented to solve it. We have developed an ASIC using this algorithm and verified it against ITU-T Recommendation I.432.
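The serial-to-8-bit-parallel conversion of a shift register generator can be sketched in software by exploiting the linearity of the register over GF(2): the state after eight serial steps is a fixed linear function of the current state. The sketch below is an illustration under stated assumptions, not the paper's design. The generator polynomial x^31 + x^28 + 1, the tap convention, and all function names are assumptions, and the sample conveyance and correction timing that the abstract refers to are not modeled.

```python
N = 31                      # register length (assumed)
MASK = (1 << N) - 1


def serial_step(state):
    """Advance the register by one bit; returns (new_state, output_bit)."""
    feedback = ((state >> 30) ^ (state >> 27)) & 1   # assumed taps: stages 31 and 28
    out = (state >> 30) & 1
    return ((state << 1) | feedback) & MASK, out


def serial_byte(state):
    """Eight serial steps, i.e. eight fast clock cycles per output byte."""
    byte = 0
    for _ in range(8):
        state, out = serial_step(state)
        byte = (byte << 1) | out
    return state, byte


def build_parallel_tables():
    """Precompute the 8-step response to each single-bit state.

    The two lists together are the columns of the 8-step transition
    and output matrices over GF(2).
    """
    next_cols, out_cols = [], []
    for i in range(N):
        s, b = serial_byte(1 << i)
        next_cols.append(s)
        out_cols.append(b)
    return next_cols, out_cols


def parallel_byte(state, next_cols, out_cols):
    """One parallel step: XOR the columns selected by the set state bits."""
    new_state, byte = 0, 0
    for i in range(N):
        if (state >> i) & 1:
            new_state ^= next_cols[i]
            byte ^= out_cols[i]
    return new_state, byte
```

In hardware the XOR combinations would be fixed wiring, so one slow clock cycle produces the same byte and next state as eight cycles of the serial register.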

Design and Implementation of a Latency Efficient Encoder for LTE Systems

  • Hwang, Soo-Yun;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • v.32 no.4
    • /
    • pp.493-502
    • /
    • 2010
  • The operation time of an encoder is one of the critical implementation issues for satisfying the timing requirements of Long Term Evolution (LTE) systems because the encoder is based on binary operations. In this paper, we propose a design and implementation of a latency-efficient encoder for LTE systems. By virtue of 8-bit parallel processing of the cyclic redundancy check attachment, code block (CB) segmentation, and a parallel processor, we are able to construct engines for turbo coding and rate matching of each CB in a parallel fashion. Experimental results show that although the total area and clock period of the proposed scheme are 19% and 6% larger, respectively, than those of a conventional method based on a serial scheme, our parallel structure decreases the latency by about 32% to 65% compared with a serial structure. In particular, our approach is more latency efficient when the encoder processes a number of CBs. In addition, we apply the proposed scheme to a real system based on LTE, so that the timing requirement for ACK/NACK transmission is met by employing the encoder based on the parallel structure.
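The difference between serial and 8-bit parallel CRC processing can be sketched in software as a bit-at-a-time loop versus a table-driven byte-at-a-time loop. This is a generic illustration, not the encoder proposed in the paper. The polynomial constant 0x864CFB (the 24-bit CRC24A generator with the leading term dropped), the zero initial value, and the function names are assumptions for the example.

```python
POLY = 0x864CFB      # assumed generator, leading x^24 term implicit
TOP = 0x800000
MASK24 = 0xFFFFFF


def crc24_bit_serial(data: bytes) -> int:
    """Process one input bit per iteration, MSB first."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (crc >> 23) & 1
            crc = (crc << 1) & MASK24
            if top ^ bit:
                crc ^= POLY
    return crc


def make_table():
    """Precompute the effect of eight shift steps for every byte value."""
    table = []
    for value in range(256):
        crc = value << 16
        for _ in range(8):
            if crc & TOP:
                crc = ((crc << 1) ^ POLY) & MASK24
            else:
                crc = (crc << 1) & MASK24
        table.append(crc)
    return table


def crc24_byte_parallel(data: bytes, table) -> int:
    """Process eight input bits per iteration using the lookup table."""
    crc = 0
    for byte in data:
        index = ((crc >> 16) ^ byte) & 0xFF
        crc = ((crc << 8) & MASK24) ^ table[index]
    return crc
```

Both functions compute the same remainder; the byte-parallel version needs one eighth as many iterations, which is the kind of trade of extra logic for lower latency that the abstract reports.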

Implementation of Modular Multiplication and Communication Adaptor for Public Key Cryptosystem (공개키 암호체계를 위한 Modular 곱셈개선과 통신회로 구현에 관한 연구)

  • 한선경;이선복;유영갑
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.16 no.7
    • /
    • pp.651-662
    • /
    • 1991
  • An improved modular multiplication algorithm for RSA-type public key cryptosystems and its application to a serial communication circuit are presented. A correction to a published fast modular multiplication algorithm is proposed and verified through simulation. A cryptosystem for the RS-232C communication protocol is designed and prototyped for low-speed data exchange between computers. The system adopts the corrected algorithm and operates successfully using a small key.
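The abstract does not state which fast modular multiplication algorithm was corrected, so the sketch below shows only the classic interleaved shift-and-add method as a point of reference. It scans the multiplier one bit per iteration, which matches the bit-serial style of simple hardware. The function names are assumptions.

```python
def modmul_interleaved(a: int, b: int, n: int) -> int:
    """Compute (a * b) mod n, scanning b from its most significant bit."""
    a %= n
    b %= n
    r = 0
    for i in range(b.bit_length() - 1, -1, -1):
        r <<= 1                 # double the running result
        if r >= n:              # r < 2n here, so one subtraction suffices
            r -= n
        if (b >> i) & 1:
            r += a              # add the multiplicand for a set bit
            if r >= n:
                r -= n
    return r


def modexp(base: int, exp: int, n: int) -> int:
    """Left-to-right square-and-multiply built on modmul_interleaved."""
    result = 1 % n
    for i in range(exp.bit_length() - 1, -1, -1):
        result = modmul_interleaved(result, result, n)
        if (exp >> i) & 1:
            result = modmul_interleaved(result, base, n)
    return result
```

Because the intermediate value never exceeds twice the modulus, each step needs only a shift, an add, and a compare-and-subtract, which keeps the datapath narrow.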

Design and Analysis of a Digit-Serial $AB^{2}$ Systolic Arrays in $GF(2^{m})$ ($GF(2^{m})$ 상에서 새로운 디지트 시리얼 $AB^{2}$ 시스톨릭 어레이 설계 및 분석)

  • Kim Nam-Yeun;Yoo Kee-Young
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.4
    • /
    • pp.160-167
    • /
    • 2005
  • Among finite field arithmetic operations, division/inversion is known as a basic operation for public-key cryptosystems over $GF(2^{m})$, and it is computed by repeated $AB^{2}$ multiplication. This paper presents a digit-serial-in-serial-out systolic architecture for performing the $AB^{2}$ operation in $GF(2^{m})$. To obtain an L×L digit-serial-in-serial-out architecture, a new $AB^{2}$ algorithm is proposed, together with partitioning, index transformation, and cell merging for the architecture derived from the algorithm. Based on the area-time product, when the digit size L of the digit-serial architecture is selected to be less than about m, the proposed digit-serial architecture is more efficient than the bit-parallel architecture, and when L is selected to be less than about $(1/5)\log_{2}(m+1)$, it is more efficient than the bit-serial architecture. In addition, the area-time product complexity of the pipelined digit-serial $AB^{2}$ systolic architecture is approximately 10.9% lower than that of the nonpipelined one, assuming m=160 and L=8. The proposed architecture can serve as a basic architecture for crypto-processors and is well suited to VLSI implementation because of its simplicity, regularity, and pipelinability.
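How inversion reduces to repeated $AB^{2}$ operations can be sketched in a few lines. The example below uses a bit-serial multiplication over a small field, $GF(2^{8})$ with the reduction polynomial $x^{8}+x^{4}+x^{3}+x+1$, chosen only because it is small and well known; the field, the polynomial, and the function names are assumptions, and the code does not reflect the paper's systolic or digit-serial structure.

```python
M = 8
POLY = 0x11B    # x^8 + x^4 + x^3 + x + 1 (assumed field for illustration)


def gf_mul(a: int, b: int) -> int:
    """Bit-serial multiplication in GF(2^M), scanning b MSB first."""
    r = 0
    for i in range(M - 1, -1, -1):
        r <<= 1                  # multiply the running result by x
        if (r >> M) & 1:
            r ^= POLY            # reduce modulo the field polynomial
        if (b >> i) & 1:
            r ^= a               # add (XOR) the multiplicand
    return r


def ab2(a: int, b: int) -> int:
    """The AB^2 operation: a times the square of b."""
    return gf_mul(a, gf_mul(b, b))


def gf_inv(a: int) -> int:
    """Inverse as a^(2^M - 2), built from repeated AB^2 operations."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    r = a
    for _ in range(M - 2):
        r = ab2(a, r)            # exponent goes from e to 2e + 1
    return ab2(1, r)             # final squaring gives exponent 2^M - 2
```

After k iterations of the loop the exponent of `r` is 2^(k+1) - 1, so M - 2 iterations followed by one squaring give a^(2^M - 2), which is the inverse of a for any nonzero a.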

Efficient Algorithm and Architecture for Elliptic Curve Cryptographic Processor

  • Nguyen, Tuy Tan;Lee, Hanho
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.1
    • /
    • pp.118-125
    • /
    • 2016
  • This paper presents a new, highly efficient algorithm and architecture for an elliptic curve cryptographic processor. To reduce the computational complexity, novel modified Lopez-Dahab scalar point multiplication and left-to-right algorithms are proposed for the point multiplication operation. Moreover, bit-serial Galois-field multiplication is used to decrease hardware complexity. The field multiplication operations are performed in parallel to improve system latency. As a result, our approach can reduce hardware costs while keeping the total time required for point multiplication reasonable. Results on Xilinx Virtex-5 and Virtex-7 FPGAs and on a VLSI implementation show that the proposed architecture has lower hardware complexity, fewer clock cycles, and higher efficiency than previous works.

A Study on the Hardware Implementation of A 3${\times}$3 Window Weighted Median Filter Using Bit-Level Sorting Algorithm (비트 레벨 정렬 알고리즘을 이용한 3${\times}$3 윈도우 가중 메디언 필터의 하드웨어 구현에 관한 연구)

  • 이태욱;조상복
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.3
    • /
    • pp.197-205
    • /
    • 2004
  • In this paper, we study the hardware implementation of a 3${\times}$3 window weighted median filter using a bit-level sorting algorithm. The weighted median filter is a generalization of the median filter that is able to preserve sharp changes in a signal and is very effective in removing impulse noise. It has been successfully applied in various areas such as digital signal and video/image processing. Weighted median filters are, for the most part, based on word-level sorting methods, which have higher hardware and time complexity. The proposed bit-serial sorting algorithm, however, uses a weighted adder tree to overcome those disadvantages. It also offers a simple pipelined filter architecture that is highly regular, with repeated modules, and is very suitable for weighted median filtering. The algorithm was implemented in VHDL and in the graphical environment of ALTERA MAX+PlusII. The simulation results indicate that the proposed design method is more efficient than traditional ones.
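A common way to find a weighted median without word-level sorting is to decide the result one bit at a time, from the most significant bit down, using a weighted majority vote at each bit position. The sketch below illustrates that general idea and should not be read as the paper's exact algorithm. The function names, the 8-bit sample width, and the requirement that the weights sum to an odd number are assumptions; the weighted sum of votes plays the role that a weighted adder tree would play in hardware.

```python
def weighted_median_bit_serial(samples, weights, nbits=8):
    """Weighted median by MSB-first weighted majority voting."""
    total = sum(weights)
    if total % 2 == 0:
        raise ValueError("weights must sum to an odd number")
    # 0: still a candidate, +1: known to be above, -1: known to be below
    state = [0] * len(samples)
    median = 0
    for bit in range(nbits - 1, -1, -1):
        votes = []
        for s, st in zip(samples, state):
            if st == 0:
                votes.append((s >> bit) & 1)
            else:
                votes.append(1 if st > 0 else 0)   # decided samples vote a fixed bit
        ones = sum(w for v, w in zip(votes, weights) if v)
        m = 1 if 2 * ones > total else 0           # weighted majority bit
        median = (median << 1) | m
        for i, v in enumerate(votes):
            if state[i] == 0 and v != m:
                state[i] = 1 if v else -1          # sample drops out of candidacy
    return median


def weighted_median_reference(samples, weights):
    """Word-level reference: replicate by weight, sort, take the middle."""
    expanded = sorted(s for s, w in zip(samples, weights) for _ in range(w))
    return expanded[len(expanded) // 2]
```

The bit-serial version needs `nbits` voting rounds regardless of the window contents, which is what makes a regular, pipelined structure possible.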

Hardware Implementation of Minimized Serial-Divider for Image Frame-Unit Processing in Mobile Phone Camera. (Mobile Phone Camera의 이미지 프레임 단위 처리를 위한 소형화된 Serial-Divider의 하드웨어 구현)

  • Kim, Kyung-Rin;Lee, Sung-Jin;Kim, Hyun-Soo;Kim, Kang-Joo;Kang, Bong-Soon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2007.10a
    • /
    • pp.119-122
    • /
    • 2007
  • In this paper, we propose a hardware design method for the division operation used in image frame-unit processing in a mobile phone camera. Generally, there are two types of data processing: parallel and serial. The parallel type can process data in real time, but it needs significant hardware because of its many comparators and buffer memories. Compared with the parallel type, the serial type is smaller because it uses only one comparator, but it cannot process data in real time. To use hardware resources efficiently, we employ the serial divider, since frame-unit operations for image processing do not need real-time processing. At the same bit size and operating frequency, the hardware size of the serial divider is approximately 13 percent of that of the parallel divider.
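The serial approach with a single comparator can be illustrated with textbook restoring division, which produces one quotient bit per iteration. This is a generic sketch, not the divider described in the paper; the function name and the 16-bit default width are assumptions.

```python
def serial_divide(dividend: int, divisor: int, nbits: int = 16):
    """Restoring division: one compare-and-subtract per quotient bit.

    Assumes 0 <= dividend < 2**nbits and divisor > 0.
    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    remainder = 0
    quotient = 0
    for i in range(nbits - 1, -1, -1):
        # Shift in the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:       # the single comparator
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

The loop body corresponds to one clock cycle of a serial divider, so an `nbits`-wide division takes `nbits` cycles but reuses one comparator and one subtractor throughout.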

Implementation of low power BSPE Core for deep learning hardware accelerators (딥러닝을 하드웨어 가속기를 위한 저전력 BSPE Core 구현)

  • Jo, Cheol-Won;Lee, Kwang-Yeob;Nam, Ki-Hun
    • Journal of IKEEE
    • /
    • v.24 no.3
    • /
    • pp.895-900
    • /
    • 2020
  • In this paper, a BSPE replaces the existing multiplication algorithm, which consumes a lot of power. Hardware resources are reduced by using a bit-serial multiplier, and variable integer data is used to reduce memory usage. In addition, the resource and power usage of the MOA (Multi Operand Adder) used to add partial sums are reduced by applying LOA (Lower-part OR Approximation) to it. As a result, compared to the existing MBS (Multiplication by Barrel Shifter), hardware resources are reduced by 44% and power consumption by 42%. We also propose a hardware architecture design for the BSPE Core.
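Two of the ideas named in this abstract, bit-serial multiplication and a lower-part OR approximate adder, can be sketched in software as follows. The sketch follows the commonly described form of LOA, in which the low `k` bits are combined with a bitwise OR and the carry into the exact upper part is the AND of the two top low-part bits. It is an assumption that the paper uses this exact variant, and the function names and parameters are hypothetical.

```python
def loa_add(a: int, b: int, k: int) -> int:
    """Approximate a + b: OR the low k bits, add the upper bits exactly."""
    if k == 0:
        return a + b
    mask = (1 << k) - 1
    low = (a | b) & mask                             # approximate lower part
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1      # AND of the top low-part bits
    high = (a >> k) + (b >> k) + carry               # exact upper part
    return (high << k) | low


def bit_serial_mul(a: int, b: int, nbits: int = 8) -> int:
    """Exact multiplication that consumes one bit of b per iteration."""
    acc = 0
    for i in range(nbits):
        if (b >> i) & 1:
            acc += a << i        # add the shifted multiplicand for a set bit
    return acc
```

With this carry rule the result differs from the exact sum by at most 2^(k-1), so the choice of `k` trades adder size against accuracy.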