• Title/Summary/Keyword: 8-bit parallel processing


Real-time 256-channel 12-bit 1ks/s Hardware for MCG Signal Acquisition (심자도 신호획득을 위한 실시간 256-채널 12-bit 1ks/s 하드웨어)

  • Yoo, Jae-Tack
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.11
    • /
    • pp.643-649
    • /
    • 2005
  • A heart diagnosis system adopts Superconducting QUantum Interference Device (SQUID) sensors for precise MCG (MagnetoCardioGram) signal acquisition. Such a system must handle hundreds of sensors, requiring fast signal sampling and precise analog-to-digital conversion (ADC). Our hardware board, which processes 64 channels of 12-bit data at 1 ks/s, is built from 8-channel ADC chips, 8-bit microprocessors, SPI interfaces, and specially designed parallel data transfers between the microprocessors to meet the 1 ks/s rate, i.e., a 1 ms sampling interval. We extend the design to 256-channel hardware and analyze its speed using data measured on the 64-channel hardware. Because our design exploits fully parallel processing, assembly-level coding, and NOP (No Operation) instructions for timing control, it provides expandability and the lowest possible system timing margin. Our results show that data collection from 256 analog input channels can be completed in a 201.5 µs interval, much shorter than the required 1 ms period.
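
As a rough illustration of the per-channel acquisition step such a board performs, the C sketch below assembles one 12-bit sample from two 8-bit SPI transfers on an 8-bit microcontroller. It is only a sketch: spi_transfer() and adc_select() are hypothetical driver stubs, and the command framing assumes a common MCP3208-style 8-channel ADC, which the abstract does not specify.

```c
/* Hedged sketch: one 12-bit sample assembled from two 8-bit SPI transfers
 * on an 8-bit microcontroller. spi_transfer() and adc_select() are
 * hypothetical placeholders for the board's actual SPI driver, and the
 * command framing assumes a common MCP3208-style 8-channel ADC. */
#include <stdint.h>

extern uint8_t spi_transfer(uint8_t out);              /* hypothetical: exchange one byte over SPI */
extern void    adc_select(uint8_t chip, uint8_t on);   /* hypothetical: drive the ADC chip-select  */

/* Read one 12-bit conversion result from channel `ch` (0..7) of ADC `chip`. */
uint16_t adc_read12(uint8_t chip, uint8_t ch)
{
    adc_select(chip, 1);
    (void)spi_transfer((uint8_t)(0x06 | (ch >> 2)));   /* start bit, single-ended, channel MSB */
    uint8_t hi = spi_transfer((uint8_t)(ch << 6));     /* remaining channel bits; reply holds 4 MSBs */
    uint8_t lo = spi_transfer(0x00);                   /* reply holds the 8 LSBs */
    adc_select(chip, 0);
    return (uint16_t)(((hi & 0x0F) << 8) | lo);        /* 12-bit result, 0..4095 */
}
```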

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.99-108
    • /
    • 2011
  • Processor technology has moved toward parallel processing techniques rather than simply raising the clock frequency of a single processor, because of the cost and power consumption involved. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes the massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that operate efficiently on image and video pixels. The proposed instructions store and process four 8-bit pixels in four partitioned 12-bit fields of a 48-bit datapath. This solves the overflow problem inherent in existing multimedia extensions and reduces the number of packing/unpacking instructions. Experimental results on the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance, whereas MMX-type instructions (a representative Intel multimedia extension) achieve a speedup of only $1.4{\times}$ over the same baseline. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$.
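
To make the subword idea concrete, the sketch below emulates it in plain C: four 8-bit pixels are widened into 12-bit lanes so that per-pixel additions cannot spill into a neighbouring pixel, and results are saturated back to 8 bits on unpacking. The paper's hardware uses four 12-bit fields of a 48-bit datapath; the 64-bit integer and the helper names here are purely illustrative.

```c
/* Hedged C emulation of the subword idea: four 8-bit pixels live in 12-bit
 * lanes, so adding two packed values cannot overflow into a neighbour.
 * The paper's hardware uses a 48-bit datapath; a 64-bit integer stands in
 * for it here purely for illustration. */
#include <stdint.h>

#define LANE_BITS 12
#define LANE_MASK 0xFFFu

/* Pack four 8-bit pixels into four 12-bit lanes. */
static uint64_t pack4(const uint8_t p[4])
{
    uint64_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint64_t)p[i] << (i * LANE_BITS);
    return v;
}

/* Lane-wise add: each 12-bit lane has 4 bits of headroom, so the sum of two
 * 8-bit pixels (at most 510) never carries into the next lane. */
static uint64_t add4(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Unpack with saturation back to 8-bit pixels. */
static void unpack4_sat(uint64_t v, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t lane = (uint32_t)((v >> (i * LANE_BITS)) & LANE_MASK);
        out[i] = (uint8_t)(lane > 255 ? 255 : lane);
    }
}
```

The 4 bits of headroom per lane are what remove the packing/unpacking overhead the abstract refers to: intermediate sums stay in place instead of being widened and re-narrowed around every operation.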

A comparative study on the addition architecture of high-speed checksum module (고속 검사합 모듈의 덧셈구조에 관한 비교 연구)

  • 김대현;한상원;공진흥
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1029-1032
    • /
    • 1998
  • In this paper, a comparative study is presented to evaluate the addition architecture of a high-speed checksum module for TCP/IP processing. To speed up TCP/IP processing, a hardware implementation offers concurrent and parallel processing and thus higher computation speed than a software implementation. This research compares two addition architectures for the checksum module, which is the major bottleneck in TCP/IP processing. The 16-bit and 8-bit byte-by-byte addition architectures are implemented in a full-custom design and compared, both analytically and experimentally, in terms of area and performance. For an LG $0.6\mu\textrm{m}$ TLM process, the 8-bit addition implementation requires 1.3 times the area of the 16-bit one, and it operates at 80 MHz while the 16-bit one runs at 66 MHz.

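For reference, the computation both adder architectures implement is the 16-bit one's-complement Internet checksum. The C sketch below spells out the word-by-word addition and the end-around carry; it is a plain software rendering in the spirit of RFC 1071, not the paper's hardware.

```c
/* 16-bit one's-complement Internet checksum (RFC 1071 style), shown in
 * software to make the word-by-word addition explicit. */
#include <stdint.h>
#include <stddef.h>

uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add the data as big-endian 16-bit words; a trailing odd byte is
     * treated as the high byte of a zero-padded word. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len  -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold carries back in: the end-around carry of one's-complement addition. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```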

An Architecture for $32{\times}32$ bit high speed parallel multiplier ($32{\times}32$ 비트 고속 병렬 곱셈기 구조)

  • 김영민;조진호
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.10
    • /
    • pp.67-72
    • /
    • 1994
  • In this paper we propose a $32{\times}32$-bit high-speed parallel multiplier, which plays an important role in digital signal processing. We employ a bit-pair recoding Booth algorithm that guarantees n/2 partial product terms and handles signed operands uniformly. While the partial product terms are generated, a special method is used to reduce delay by employing 1's complement instead of 2's complement; the additional 1 bits are later packed into a single partial product term and added in the parallel counter. The 16 partial product terms are then reduced to two summands by successive parallel counters, and the final product is obtained with a BLC adder. When this multiplier is simulated with a 0.8 $\mu\textrm{m}$ CMOS standard-cell library, we obtain a multiplication time of 30 ns.

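The recoding step itself is easy to show in software. The C sketch below scans the multiplier two bits at a time (radix-4 bit-pair Booth recoding), producing the 16 signed partial products in {-2a, -a, 0, +a, +2a} that the hardware then reduces; the 1's-complement trick, parallel counters, and BLC adder of the paper are not modelled here.

```c
/* Hedged model of radix-4 (bit-pair recoded) Booth multiplication:
 * the 32-bit multiplier yields n/2 = 16 recoded digits, each selecting a
 * partial product in {-2a, -a, 0, +a, +2a}. Signed operands are handled
 * uniformly. Only the recoding step is modelled. */
#include <stdint.h>

int64_t booth_radix4_mul(int32_t a, int32_t b)
{
    int64_t  acc  = 0;
    uint32_t ub   = (uint32_t)b;   /* raw two's-complement bit pattern */
    int      prev = 0;             /* overlap bit b[-1] = 0 */

    for (int i = 0; i < 32; i += 2) {
        int b0 = (int)((ub >> i) & 1u);
        int b1 = (int)((ub >> (i + 1)) & 1u);
        int digit = prev + b0 - 2 * b1;            /* recoded digit in {-2..+2} */
        acc += ((int64_t)digit * a) * ((int64_t)1 << i);
        prev = b1;
    }
    return acc;                                    /* equals (int64_t)a * b */
}
```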

Design and Implementation of a Latency Efficient Encoder for LTE Systems

  • Hwang, Soo-Yun;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal
    • /
    • v.32 no.4
    • /
    • pp.493-502
    • /
    • 2010
  • The operation time of an encoder is one of the critical implementation issues for satisfying the timing requirements of Long Term Evolution (LTE) systems because the encoder is based on binary operations. In this paper, we propose a design and implementation of a latency efficient encoder for LTE systems. By virtue of 8-bit parallel processing of the cyclic redundancy checking attachment, code block (CB) segmentation, and a parallel processor, we are able to construct engines for turbo coding and rate matching of each CB in a parallel fashion. Experimental results illustrate that although the total area and clock period of the proposed scheme are 19% and 6% larger, respectively, than those of a conventional method based on a serial scheme, our parallel structure decreases the latency by about 32% to 65% compared with a serial structure. In particular, our approach is more latency efficient when the encoder processes a number of CBs. In addition, we apply the proposed scheme to a real system based on LTE, so that the timing requirement for ACK/NACK transmission is met by employing the encoder based on the parallel structure.
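
The "8-bit parallel processing" of the CRC attachment can be pictured with a table-driven CRC that consumes one byte of the message per step instead of one bit. The sketch below uses the LTE CRC24A generator polynomial 0x864CFB from 3GPP TS 36.212; it only illustrates the byte-at-a-time principle and is not the paper's hardware datapath.

```c
/* Byte-at-a-time (8 bits per step) CRC24A as a software illustration of the
 * parallel CRC idea. Call crc24_init() once before crc24(). */
#include <stdint.h>
#include <stddef.h>

#define CRC24A_POLY 0x864CFBu          /* gCRC24A from 3GPP TS 36.212 */

static uint32_t crc24_table[256];

/* Precompute the remainder contributed by every possible message byte. */
static void crc24_init(void)
{
    for (uint32_t byte = 0; byte < 256; byte++) {
        uint32_t r = byte << 16;
        for (int bit = 0; bit < 8; bit++)
            r = (r & 0x800000u) ? ((r << 1) ^ CRC24A_POLY) : (r << 1);
        crc24_table[byte] = r & 0xFFFFFFu;
    }
}

/* Process the message one byte at a time: one table lookup per 8 bits. */
static uint32_t crc24(const uint8_t *msg, size_t len)
{
    uint32_t r = 0;                    /* CRC24A shift register starts at zero */
    for (size_t i = 0; i < len; i++)
        r = ((r << 8) ^ crc24_table[((r >> 16) ^ msg[i]) & 0xFFu]) & 0xFFFFFFu;
    return r;
}
```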

Design of an 8-bit Color Adjustor for SDTV Using Verilog HDL (Verilog HDL을 이용한 SDTV용 8bit 색상 보정기의 설계)

  • Jeon, Byoung-Woong;Song, In-Chae
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.801-804
    • /
    • 2005
  • In this paper, we designed an 8-bit color adjustor for SDTV using Verilog HDL. The conversion block requires many multiplications, so we adopted the Booth algorithm to reduce the amount of computation and the processing time. To improve speed, we designed the system output with a parallel structure. We synthesized the design using Xilinx ISE and verified its operation through simulation in ModelSim.

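The abstract does not spell out the conversion itself, so the sketch below shows only one common form of color adjustment: a fixed-point rotation of the chroma pair (Cb, Cr), which is where the many multiplications come from. The Q1.7 coefficients and function names are assumptions for illustration; in the paper such multiplies are realized with Booth-recoded hardware multipliers.

```c
/* Hedged illustration of a chroma (Cb, Cr) adjustment in 8-bit fixed point.
 * cos_q7 and sin_q7 are cos(theta) and sin(theta) scaled by 128 (Q1.7);
 * each product below is the kind of 8x8-bit multiply that maps onto a
 * Booth multiplier in hardware. */
#include <stdint.h>

static uint8_t clamp8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Adjust one pixel's chroma: (Cb, Cr) -> R(theta) * (Cb-128, Cr-128) + 128. */
static void adjust_chroma(uint8_t *cb, uint8_t *cr, int32_t cos_q7, int32_t sin_q7)
{
    int32_t b  = (int32_t)*cb - 128;
    int32_t r  = (int32_t)*cr - 128;
    int32_t nb = (b * cos_q7 - r * sin_q7) >> 7;
    int32_t nr = (b * sin_q7 + r * cos_q7) >> 7;
    *cb = clamp8(nb + 128);
    *cr = clamp8(nr + 128);
}
```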

Development of 64-Channel 12-bit 1ks/s Hardware for MCG Signal Acquisition (심자도 신호 획득을 위한 실시간 64-Ch 12-bit 1ks/s 하드웨어 개발)

  • Lee, Dong-Ha;Yoo, Jae-Tack
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2004.07b
    • /
    • pp.902-905
    • /
    • 2004
  • A heart diagnosis system adopts Superconducting QUantum Interference Device (SQUID) sensors for precise MCG signal acquisition. Such a system is composed of hundreds of sensors, requiring fast signal sampling and precise analog-to-digital conversion (ADC). Our hardware board, which processes 64 channels of 12-bit data at 1 ks/s, is built from 8-channel ADC chips, 8-bit microprocessors, SPI interfaces, and parallel data transfers between the microprocessors to meet the 1 ks/s rate, i.e., a 1 ms sampling interval. The test results show that the signal acquisition is completed in 168 µs, much shorter than the required 1 ms period. This hardware will be extended to 256-channel data acquisition for use in the diagnosis system.


The Algorithm Design and Implementation for Operation Using a Matrix Table in the WAVE System (WAVE 시스템에서 행렬 테이블로 연산하기 위한 알고리즘 설계 및 구현)

  • Lee, Dae-Sik;You, Young-Mo;Lee, Sang-Youn;Jang, Chung-Ryong
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.4A
    • /
    • pp.189-196
    • /
    • 2012
  • A WAVE (Wireless Access in Vehicular Environments) system is a vehicle communication technology. It provides services that help prevent vehicle accidents during driving and is also used for various services such as monitoring vehicle management and system failures. However, the bit-level scrambler operation of the WAVE system is inefficient in both software and hardware designs because it cannot be parallelized. Although the processing speed of the scrambler algorithm proposed in this paper varies with the input unit (8, 16, 32, or 64 bits), it improves the speed of the operation because it enables parallel processing according to the input unit.
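
The serial nature the abstract points to comes from the scrambler LFSR: every output bit depends on the state left by the previous one. The C sketch below is that baseline, bit-at-a-time form, assuming the standard 802.11/WAVE PHY scrambler polynomial x^7 + x^4 + 1; the byte interface and bit ordering are illustrative choices.

```c
/* Baseline bit-serial scrambler for the LFSR x^7 + x^4 + 1 (the standard
 * 802.11/WAVE PHY scrambler). Each keystream bit depends on the previous
 * state, which is what makes a naive implementation hard to parallelize. */
#include <stdint.h>
#include <stddef.h>

/* Scramble `len` bytes in place, LSB first within each byte.
 * `state` is the 7-bit seed (non-zero); the final state is returned. */
static uint8_t scramble_bitwise(uint8_t *data, size_t len, uint8_t state)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t out = 0;
        for (int b = 0; b < 8; b++) {
            uint8_t fb = (uint8_t)(((state >> 6) ^ (state >> 3)) & 1u);  /* taps x^7, x^4 */
            state = (uint8_t)(((state << 1) | fb) & 0x7Fu);
            out  |= (uint8_t)((((data[i] >> b) & 1u) ^ fb) << b);
        }
        data[i] = out;
    }
    return state;
}
```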

Research for Improving the Speed of Scrambler in the WAVE System (WAVE 시스템에서 스크램블러의 속도 향상을 위한 연구)

  • Lee, Dae-Sik;You, Young-Mo;Lee, Sang-Youn;Oh, Se-Kab
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37A no.9
    • /
    • pp.799-808
    • /
    • 2012
  • The bit-level scrambler operation in the WAVE system is inefficient because it cannot be parallelized in either hardware or software. In this paper, we propose an algorithm that finds the starting position of the matrix table. We then compare the performance of the bit-operation scrambler algorithm, the matrix-table algorithm, and the proposed starting-position algorithm for 8-bit, 16-bit, and 32-bit processing units. As a result, the number of scrambling operations per second increases by 2,917.8 for the 8-bit unit, 5,432.1 for the 16-bit unit, and 10,277.8 for the 32-bit unit. The starting-position algorithm therefore improves the speed of the WAVE scrambler, and with it the speed and precision with which information is received over vehicle-to-infrastructure and vehicle-to-vehicle links in intelligent transport systems.
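
One way to picture the table-driven speedup is to precompute, for each of the 128 possible LFSR states, the next eight keystream bits and the state reached after them, so the scrambler advances a whole byte per lookup. The sketch below illustrates that principle for the same x^7 + x^4 + 1 scrambler; the paper's actual matrix-table layout and its starting-position search are not reproduced.

```c
/* Byte-at-a-time scrambling via precomputed state-transition tables.
 * This illustrates the principle behind table-driven speedups, not the
 * paper's exact matrix table. Call scramble_table_init() once first. */
#include <stdint.h>
#include <stddef.h>

static uint8_t key_byte[128];    /* 8 keystream bits produced from each 7-bit state */
static uint8_t next_state[128];  /* LFSR state after producing those 8 bits         */

static void scramble_table_init(void)
{
    for (unsigned s = 0; s < 128; s++) {
        uint8_t st = (uint8_t)s, ks = 0;
        for (int b = 0; b < 8; b++) {
            uint8_t fb = (uint8_t)(((st >> 6) ^ (st >> 3)) & 1u);  /* taps x^7, x^4 */
            st  = (uint8_t)(((st << 1) | fb) & 0x7Fu);
            ks |= (uint8_t)(fb << b);
        }
        key_byte[s]   = ks;
        next_state[s] = st;
    }
}

/* One XOR and two table reads per byte instead of eight bit iterations. */
static uint8_t scramble_bytewise(uint8_t *data, size_t len, uint8_t state)
{
    for (size_t i = 0; i < len; i++) {
        data[i] ^= key_byte[state];
        state    = next_state[state];
    }
    return state;
}
```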

Optimized Implementation of PIPO Lightweight Block Cipher on 32-bit RISC-V Processor (32-bit RISC-V상에서의 PIPO 경량 블록암호 최적화 구현)

  • Eum, Si Woo;Jang, Kyung Bae;Song, Gyeong Ju;Lee, Min Woo;Seo, Hwa Jeong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.11 no.6
    • /
    • pp.167-174
    • /
    • 2022
  • PIPO lightweight block ciphers were announced in ICISC'20. In this paper, a single-block optimization implementation and parallel optimization implementation of PIPO lightweight block cipher ECB, CBC, and CTR operation modes are performed on a 32-bit RISC-V processor. A single block implementation proposes an efficient 8-bit unit of Rlayer function implementation on a 32-bit register. In a parallel implementation, internal alignment of registers for parallel implementation is performed, and a method for four different blocks to perform Rlayer function operations on one register is described. In addition, since it is difficult to apply the parallel implementation technique to the encryption process in the parallel implementation of the CBC operation mode, it is proposed to apply the parallel implementation technique in the decryption process. In parallel implementation of the CTR operation mode, an extended initialization vector is used to propose a register internal alignment omission technique. This paper shows that the parallel implementation technique is applicable to several block cipher operation modes. As a result, it is confirmed that the performance improvement is 1.7 times in a single-block implementation and 1.89 times in a parallel implementation compared to the performance of the existing research implementation that includes the key schedule process in the ECB operation mode.
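
The register-level trick can be sketched in C: when four 8-bit state bytes (or the same byte of four different blocks) are packed into one 32-bit word, a lane-wise rotation touches all of them with a few shifts and masks. PIPO's Rlayer applies fixed, byte-specific rotation amounts that are not reproduced here; the function below only demonstrates the packed-lane rotation itself.

```c
/* Rotate every 8-bit lane of a 32-bit word left by r (0 <= r < 8).
 * A hedged sketch of operating on four packed bytes at once; PIPO's actual
 * Rlayer rotation constants are intentionally not reproduced. */
#include <stdint.h>

static uint32_t rotl8_lanes(uint32_t x, unsigned r)
{
    if (r == 0)
        return x;
    /* lo_mask keeps, in each lane, the (8 - r) bits that shift up in place. */
    uint32_t lo_mask = 0x01010101u * ((1u << (8 - r)) - 1u);
    return ((x & lo_mask) << r) | ((x & ~lo_mask) >> (8 - r));
}
```

Packing the same byte position of four blocks into one 32-bit register and applying such lane-wise operations each round is the arrangement the abstract describes for its four-block parallel implementation.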