• Title/Summary/Keyword: 8-bit parallel processing


Parallel Implementation of Distributed Sample Scrambler (분산표본혼화기의 병렬구현)

  • 정헌주;김재형;정성현;박승철
    • Proceedings of the IEEK Conference / 1998.06a / pp.62-65 / 1998
  • This paper presents a method for, and an implementation of, a parallel distributed sample scrambler (DSS) for cell-based ATM transmission. Serial processing requires a very high-speed clock, because the processing clock of a serial DSS equals the data transmission rate. We develop a method for converting the serial SRG (shift register generator) into an 8-bit parallel realization. The conversion raises a problem in handling the sample data, which is a characteristic feature of the DSS, so a theory of correction time movement is presented to solve it. We have developed an ASIC using this algorithm and verified it against ITU-T Recommendation I.432.
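The serial-to-parallel conversion mentioned in this abstract rests on the linearity of a shift register generator over GF(2). The following is a minimal sketch of that general idea, not the paper's method: the register length and tap positions are hypothetical values chosen for illustration, and the DSS sample conveyance and correction logic is not modeled.

```python
# Sketch: advance a linear-feedback shift register 8 bits per call.
# N and TAPS are illustrative assumptions, not taken from the paper.
N = 31                  # register length (assumed)
TAPS = (30, 27)         # state bits XORed to form the feedback (assumed)
MASK = (1 << N) - 1

def serial_step(state):
    """Advance the SRG by one bit; return (new_state, output_bit)."""
    fb = ((state >> TAPS[0]) ^ (state >> TAPS[1])) & 1
    return ((state << 1) | fb) & MASK, fb

def serial_steps(state, k):
    """Run k serial steps; output bit i is placed at bit position i."""
    out = 0
    for i in range(k):
        state, bit = serial_step(state)
        out |= bit << i
    return state, out

# The SRG is linear over GF(2), so the effect of 8 steps on any state is
# the XOR of the effects on each individual set bit of that state.
BASIS = [serial_steps(1 << i, 8) for i in range(N)]

def parallel_step(state):
    """Advance 8 bits at once; return (new_state, 8 output bits)."""
    new_state, out = 0, 0
    for i in range(N):
        if (state >> i) & 1:
            s, o = BASIS[i]
            new_state ^= s
            out ^= o
    return new_state, out
```

In hardware, each entry of `BASIS` would correspond to a fixed XOR network, so the parallel register can be clocked at one eighth of the bit rate.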


Acceleration Method of Inter Prediction using Advanced SIMD (Advanced SIMD를 이용한 화면 간 예측 고속화방법)

  • Kim, Wan-Su;Lee, Jae-Heung
    • Journal of IKEEE / v.16 no.4 / pp.382-388 / 2012
  • This paper presents a fast motion estimation method for H.264/AVC. NEON, an Advanced SIMD parallel processing technology, is supported on the ARM Cortex-A9 dual-core platform. NEON is applied to full search, one of several motion estimation methods, and the SAD operation count for each macroblock is reduced to 1/4. Pixel values of the corresponding macroblock are assigned to eight 16-bit NEON registers, and NEON intrinsic functions carry out 128-bit arithmetic operations at once. In this way, the exact motion vector with the minimum SAD value among the calculated SAD values can be determined. Experimental results show that performance improves by more than 30% on average, depending on the image and macroblock size.
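For reference, the full-search SAD computation that this abstract accelerates can be sketched in scalar form as below. This is an illustrative baseline only: the function names, block size, and search range are assumptions, and none of the NEON vectorization described in the paper is shown.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )

def full_search(cur, ref, bx, by, n, r):
    """Try every displacement within +/- r and return (best_sad, dx, dy).
    Candidates that fall outside the reference frame are skipped."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            inside = (0 <= by + dy and by + dy + n <= h
                      and 0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

A SIMD version would compute many of the absolute differences inside `sad` in a single instruction, which is where a reduction in operation count would come from.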

Parallel BCH Encoding/decoding Method and VLSI Design for Nonvolatile Memory (비휘발성 메모리를 위한 병렬 BCH 인코딩/디코딩 방법 및 VLSI 설계)

  • Lee, Sang-Hyuk;Baek, Kwang-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.5 / pp.41-47 / 2010
  • This paper proposes a parallel BCH scheme, an error-correction coding method used in NAND flash memory for SSDs (solid-state disks). The proposed design makes the error correction capability adjustable, which improves the reliability of data blocks whose error rate rises as they are used more frequently. The parallel bit width of the decoder is twice that of the encoder, which shortens the decoding time and yields a one-half reduction compared with conventional ECC.

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory / v.35 no.7 / pp.305-317 / 2008
  • As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which takes into account that color components are perceptually less significant, supports parallel operations on two packed, compressed 16-bit YCbCr values (6-bit Y and 5-bit Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the smaller data format reduces system cost, and the reduction in data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor. In contrast, MMX (Intel's representative multimedia extension) achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves performance and efficiency with a mere 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
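The packed 16-bit YCbCr format in this abstract (6-bit Y, 5-bit Cb, 5-bit Cr, two pixels per 32-bit word) can be illustrated with a small sketch. The bit ordering within the word and the truncating quantization from 8-bit components are assumptions made for this example; the abstract does not specify either.

```python
def quantize(y8, cb8, cr8):
    """Reduce 8-bit components to 6/5/5 bits by dropping low bits
    (truncation is an assumption of this sketch)."""
    return y8 >> 2, cb8 >> 3, cr8 >> 3

def pack_pixel(y, cb, cr):
    """Pack one pixel as Y(6) | Cb(5) | Cr(5) into 16 bits (layout assumed)."""
    if not (0 <= y < 64 and 0 <= cb < 32 and 0 <= cr < 32):
        raise ValueError("component out of range for the 6-5-5 format")
    return (y << 10) | (cb << 5) | cr

def unpack_pixel(p):
    """Inverse of pack_pixel: return (y, cb, cr)."""
    return (p >> 10) & 0x3F, (p >> 5) & 0x1F, p & 0x1F

def pack_word(p0, p1):
    """Place two 16-bit pixels in one 32-bit word, p0 in the high half."""
    return ((p0 & 0xFFFF) << 16) | (p1 & 0xFFFF)

def unpack_word(w):
    """Inverse of pack_word: return (p0, p1)."""
    return (w >> 16) & 0xFFFF, w & 0xFFFF
```

With two pixels per 32-bit word, a single register-wide instruction can touch both pixels at once, which is the kind of concurrency the abstract attributes to CMI.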

A memory management scheme for parallel viterbi algorithm with multiple add-compare-select modules (다중의 Add-compare-select 모듈을 갖는 병렬 비터비 알고리즘의 메모리 관리 방법)

  • 지현순;박동선;송상섭
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.8 / pp.2077-2089 / 1996
  • In this paper, a memory organization and its control method are proposed for the implementation of parallel Viterbi decoders. The design focuses on lowering the hardware complexity of a parallel Viterbi decoder, which is used to reduce decoding time. The memories required in a Viterbi decoder are the SMM (State Metric Memory), which stores the path metrics of the states, and the TBM (Traceback Memory), which stores the survivor path information. A general parallel Viterbi decoder for high data rates usually consists of multiple ACS (Add-Compare-Select) units and their corresponding memory modules. For parallel ACS units, the SMMs and TBMs are partitioned into smaller independent pairs of memory modules, which are separately interleaved to provide the maximum processing speed. In this design, the SMMs are controlled by address generators that can simultaneously compute the addresses of the new path metrics. A bit shuffle technique is employed to provide parallel access to the TBMs so that survivor path information from multiple ACS modules can be stored.


Optimized Implementation of Block Cipher PIPO in Parallel-Way on 64-bit ARM Processors (64-bit ARM 프로세서 상에서의 블록암호 PIPO 병렬 최적 구현)

  • Eum, Si Woo;Kwon, Hyeok Dong;Kim, Hyun Jun;Jang, Kyoung Bae;Kim, Hyun Ji;Park, Jae Hoon;Song, Gyeung Ju;Sim, Min Joo;Seo, Hwa Jeong
    • KIPS Transactions on Computer and Communication Systems / v.10 no.8 / pp.223-230 / 2021
  • The lightweight block cipher PIPO, announced at ICISC'20, has been implemented effectively using the bit-slice technique. In this paper, we propose an optimized parallel implementation of PIPO for ARM processors. The proposed implementation enables parallel encryption of 8 plaintexts and 16 plaintexts. The implementation targets the A10X Fusion processor. On the target processor, the existing reference PIPO code achieves 34.6 cpb and 44.7 cpb for the 64/128 and 64/256 standards. Among the proposed methods, the general implementation achieves 12.0 cpb and 15.6 cpb for 8 plaintexts in the 64/128 and 64/256 standards, and 6.3 cpb and 8.1 cpb for 16 plaintexts in the 64/128 and 64/256 standards. Compared with the existing reference code, the 8-plaintext parallel implementation performs about 65.3% and 66.4% better for each standard, and the 16-plaintext parallel implementation about 81.8% and 82.1% better. The register minimum alignment implementation achieves 8.2 cpb and 10.2 cpb for 8 plaintexts in the 64/128 and 64/256 standards, and 3.9 cpb and 4.8 cpb for 16 plaintexts in the 64/128 and 64/256 standards. Compared with the existing reference code, the 8-plaintext parallel implementation improves performance by about 76.3% and 77.2%, and the 16-plaintext parallel implementation by about 88.7% and 89.3%, for each standard.
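The percentages in this abstract appear to be relative reductions in cycles per byte (cpb) against the reference code. The short sketch below recomputes them under that assumption; the formula is inferred from the figures, not stated in the abstract, and because the quoted cpb values are rounded, a recomputed percentage can differ slightly from the reported one.

```python
def improvement_pct(ref_cpb, new_cpb):
    """Relative reduction in cycles per byte, in percent (assumed metric)."""
    return (ref_cpb - new_cpb) / ref_cpb * 100.0

# cpb figures quoted in the abstract: (64/128, 64/256)
REFERENCE = (34.6, 44.7)
RESULTS = {
    "general, 8 plaintexts": (12.0, 15.6),
    "general, 16 plaintexts": (6.3, 8.1),
    "register minimum alignment, 8 plaintexts": (8.2, 10.2),
    "register minimum alignment, 16 plaintexts": (3.9, 4.8),
}

def summarize():
    """Return {variant: (pct for 64/128, pct for 64/256)}, rounded to 0.1."""
    return {
        name: tuple(round(improvement_pct(r, c), 1)
                    for r, c in zip(REFERENCE, cpbs))
        for name, cpbs in RESULTS.items()
    }

if __name__ == "__main__":
    for name, pcts in summarize().items():
        print(f"{name}: {pcts[0]}% (64/128), {pcts[1]}% (64/256)")
```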

Development of Curved-Glass Automatic Shaping System using PID Servo-Drivers (PID 서보제어기를 이용한 곡면유리 자동성형 시스템 개발)

  • 유병국;양근호
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2003.06a / pp.161-164 / 2003
  • This research presents a parallel control scheme using PID servo-drivers for shaping curved glass. The designed system consists of a PC, a main controller, and 11 servo-drivers. The elements are connected through RS-232C and an 8-bit bus. To guarantee stability and control performance, the servo-drivers use the LM629, a precision PID motion controller, and the LMD18200, an H-bridge. The PC calculates the position values of the 11 DC motors from a predetermined curvature value and provides the operator with a user interface environment.


From WiFi to WiMAX: Efficient GPU-based Parameterized Transceiver across Different OFDM Protocols

  • Li, Rongchun;Dou, Yong;Zhou, Jie;Li, Baofeng;Xu, Jinbo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.8 / pp.1911-1932 / 2013
  • Orthogonal frequency-division multiplexing (OFDM) has become a popular modulation scheme for wireless protocols because of its spectral efficiency and robustness against multipath interference. Although the components of various OFDM protocols are functionally similar, they remain distinct because of the characteristics of the environment. Recently, graphics processing units (GPUs) have been used to accelerate the signal processing of the physical layer (PHY) because of their great computational power, high development efficiency, and flexibility. In this paper, we describe the implementation of parameterized baseband modules using GPUs for two different OFDM protocols, namely, 802.11a and 802.16. First, we introduce various modules in the modulator/demodulator parts of the transmitter and receiver and analyze the computational complexity of each module. We then describe the integration of the GPU-based baseband modules of the two protocols using the parameterized method. The GPU-based implementations are presented to explain how baseband processing is accelerated to achieve real-time throughput. Finally, the performance results of each signal processing module are evaluated and analyzed. The experiments show that the GPU-based 802.11a and 802.16 PHY meet the real-time requirement and demonstrate good bit error ratio (BER) performance. The performance comparison indicates that our GPU-based modules offer better flexibility and throughput than existing ones.

A Memory Intensive Real-time 3x3 Neighborhood processor for Image Processing (Memory Intensive 실시간 영상신호처리용 3×3 Neighborhood VLSI 처리기)

  • 김진홍;남철우;우성일;김용태
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.6 / pp.963-971 / 1990
  • This paper proposes a memory-intensive VLSI architecture for a real-time 3x3 neighborhood processor based on distributed arithmetic. The proposed architecture is characterized by bit-serial, multi-kernel parallel processing that exploits pixel kernel parallelism and concurrency. The chip implements 8 neighborhood processing elements in parallel, with efficient input and output modules that operate concurrently. Besides the architectural design of the neighborhood processor, a design methodology based on the module generator concept has been considered, and MOGOT (MOdule Generator Oriented VLSI design Tool) has been built on a workstation. Using the MOGOT design environment, it has been shown that the main part of the proposed architecture can be designed efficiently in 2 µm double-metal CMOS technology. The design includes an input delay and data conversion module, a look-up table for the inner product operation, a carry-save accumulator, an output data converter and delay module, and a control module.


High-speed Design of 8-bit Architecture of AES Encryption (AES 암호 알고리즘을 위한 고속 8-비트 구조 설계)

  • Lee, Je-Hoon;Lim, Duk-Gyu
    • Convergence Security Journal / v.17 no.2 / pp.15-22 / 2017
  • This paper presents a new 8-bit implementation of AES. Most typical 8-bit AES designs reduce circuit area at the cost of throughput. The presented AES architecture employs two separate S-boxes to perform the round operation and key generation in parallel. According to simulation results for the proposed AES-128, the maximum critical path delay is 13.0 ns, so it can operate at 77 MHz with a throughput of 15.2 Mbps. Consequently, the throughput of the proposed AES is 1.54 times higher than that of its counterpart, while the area increase is limited to 1.17 times. The proposed design thus enables a low-area implementation without sacrificing performance, making it suitable for IoT applications that need high-speed communication.