• Title/Summary/Keyword: Pipeline Processing Structure

Search Result 74, Processing Time 0.031 seconds

Low-power Radix-4 FFT Structure for OFDM using Distributed Arithmetic (Distributed Arithmetic을 사용한 OFDM용 저전력 Radix-4 FFT 구조)

  • Jang Young-Beom;Lee Won-Sang;Kim Do-Han;Kim Bee-Chul;Hur Eun-Sung
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.43 no.1 s.307
    • /
    • pp.101-108
    • /
    • 2006
  • In this paper, an efficient butterfly structure for Radix-4 FFT algorithm using DA(Distributed Arithmetic) is proposed. It is shown that DA can be efficiently used in twiddle factor calculation of the Radix-4 FFT algorithm. The Verilog-HDL coding results for the proposed DA butterfly structure show $61.02\%$ cell area reduction comparison with those of the conventional multiplier butterfly structure. furthermore, the 64-point Radix-4 pipeline structure using the proposed butterfly and delay commutators is compared with other conventional structures. Implementation coding results show $46.1\%$ cell area reduction. Due to its efficient processing scheme, the proposed FFT structure can be widely used in large size of FFT like OFDM Modem.

Design of Image Extraction Hardware for Hand Gesture Vision Recognition

  • Lee, Chang-Yong;Kwon, So-Young;Kim, Young-Hyung;Lee, Yong-Hwan
    • Journal of Advanced Information Technology and Convergence
    • /
    • v.10 no.1
    • /
    • pp.71-83
    • /
    • 2020
  • In this paper, we propose a system that can detect the shape of a hand at high speed using an FPGA. The hand-shape detection system is designed using Verilog HDL, a hardware language that can process in parallel instead of sequentially running C++ because real-time processing is important. There are several methods for hand gesture recognition, but the image processing method is used. Since the human eye is sensitive to brightness, the YCbCr color model was selected among various color expression methods to obtain a result that is less affected by lighting. For the CbCr elements, only the components corresponding to the skin color are filtered out from the input image by utilizing the restriction conditions. In order to increase the speed of object recognition, a median filter that removes noise present in the input image is used, and this filter is designed to allow comparison of values and extraction of intermediate values at the same time to reduce the amount of computation. For parallel processing, it is designed to locate the centerline of the hand during scanning and sorting the stored data. The line with the highest count is selected as the center line of the hand, and the size of the hand is determined based on the count, and the hand and arm parts are separated. The designed hardware circuit satisfied the target operating frequency and the number of gates.

Design and Implementation of Real-time Moving Picture Encoder Based on the Fractal Algorithm (프랙탈 알고리즘 기반의 실시간 영상 부호화기의 설계 및 구현)

  • Kim, Jae-Chul;Choi, In-Kyu
    • The KIPS Transactions:PartB
    • /
    • v.9B no.6
    • /
    • pp.715-726
    • /
    • 2002
  • In this paper, we construct real-time moving picture encoder based on fractal theory by using general purpose digital signal processors. The constructed encoder is implemented using two fixed-point general DSPs (ADSP2181) and performs image encoding by three stage pipeline structure. In the first pipeline stage, the image grabber acquires image data from NTSC standard image signals and stores digital image into frame memory. In the second stage, the main controller encode image dada using fractal algorithm. The last stage, output controller perform Huffman coding and result the coded data via RS422 port. The performance tests of the constructed encoder shows over 10 frames/sec encoding speed for QCIF data when all the frames are encoded. When we encode the images using the interframe and redundency based on the proposed algorithms, encoding speed increased over 30 frames/sec in average.

Design and Verification of Pipelined Face Detection Hardware (파이프라인 구조의 얼굴 검출 하드웨어 설계 및 검증)

  • Kim, Shin-Ho;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.10
    • /
    • pp.1247-1256
    • /
    • 2012
  • There are many filter based image processing algorithms and they usually require a huge amount of computations and memory accesses making it hard to attain a real-time performance, expecially in embedded applications. In this paper, we propose a pipelined hardware structure of the filter based face detection algorithm to show that the real time performance can be achieved by hardware design. In our design, the whole computation is divided into three pipeline stages: resizing the image (Resize), Transforming the image (ICT), and finding candidate area (Find Candidate). Each stage is optimized by considering the parallelism of the computation to reduce the number of cycles and utilizing the line memory to minimize the memory accesses. The resulting hardware uses 507 KB internal SRAM and occupies 9,039 LUTs when synthesized and configured on Xilinx Virtex5LX330 FPGA. It can operate at maximum 165MHz clock, giving the performance of 108 frame/sec, while detecting up to 20 faces.

Semantic Role Labeling using Biaffine Average Attention Model (Biaffine Average Attention 모델을 이용한 의미역 결정)

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.5
    • /
    • pp.662-667
    • /
    • 2022
  • Semantic role labeling task(SRL) is to extract predicate and arguments such as agent, patient, place, time. In the previously SRL task studies, a pipeline method extracting linguistic features of sentence has been proposed, but in this method, errors of each extraction work in the pipeline affect semantic role labeling performance. Therefore, methods using End-to-End neural network model have recently been proposed. In this paper, we propose a neural network model using the Biaffine Average Attention model for SRL task. The proposed model consists of a structure that can focus on the entire sentence information regardless of the distance between the predicate in the sentence and the arguments, instead of LSTM model that uses the surrounding information for prediction of a specific token proposed in the previous studies. For evaluation, we used F1 scores to compare two models based BERT model that proposed in existing studies using F1 scores, and found that 76.21% performance was higher than comparison models.

A Low-power DIF Radix-4 FFT Processor for OFDM Systems Using CORDIC Algorithm (CORDIC을 이용한 OFDM용 저전력 DIF Radix-4 FFT 프로세서)

  • Jang, Young-Beom;Choi, Dong-Kyu;Kim, Do-Han
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.3
    • /
    • pp.103-110
    • /
    • 2008
  • In this paper, an efficient butterfly structure for 8K/2K-Point Radix-4 FFT algorithm using CORDIC(coordinate rotation digital computer) is proposed. It is shown that CORDIC can be efficiently used in twiddle factor calculation of the Radix-4 FFT algorithm. The Verilog-HDL coding results for the proposed CORDIC butterfly structure show 36.9% cell area reduction comparison with those of the conventional multiplier butterfly structure. Furthermore, the 8K/2K-point Radix-4 pipeline structure using the proposed butterfly and delay commutators is compared with other conventional structures. Implementation coding results show 11.6% cell area reduction. Due to its efficient processing scheme, the proposed FFT structure can be widely used in large size of FFT like OFDM Modem.

The Design of 10-bit 200MS/s CMOS Parallel Pipeline A/D Converter (10-비트 200MS/s CMOS 병렬 파이프라인 아날로그/디지털 변환기의 설계)

  • Chung, Kang-Min
    • The KIPS Transactions:PartA
    • /
    • v.11A no.2
    • /
    • pp.195-202
    • /
    • 2004
  • This paper introduces the design or parallel Pipeline high-speed analog-to-digital converter(ADC) for the high-resolution video applications which require very precise sampling. The overall architecture of the ADC consists of 4-channel parallel time-interleaved 10-bit pipeline ADC structure a]lowing 200MSample/s sampling speed which corresponds to 4-times improvement in sampling speed per channel. Key building blocks are composed of the front-end sample-and-hold amplifier(SHA), the dynamic comparator and the 2-stage full differential operational amplifier. The 1-bit DAC, comparator and gain-2 amplifier are used internally in each stage and they were integrated into single switched capacitor architecture allowing high speed operation as well as low power consumption. In this work, the gain of operational amplifier was enhanced significantly using negative resistance element. In the ADC, a delay line Is designed for each stage using D-flip flops to align the bit signals and minimize the timing error in the conversion. The converter has the power dissipation of 280㎽ at 3.3V power supply. Measured performance includes DNL and INL of +0.7/-0.6LSB, +0.9/-0.3LSB.

A Study on Performance Improvement of Mobile Rake Finger for Multirate (Multirate를 위한 이동국 Rake Finger의 성능 개선에 관한 연구)

  • Kim, Jong-Youb;Lee, Seon-Keun;Park, Hyoung-Keun;Park, Hwan-Young
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.12
    • /
    • pp.66-74
    • /
    • 2001
  • In this paper, we proposed the new structure of the Rake Finger using Walsh Switch, the shared accumulator, and the pipeline FWHT(Fast Walsh Hadamard Transform) algorithm for reducing the signal processing complexity resulting from the increase of the number of data correlators. The function simulation of the proposed architecture is performed by Synopsys tool and the timing simulation is performed by Compass tool. The number of computational operation in the proposed data correlators is 160 additions and the conventional ones is 512 additions when the number of walsh code channels is 4. As a result, it is reduced about 3.2 times other than the number of computational operation of the conventional ones. Also, the result shows that the data processing time of the proposed Rake Finger architecture is 90,496[ns] and the conventional ones is 110,696[ns]. It is 18.3% faster than the data processing time of the conventional Rake Finger architecture.

  • PDF

On the Conceptual Design of the SIMD Vector Machine Attachable to SISD Machine (SISD 머신에 부착 가능한 SIMD 벡터 머신의 개념적 설계)

  • Cho Young-Il;Ko Young-Woong
    • The KIPS Transactions:PartA
    • /
    • v.12A no.3 s.93
    • /
    • pp.263-272
    • /
    • 2005
  • The addressing mode for data is performed by the software in yon Neumann-concept(SISD) computer a priori without hardware design of an address counter for operands. Therefore, in the addressing mode for the vector the corresponding variables as much as the number of the elements should be specified and used also in the software method. This is because not for operand but only for an instructions, quasi PC(program counter) is designed in hardware physically. A vector has a characteristic of a structural dimension. In this paper we propose to design a hardware unit physically external to the CPU for addressing only the elements of a vector unit with the structure and dimension. Because of the high speed performance for a vector processing it should be designed in the SIMD pipeline mechanics. The proposed mechanics is evaluated through a simulation. Our result shows $12\%$ to $30\%$ performance enhancement over CRAY architecture under the same hardware consideration(processing unit).

An Implementation of the $5\times5$ CNN Hardware and the Pre.Post Processor ($5\times5$ CNN 하드웨어 및 전.후 처리기 구현)

  • Kim Seung-Soo;Jeon Heung-Woo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.5
    • /
    • pp.865-870
    • /
    • 2006
  • The cellular neural networks have shown a vast computing power for the image processing in spite of the simplicity of its structure. However, it is impossible to implement the CNN hardware which would require the same enormous amount of cells as that of the pixels involved in the practical large image. In this parer, the $5\times5$ CNN hardware and the pre post processor which can be used for processing the real large image with a time-multiplexing scheme are implemented. The implemented $5\times5$ CNN hardware and pre post processor is applied to the edge detection of $256\times256$ lena image to evaluate the performance. The total number of block. By the time-multiplexing process is about 4,000 blocks and to control pulses are needed to perform the pipelined operation or the each block. By the experimental resorts, the implemented $5\times5$ CNN hardware and pre post processor can be used to the real large image processing.