• Title/Summary/Keyword: 덧셈기 (adder)

Efficient Frame Synchronizer Architecture Using Common Autocorrelator for DVB-S2 (공통 자기 상관기를 이용한 효율적인 디지털 위성 방송 프레임 동기부 회로 구조)

  • Choi, Jin-Kyu;SunWoo, Myung-Hoon;Kim, Pan-Soo;Chang, Dae-Ig
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.4 / pp.64-71 / 2009
  • This paper presents an efficient frame synchronizer architecture using a common autocorrelator for Digital Video Broadcasting via Satellite, Second Generation (DVB-S2). To achieve satisfactory performance under severe channel conditions and to use hardware resources efficiently across the synchronization blocks already implemented, we propose a new common autocorrelator structure. The proposed architecture improves the performance of the frame and frequency synchronizers, since the blocks operate jointly in parallel, and it significantly reduces the complexity of the frame synchronizer. Compared with the direct implementation, it reduces the number of multipliers by about 92% and the number of adders by about 81%. It has been thoroughly verified with an FPGA board and R&S™ SFU broadcast test equipment, and it occupies 29,821 LUTs on a Xilinx™ Virtex IV LX200.

Performance of the CDMA Receiver with PN Sequence Orthogonal Reception Process (PN 부호의 직교 수신 방식을 이용한 CDMA 수신기 성능)

  • Hyun, Kwang-Min;Yoon, Dong-Weon;Park, Sang-Kyu
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.4A / pp.200-207 / 2003
  • This paper proposes a CDMA receiver structure with a time-shifted m-sequence orthogonal reception process and analyzes its output SNR performance and the characteristics of the orthogonal receiver. The structure can be implemented simply by adding an integrator path in parallel to the conventional receiver, with an adder summing the output signals of the conventional path and the new path. For the reference user signal, the structure provides not only an increase in the signal component but also a perfectly orthogonal characteristic, canceling the accumulated cross-correlation between the reference user and other user signals to zero. Hence, the proposed structure can be applied to channel impulse response measurement and, by flexibly connecting or disconnecting the added path to the conventional receiver structure, to multi-user interference cancellation and channel capacity improvement.
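The cancellation described in this abstract can be illustrated with a small sketch. It assumes one possible reading of the abstract: the added path simply integrates (sums) the received chips, and the adder combines that sum with the ordinary correlator output. The 7-chip code from x^3 + x + 1, the chip mapping (bit 1 to +1, bit 0 to -1), and all function names are illustrative choices, not details taken from the paper.

```python
def m_sequence(taps=(0, 1), degree=3):
    """Return one period of an m-sequence as +1/-1 chips.

    Recurrence: a[n + degree] = XOR of a[n + t] for t in taps.
    The default taps correspond to x^3 + x + 1 (illustrative choice).
    """
    bits = [1] + [0] * (degree - 1)
    period = 2 ** degree - 1
    while len(bits) < period:
        n = len(bits) - degree
        nxt = 0
        for t in taps:
            nxt ^= bits[n + t]
        bits.append(nxt)
    # Assumed mapping: bit 1 -> +1, bit 0 -> -1, so the chips sum to +1.
    return [1 if b else -1 for b in bits]


def cyclic_shift(seq, k):
    """Cyclically shift a chip sequence left by k positions."""
    return seq[k:] + seq[:k]


def conventional_output(received, code):
    """Conventional path: correlate the received chips with the local code."""
    return sum(r * c for r, c in zip(received, code))


def orthogonal_output(received, code):
    """Conventional path plus the added integrator path (assumed reading)."""
    integrator_path = sum(received)  # integrates the received chips only
    return conventional_output(received, code) + integrator_path


code = m_sequence()
for shift in range(len(code)):
    other = cyclic_shift(code, shift)
    print(shift, conventional_output(other, code), orthogonal_output(other, code))
```

Under these assumptions the conventional correlator gives -1 for every nonzero shift, while the combined output gives 0 for those shifts and a larger peak (8 instead of 7) for the aligned code, which matches the "increase in the signal component" and "perfectly orthogonal" behavior the abstract describes.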

The Efficient 32×32 Inverse Transform Design for High Performance HEVC Decoder (고성능 HEVC 복호기를 위한 효율적인 32×32 역변환기 설계)

  • Han, Geumhee;Ryoo, Kwangki
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.4 / pp.953-958 / 2013
  • In this paper, an efficient hardware architecture is proposed for the $32{\times}32$ inverse transform of an HEVC decoder. HEVC is a new image compression standard designed to handle much larger image sizes, such as 4K and 8K images, than conventional image codecs. To process large image data effectively, it adopts various new block structures. These blocks consist of $4{\times}4$, $8{\times}8$, $16{\times}16$, and $32{\times}32$ blocks. This paper proposes an effective structure for processing the $32{\times}32$ inverse transform. The structure uses the $16{\times}16$ matrices decomposed from the $32{\times}32$ matrix and simplifies the operations by implementing multiplications with shifters and adders. Additionally, the operating frequency is lowered by using multicycle paths. The structure can also be easily adapted to a multi-size transform or a forward transform block in an HEVC codec.
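The idea of replacing multipliers with shifters and adders can be sketched as follows. This is a generic shift-and-add constant multiplier, not the paper's circuit; the function name is hypothetical, and the constants 83 and 36 are used only as sample values (83 = 64 + 16 + 2 + 1, so it needs four shifted terms).

```python
def shift_add_multiply(x: int, coeff: int) -> int:
    """Multiply x by a non-negative integer constant using only shifts and adds."""
    if coeff < 0:
        raise ValueError("coeff must be non-negative in this sketch")
    result = 0
    shift = 0
    while coeff:
        if coeff & 1:
            # Each set bit of the constant contributes one shifted copy of x.
            result += x << shift
        coeff >>= 1
        shift += 1
    return result


print(shift_add_multiply(7, 83))   # same value as 7 * 83
print(shift_add_multiply(-5, 36))  # same value as -5 * 36
```

In hardware the loop disappears: each set bit of a fixed coefficient becomes a wired shift feeding an adder, so a constant with few set bits costs only a few adders.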

Design of Radix-4 FFT Processor Using Twice Perfect Shuffle (이중 완전 Shuffle을 이용한 Radix-4 FFT 프로세서의 설계)

  • Hwang, Myoung-Ha;Hwang, Ho-Jung
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.2 / pp.144-150 / 1990
  • This paper describes a radix-4 Fast Fourier Transform (FFT) processor designed with a new twice perfect shuffle, developed from the perfect shuffle used in the radix-2 FFT algorithm. The FFT processor consists of a butterfly arithmetic circuit, address generators for input, output, and coefficients, input and output registers, and a controller. It also requires external ROM for coefficient storage and RAM for input and output. The butterfly circuit includes 12 bit-serial ($16{\times}8$) multipliers, adders, subtractors, and delay shift registers. Operating on a 25 MHz two-phase clock, the processor computes a 256-point FFT in 6168 clocks, i.e. 247 μs, and provides flexibility by allowing the user to select any size among 4, 16, 64, and 256 points. Fabricated in a 2-μm double-metal CMOS process, it includes about 28,000 transistors and 55 pads in an area of $8.0{\times}8.2mm^2$.
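For readers unfamiliar with radix-4 butterflies, here is a minimal software sketch. It is a textbook recursive decimation-in-time radix-4 FFT, not the paper's twice-perfect-shuffle addressing scheme or its bit-serial datapath; the function names are hypothetical. It also checks the timing figure quoted in the abstract (6168 clocks at 25 MHz).

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    # Four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(quarter):
        # Apply twiddle factors W_n^(r*k) to the four sub-transform outputs.
        t0, t1, t2, t3 = (
            cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k] for r in range(4)
        )
        # Radix-4 butterfly: only additions, subtractions, and swaps by +/-j.
        out[k] = t0 + t1 + t2 + t3
        out[k + quarter] = t0 - 1j * t1 - t2 + 1j * t3
        out[k + 2 * quarter] = t0 - t1 + t2 - t3
        out[k + 3 * quarter] = t0 + 1j * t1 - t2 - 1j * t3
    return out


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def clocks_to_microseconds(clocks, clock_hz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return clocks / clock_hz * 1e6


samples = [complex(i % 5, -(i % 3)) for i in range(16)]
error = max(abs(a - b) for a, b in zip(fft_radix4(samples), naive_dft(samples)))
print(error)
print(clocks_to_microseconds(6168, 25e6))  # about 246.72, i.e. roughly 247 us
```

The supported lengths in this sketch (powers of 4) coincide with the sizes the abstract lists: 4, 16, 64, and 256 points.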

Compact CNN Accelerator Chip Design with Optimized MAC And Pooling Layers (MAC과 Pooling Layer을 최적화시킨 소형 CNN 가속기 칩)

  • Son, Hyun-Wook;Lee, Dong-Yeong;Kim, HyungWon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1158-1165 / 2021
  • This paper proposes a CNN accelerator in which the pooling layer operation is incorporated into the Multiplication And Accumulation (MAC) unit to reduce the memory size. To optimize the memory and data path circuits, quantized 8-bit integer weights are used instead of 32-bit floating-point weights for pre-training on the MNIST data set. To reduce chip area, the proposed CNN model is reduced to a convolutional layer, a 4*4 max pooling layer, and two fully connected layers. All operations use a specific MAC with approximate adders and multipliers. A 94% reduction in internal memory size is achieved by performing the convolution and pooling operations simultaneously in the proposed architecture. The proposed accelerator chip is designed using the TSMC 65nm GP CMOS process. Its size, 0.8*0.9 = 0.72 mm², is about half that of the chip in our previous paper. The presented CNN accelerator chip achieves 94% accuracy and an inference time of 77 μs per MNIST image.
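The memory saving from performing convolution and pooling together can be sketched in software. The code below is only an illustration under simple assumptions (single channel, integer data, stride-1 "valid" convolution, non-overlapping pooling windows); the function names and the test data are hypothetical and do not come from the paper. The fused version keeps a single running maximum per pooling window instead of storing the full convolution output.

```python
def conv2d_valid(image, kernel):
    """Reference: full 'valid' convolution output (stored entirely in memory)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def max_pool(feature, pool):
    """Reference: non-overlapping pool x pool max pooling."""
    rows, cols = len(feature) // pool, len(feature[0]) // pool
    return [
        [
            max(feature[pr * pool + dr][pc * pool + dc]
                for dr in range(pool) for dc in range(pool))
            for pc in range(cols)
        ]
        for pr in range(rows)
    ]


def fused_conv_max_pool(image, kernel, pool):
    """Compute each MAC result and fold it straight into a running maximum."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    pooled = []
    for pr in range(oh // pool):
        row = []
        for pc in range(ow // pool):
            best = None  # the only state kept per pooling window
            for dr in range(pool):
                for dc in range(pool):
                    r, c = pr * pool + dr, pc * pool + dc
                    acc = 0
                    for i in range(kh):
                        for j in range(kw):
                            acc += image[r + i][c + j] * kernel[i][j]  # MAC step
                    if best is None or acc > best:
                        best = acc
            row.append(best)
        pooled.append(row)
    return pooled


image = [[(r * 7 + c * 3) % 11 - 5 for c in range(10)] for r in range(10)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
print(fused_conv_max_pool(image, kernel, 4) == max_pool(conv2d_valid(image, kernel), 4))
```

With a 10*10 input, a 3*3 kernel, and 4*4 pooling, the unfused path stores an 8*8 intermediate map while the fused path stores only the 2*2 result, which is the kind of buffer reduction the abstract refers to.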

An Efficient Architecture of MBA-based Parallel MAC for High-Speed Digital Signal Processing (고속 디지털 신호처리를 위한 MBA기반 병렬 MAC의 효율적인 구조)

  • 서영호;김동욱
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.7 / pp.53-61 / 2004
  • In this paper, we propose a new MAC (Multiplier-Accumulator) architecture for high-speed multiply-accumulate operations. We use the MBA (Modified radix-4 Booth Algorithm), which is based on the 1's complement number system, and a CSA (Carry Save Adder) for adding the partial products. During the addition of the partial products, the signed numbers in 1's complement form produced by Booth encoding are converted to 2's complement signed numbers in the CSA tree. Because a 2-bit CLA (Carry Look-ahead Adder) is used to add the lower bits of the partial products, the input bit width of the final adder and the overall delay of the critical path are reduced. The proposed MAC was applied to the DWT (Discrete Wavelet Transform) filtering operation for JPEG2000, which showed its suitability for practical applications. Finally, a comparison with the previous architecture confirmed improved performance in terms of hardware resources and delay.
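The two building blocks named in this abstract, radix-4 Booth recoding and carry-save addition, can be sketched as follows. This is the standard two's-complement textbook form, not the paper's 1's-complement variant, and it omits the 2-bit CLA on the lower bits; the function names are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int) -> list:
    """Recode a two's-complement multiplier of width `bits` into digits in {-2..2}.

    Digit k is b[2k-1] + b[2k] - 2*b[2k+1], with b[-1] = 0, least significant first.
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y

    def bit(i):
        return 0 if i < 0 else (pattern >> i) & 1

    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, bits, 2)]


def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor: returns (sum, carry) with a + b + c == sum + carry."""
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum, carry


def booth_mac(acc: int, x: int, y: int, bits: int) -> int:
    """Return acc + x * y using Booth partial products reduced in carry-save form."""
    digits = booth_radix4_digits(y, bits)
    partials = [(d * x) << (2 * k) for k, d in enumerate(digits)]
    s, c = acc, 0
    for p in partials:
        s, c = carry_save_add(s, c, p)  # no carry propagation inside the loop
    return s + c  # single carry-propagate addition at the end


print(booth_radix4_digits(-7, 4))
print(booth_mac(10, 13, -7, 4))  # same value as 10 + 13 * (-7)
```

Booth recoding halves the number of partial products relative to one per multiplier bit, and the carry-save form defers carry propagation to one final adder, which is why shortening that final adder (as the abstract describes) shortens the critical path.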

Design of an Efficient Coarse Frequency Estimator Using a Serial Correlator for DVB-S2 (직렬 상관기를 이용한 디지털 위성방송 주파수 추정회로 설계)

  • Yun, Hyoung-Jin;SunWoo, Myung-Hoon
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.4A / pp.434-439 / 2008
  • This paper proposes an efficient coarse frequency synchronizer for Digital Video Broadcasting - Second Generation (DVB-S2). The acquisition range required of the coarse frequency estimator in DVB-S2 is around ${\pm}1.5625MHz$, which corresponds to 6.25% of the symbol rate at 25Mbaud. In analyzing robust algorithms among data-aided approaches, we find that the Luise & Reggiannini (L&R) algorithm is the most promising for coarse frequency estimation in terms of estimation performance and complexity. However, it requires many multipliers and adders to compute the correlator outputs. We propose an efficient architecture based on a serial correlator with a buffer and multiplexers. The proposed coarse frequency synchronizer reduces the hardware complexity by about 92% compared with the direct implementation. The proposed architecture has been implemented and verified on a Xilinx Virtex II FPGA.
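As background for the algorithm named in this abstract, the following is a minimal software sketch of the L&R estimator in its commonly cited form: the frequency estimate is the phase of the sum of the first N autocorrelation lags, divided by pi * T * (N + 1). It assumes the modulation has already been removed from the samples (the data-aided case) and does not model the paper's serial correlator hardware. The function names, the lag count of 8, and the 64-sample test tone are illustrative assumptions.

```python
import cmath
import math


def lr_frequency_estimate(z, n_lags, symbol_period):
    """Luise & Reggiannini estimate from modulation-free samples z.

    Unambiguous only for |f| < 1 / ((n_lags + 1) * symbol_period).
    """
    length = len(z)
    total = 0j
    for m in range(1, n_lags + 1):
        # Sample autocorrelation at lag m; each term needs one complex multiply,
        # which is where the multiplier and adder count comes from.
        r_m = sum(z[k] * z[k - m].conjugate() for k in range(m, length)) / (length - m)
        total += r_m
    return cmath.phase(total) / (math.pi * symbol_period * (n_lags + 1))


def complex_tone(freq_hz, symbol_period, count):
    """Noise-free complex exponential used as a hypothetical test input."""
    return [cmath.exp(2j * math.pi * freq_hz * symbol_period * k) for k in range(count)]


symbol_rate = 25e6                    # 25 Mbaud, as in the abstract
max_offset = 0.0625 * symbol_rate     # 6.25% of the symbol rate = 1.5625 MHz
samples = complex_tone(max_offset, 1 / symbol_rate, 64)
print(lr_frequency_estimate(samples, 8, 1 / symbol_rate))  # close to 1562500 Hz
```

With 8 lags the unambiguous range is the symbol rate divided by 9, about 2.78 MHz at 25 Mbaud, so the ${\pm}1.5625MHz$ requirement fits inside it; real designs choose the lag count as a trade-off between range, accuracy, and correlator cost.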

Efficient Frame Synchronization Detector and Low Complexity Automatic Gain Controller for DVB-S2 (효율적인 디지털 위성 방송 프레임 동기 검출 회로 및 낮은 복잡도의 자동 이득 제어 회로)

  • Choi, Jin-Kyu;Sunwoo, Myung-Hoon;Kim, Pan-Soo;Chang, Dae-Ig
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.2 / pp.31-37 / 2009
  • This paper presents an efficient frame synchronization strategy with identification of the modulation type for Digital Video Broadcasting - Satellite, Second Generation (DVB-S2). To detect the Start Of Frame (SOF) and identify the modulation mode at low SNR, we propose a new correlator structure and a low-complexity Automatic Gain Controller (AGC). Compared with the direct implementation of the Differential - Generalized Post Detection Integration (D-GPDI) algorithm, which is very complex, the proposed frame synchronization architecture reduces the number of multipliers by about 93% and the number of adders by about 89%. The proposed low-complexity AGC consists of only 5 multipliers and 3 adders. The proposed architecture has been thoroughly verified on the Xilinx Virtex II FPGA board.

Design and Implementation of Look-up Table for Pre-scaling in Very-High Radix Divider (높은 자릿수 나눗셈 연산기에서의 영역변환상수를 위한 검색테이블 설계 및 구현)

  • 이병석;송문식;이정아
    • Proceedings of the Korean Information Science Society Conference / 1999.10c / pp.3-5 / 1999
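  • Because division algorithms are more complex than addition or multiplication algorithms and are executed less frequently, hardware for high-speed division has not been actively studied. However, advances in multimedia and high-performance graphics rendering have created a need for faster floating-point units (FPUs), and with it a growing need for high-speed dividers. In particular, the importance of high-speed dividers is increasingly emphasized as a way to improve overall execution time. A high-speed divider, however, involves two conflicting factors, operation speed and size: a faster divider is larger, and a smaller divider is slower. This paper obtains the pre-scaling constant in a very-high radix division algorithm from a look-up table rather than by computation. To reduce the size of the look-up table, we analyze the range of the pre-scaling constant and partition the look-up table using the carry-save form. Overall, because the operation cycles for computing the pre-scaling constant are no longer needed, the division speed improves with little change in the area of the divider.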

8B/10B Encoder Design by Coding Table Reduction (코딩테이블 축소방법에 의한 8B/10B 인코더 설계)

  • Shin, Beom-Seok;Kim, Yong-Woo;Yoon, Kwang-Sub;Kang, Jin-Ku
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.4 / pp.43-48 / 2008
  • This paper presents the design of an 8B/10B encoder based on coding table reduction. The proposed encoder has a reduced coding table and a modified disparity control block. Logic simulation and synthesis have been performed for the proposed design. Synthesized using the Magna CMOS $0.18{\mu}m$ process, the proposed design achieves an operating frequency of 343MHz and a chip area of $1886{\mu}m^2$.