• Title/Summary/Keyword: 곱셈적 구조 (multiplicative structure)

Search Result 229, Processing Time 0.024 seconds

Efficient Frame Synchronization Detector and Low Complexity Automatic Gain Controller for DVB-S2 (효율적인 디지털 위성 방송 프레임 동기 검출 회로 및 낮은 복잡도의 자동 이득 제어 회로)

  • Choi, Jin-Kyu;Sunwoo, Myung-Hoon;Kim, Pan-Soo;Chang, Dae-Ig
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.2 / pp.31-37 / 2009
  • This paper presents an efficient frame synchronization strategy that also identifies the modulation type for Digital Video Broadcasting-Satellite second generation (DVB-S2). To detect the Start Of Frame (SOF) and identify the modulation mode at low SNR, we propose a new correlator structure and a low-complexity Automatic Gain Controller (AGC). Compared with a direct implementation of the highly complex Differential-Generalized Post Detection Integration (D-GPDI) algorithm, the proposed frame synchronization architecture reduces the number of multipliers by about 93% and the number of adders by about 89%. The proposed low-complexity AGC consists of only 5 multipliers and 3 adders. The proposed architecture has been thoroughly verified on a Xilinx Virtex II FPGA board.

Design and Implementation of a Low-Complexity and High-Throughput MIMO Symbol Detector Supporting up to 256 QAM (256 QAM까지 지원 가능한 저 복잡도 고 성능의 MIMO 심볼 검파기의 설계 및 구현)

  • Lee, Gwang-Ho;Kim, Tae-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.6 / pp.34-42 / 2014
  • This paper presents a low-complexity and high-throughput symbol detector for two-spatial-stream multiple-input multiple-output systems based on the modified maximum-likelihood symbol detection algorithm. In the proposed symbol detector, the cost function is calculated incrementally using a multi-cycle architecture so as to eliminate the complex multiplications for each symbol, and the slicing operations are performed hierarchically according to the range of constellation points by a pipelined architecture. The proposed architecture exhibits low hardware complexity while supporting complicated modulations such as 256 QAM. In addition, various modulations and antenna configurations are supported flexibly by reconfiguring the pipeline for the slicing operation. The proposed symbol detector is implemented with 38.7K logic gates in a 0.11-${\mu}m$ CMOS process, and its throughput is 166 Mbps for $2{\times}3$ 16-QAM and 80 Mbps for $2{\times}3$ 64-QAM at an operating frequency of 478 MHz.

Design of Variable Average Operation without the Divider for Various Image Sizes (다양한 영상크기에 적합한 나눗셈기를 사용하지 않은 가변적 평균기의 설계)

  • Yang, Jeong-Ju;Jeong, Hyo-Won;Lee, Sung-Mok;Choi, Won-Tae;Kang, Bong-Soon
    • Journal of the Institute of Convergence Signal Processing / v.10 no.4 / pp.267-273 / 2009
  • In this paper, we propose a variable average operation for WDR (Wide Dynamic Range). The previously proposed average operation [5] improves hardware efficiency and complexity by replacing the divider with a multiplier. However, the previous method has some weak points. For example, it counts the horizontal and vertical lengths, and the multiplier then selects a Mode set by the user only when the lengths exactly correspond with the image size in that Mode. To compensate for these weak points, we change the Mode selection method so that it uses the image's total size. We also propose another feature that allows the operation to be applied to various image sizes. To obtain a more accurate average, we add an external compensation value. We design the variable average operation in Verilog-HDL and confirm that the Serial Multiplier structure is more efficient than the Split Multiplier structure.
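The idea of replacing a divider with a multiplier can be illustrated with a small fixed-point sketch. This is not the circuit from the paper: the function names, the 32-bit fraction width, and the round-to-nearest step are assumptions chosen for illustration. The reciprocal of the pixel count is precomputed as an integer constant, so the average needs only one multiplication and one shift.

```python
def reciprocal_constant(pixel_count: int, frac_bits: int = 32) -> int:
    """Precompute round(2**frac_bits / pixel_count) as an integer multiplier."""
    if pixel_count <= 0:
        raise ValueError("pixel_count must be positive")
    return ((1 << frac_bits) + pixel_count // 2) // pixel_count


def average_without_divider(pixel_sum: int, pixel_count: int, frac_bits: int = 32) -> int:
    """Approximate pixel_sum / pixel_count with one multiply and one shift.

    Hypothetical helper: in hardware the reciprocal would be a stored
    constant selected by the image size, not recomputed per frame.
    """
    recip = reciprocal_constant(pixel_count, frac_bits)
    # Adding half an LSB before the shift rounds to nearest instead of truncating.
    return (pixel_sum * recip + (1 << (frac_bits - 1))) >> frac_bits
```

Because the reciprocal is rounded to a finite number of bits, the result can deviate slightly from the exact quotient. A narrower fraction width makes the deviation larger, which is one plausible reason a design of this kind would add a compensation term.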

Efficient IFFT Design Using Mapping Method (Mapping 기법을 이용한 효율적인 IFFT 설계)

  • Jang, In-Gul;Kim, Yong-Eun;Chung, Jin-Gyun
    • Journal of the Institute of Electronics Engineers of Korea TC / v.44 no.11 / pp.11-18 / 2007
  • The FFT (Fast Fourier Transform) processor is one of the key components in the implementation of OFDM systems such as WiBro, DAB and UWB systems. Most research on the implementation of FFT processors has focused on reducing the complexity of multipliers, memory and control circuits. In this paper, to reduce the memory size required for the IFFT (Inverse Fast Fourier Transform), we propose a new IFFT design method based on a mapping method. Simulations show that the proposed IFFT design method achieves more than 60% area reduction and a substantial SQNR (Signal-to-Quantization-Noise Ratio) gain compared with previous IFFT circuits.

An Analytical Evaluation of 2D Mesh-connected SIMD Architecture for Parallel Matrix Multiplication (2D Mesh SIMD 구조에서의 병렬 행렬 곱셈의 수치적 성능 분석)

  • Kim, Cheong-Ghil
    • Journal of The Institute of Information and Telecommunication Facilities Engineering / v.10 no.1 / pp.7-13 / 2011
  • Matrix multiplication is a fundamental operation of linear algebra and arises in many areas of science and engineering. This paper introduces an efficient parallel matrix multiplication scheme on an $N{\times}N$ mesh-connected SIMD array processor, called multiple hierarchical SIMD architecture (HMSA). The architectural characteristic of HMSA is its hierarchically structured control units, which consist of a global control unit, N local control units configured diagonally, and $N^2$ processing elements (PEs) arranged in an $N{\times}N$ array. PEs communicate through local buses connecting four adjacent neighbor PEs in mesh-torus networks and through global buses running across the rows and columns, called horizontal buses and vertical buses, respectively. This architecture gives HMSA the features of diagonally indexed concurrent broadcast and alternate access to either the rows (row control mode) or the columns (column control mode) of the 2D array of PEs. An algorithmic mapping method is used for performance evaluation by mapping matrix multiplication onto the proposed architecture. The asymptotic time complexity is evaluated, and the result shows that parallel matrix multiplication on HMSA can provide significant performance improvement.

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.10 / pp.1919-1926 / 2016
  • Because of its enormous processing ability, the General-Purpose Graphics Processing Unit (GPGPU) has been applied to many fields, including electronic design automation. The SAT algorithm is one of the core algorithms in many electronic design automation tools. There have been some efforts to apply GPGPU to the SAT algorithm, but the SAT algorithm is difficult to parallelize because of its characteristics. In this paper, I applied GPGPU to the SAT algorithm by parallelizing the propagation routine, which is relatively well suited to parallel processing. Based on the similarity of the propagation routine to sparse matrix multiplication, the data structure for the SAT problem is constructed, and the parallel propagation routine is described. To prevent data loss between parallel threads, atomic operations are used. The experimental results for some benchmark SAT problems show that the proposed algorithm is superior to the previous GPGPU-based SAT solver.
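The resemblance between propagation and sparse matrix multiplication can be sketched as follows. This is a serial illustration, not the solver from the paper. The CSR-style layout, the signed-integer literal encoding (as in DIMACS files), and the function names are all assumptions. Each clause is treated as a sparse row, and each row is scanned independently, which is the part a GPU could assign to separate threads.

```python
def build_csr(clauses):
    """Flatten clauses into CSR-style arrays: row pointers and a literal list."""
    row_ptr, lits = [0], []
    for clause in clauses:
        lits.extend(clause)
        row_ptr.append(len(lits))
    return row_ptr, lits


def propagate(row_ptr, lits, assignment):
    """Run one unit-propagation pass over every clause row.

    assignment maps a variable number to True/False; missing means unassigned.
    Returns (implied_literals, conflict_found).
    """
    implied = []
    for row in range(len(row_ptr) - 1):  # one thread per row in a parallel version
        satisfied = False
        unassigned = []
        for lit in lits[row_ptr[row]:row_ptr[row + 1]]:
            value = assignment.get(abs(lit))
            if value is None:
                unassigned.append(lit)
            elif value == (lit > 0):
                satisfied = True
                break
        if satisfied:
            continue
        if not unassigned:
            return implied, True  # every literal is false: conflict
        if len(unassigned) == 1:
            implied.append(unassigned[0])  # unit clause forces this literal
    return implied, False
```

In a parallel version, several rows may try to record an implication for the same variable at once, which is where atomic operations would be needed. The serial list append above sidesteps that issue.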

Bit-serial Discrete Wavelet Transform Filter Design (비트 시리얼 이산 웨이블렛 변환 필터 설계)

  • Park Tae geun;Kim Ju young;Noh Jun rye
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.4A / pp.336-344 / 2005
  • The Discrete Wavelet Transform (DWT) is a next-generation compression technique that has been selected for MPEG4 and JPEG2000, because it has no blocking effects and efficiently captures the frequency properties of short time intervals. In this paper, we propose an efficient bit-serial architecture for a low-power and low-complexity DWT filter, employing a two-channel QMF (Quadrature Mirror Filter) PR (Perfect Reconstruction) lattice filter. The filter consists of four lattices (filter length = 8). We determine the quantization bit width for the coefficients by a fixed-length PSNR (peak signal-to-noise ratio) analysis and propose a bit-serial multiplier architecture with fixed coefficients. CSD encoding of the coefficients is adopted to minimize the number of non-zero bits, thus reducing the hardware complexity. The proposed folded 1D DWT architecture processes the other resolution levels during the idle periods created by decimation, and an efficient schedule for it is proposed. The proposed architecture requires only flip-flops and full-adders. It has been designed and verified in Verilog HDL and synthesized by Synopsys Design Compiler with a Hynix 0.35 $\mu$m STD cell library. The maximum operating frequency is 200 MHz and the throughput is 175 Mbps with a latency of 16 clock cycles.
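CSD (canonical signed digit) encoding, mentioned above, can be sketched for integer coefficients as follows. This is the standard non-adjacent-form recoding, not the paper's hardware. The function names and the restriction to non-negative integers are assumptions. Each digit is -1, 0 or +1, and no two neighboring digits are both non-zero, so a fixed-coefficient multiplier needs fewer add/subtract stages.

```python
def to_csd(value: int) -> list[int]:
    """Return CSD digits of a non-negative integer, least significant first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative coefficients only")
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives digit +1; value mod 4 == 3 gives digit -1.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def multiply_by_csd(x: int, digits: list[int]) -> int:
    """Multiply x by the coefficient encoded in digits using shifts and adds only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For example, `to_csd(7)` yields `[-1, 0, 0, 1]`, that is 8 - 1, which needs one subtraction where plain binary 111 needs two additions.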

A 8192-Point FFT Processor Based on the CORDIC Algorithm for OFDM System (CORDIC 알고리듬에 기반 한 OFDM 시스템용 8192-Point FFT 프로세서)

  • Park, Sang-Yoon;Cho, Nam-Ik
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.8B / pp.787-795 / 2002
  • This paper presents the architecture and the implementation of a 2K/4K/8K-point complex Fast Fourier Transform (FFT) processor for Orthogonal Frequency-Division Multiplexing (OFDM) systems. The architecture is based on the Cooley-Tukey algorithm for decomposing the long DFT into short multi-dimensional DFTs. The transposition memory, shuffle memory, and memory mergence method are used for the efficient manipulation of data for multi-dimensional transforms. The Booth algorithm and the COordinate Rotation DIgital Computer (CORDIC) processor are employed for the twiddle factor multiplications in each dimension. Also, for the CORDIC processor, a new twiddle factor generation method is proposed to obviate the ROM required for storing the twiddle factors. The overall 2K/4K/8K-FFT processor requires 600,000 gates, and it is implemented in 1.8 V, 0.18 ${\mu}m$ CMOS. The processor can perform an 8K-point FFT every 273 ${\mu}s$ and a 2K-point FFT every 68.26 ${\mu}s$ at 30 MHz, and the SNR is over 48 dB, which is sufficient performance for OFDM in DVB-T.
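How CORDIC can stand in for a twiddle factor multiplication is sketched below. This is a floating-point illustration of rotation-mode CORDIC, not the processor described in the paper. The function name, the 16 iterations, and the use of floats in place of fixed-point shifts are assumptions. Multiplying a complex sample x + jy by exp(jθ) is a plane rotation by θ, which CORDIC builds from micro-rotations by atan(2^-i).

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 16):
    """Rotate (x, y) by angle radians, for |angle| up to about 1.74 rad.

    Each step uses only a multiply by 2**-i (a shift in hardware) and adds;
    the accumulated gain is divided out at the end.
    """
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # remaining angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
    return x / gain, y / gain
```

For an N-point FFT the twiddle factor with index k corresponds to `angle = -2 * math.pi * k / N`. Angles outside the convergence range would first need a quadrant correction, which this sketch omits.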

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering / v.26 no.2 / pp.167-174 / 2021
  • Recently, deep learning technology has been widely used in various computer vision applications, such as object recognition, classification, and image generation. In particular, deep learning-based super-resolution has achieved significant performance improvements. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution algorithm in which the output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator that considers parallel computing efficiency. In addition, we propose Optimal-FSRCNN, which modifies the structure of FSRCNN. The number of multipliers is reduced by a factor of 3.47 compared with FSRCNN, while the PSNR is similar to that of FSRCNN. We developed a real-time image processing technology that is implemented on an FPGA.

Improvement in efficiency on ID-based Delegation Network (ID 기반 위임 네트워크의 성능 개선방안)

  • Youn, Taek-Young;Jeong, Sang-Tae;Park, Young-Ho
    • Journal of the Korea Institute of Information Security & Cryptology / v.17 no.3 / pp.17-25 / 2007
  • Delegation of signing capability is a common practice in various applications. Mambo et al. proposed proxy signatures as a solution for the delegation of signing capability. Proxy signatures allow a designated proxy signer to sign on behalf of an original signer. Since the concept of a proxy signature scheme was proposed, many variants have been proposed to support more general delegation settings. To capture all possible delegation structures, the concept of a delegation network was proposed by Aura. ID-based cryptography, which is suited to flexible environments, is desirable for constructing a delegation network. Chow et al. proposed an ID-based delegation network. From a computational point of view, their solution requires E pairing operations and N elliptic curve scalar multiplications, where E and N are the number of edges and nodes in a delegation structure, respectively. In this paper, we propose an efficient ID-based delegation network that requires only E pairing operations. Moreover, we can design a modified delegation network that requires only N pairing operations.