• Title/Summary/Keyword: 곱셈기법 (multiplication technique)

Accelerated Convolution Image Processing by Using Look-Up Table and Overlap Region Buffering Method (Loop-Up Table과 필터 중첩영역 버퍼링 기법을 이용한 컨벌루션 영상처리 고속화)

  • Kim, Hyun-Woo;Kim, Min-Young
    • Journal of the Institute of Electronics Engineers of Korea SC / v.49 no.4 / pp.17-22 / 2012
  • Convolution filtering is widely used in digital signal processing for image blurring, sharpening, edge detection, and noise reduction. The filter mask size, shape, and values are selected in advance according to the application, and the designed filter is then convolved with the input image. In this paper, we propose a method for accelerating convolution processing that uses a two-dimensional look-up table (LUT) and an overlap-region buffering technique. First, because the convolution mask values are fixed, the products of the 8- or 10-bit input pixel values and the filter mask values are computed in advance and stored in the LUT, which is then consulted during convolution. Second, the duplicated operation regions that arise from the symmetric structure of convolution filters are analyzed, and the results saved in a predefined memory buffer during the previous step are recalled and reused in the current step. This buffering minimizes repeated filter operations on the same regions as processing proceeds sequentially. Because the proposed algorithms reduce the computation required for convolution, they are well suited to embedded systems with limited computational resources as well as to general personal computers. A series of experiments under various conditions verifies the effectiveness and usefulness of the proposed methods.
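The two ideas in this abstract can be illustrated with a minimal sketch. It assumes 8-bit pixels, a fixed integer mask, and valid-region filtering in correlation form. The function names and the column-sum cache are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: LUT-based filtering with reuse of overlapping partial sums.
# All names are hypothetical; the paper's actual buffering scheme may differ.

def build_lut(mask, levels=256):
    """Precompute mask_value * pixel_value for every mask entry and pixel level."""
    return [[[w * p for p in range(levels)] for w in row] for row in mask]

def convolve_lut(image, mask):
    """Valid-region filtering in which every multiply is replaced by a table lookup."""
    lut = build_lut(mask)
    kh, kw = len(mask), len(mask[0])
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += lut[i][j][image[y + i][x + j]]  # lookup, no multiply
            row.append(acc)
        out.append(row)
    return out

def convolve_lut_buffered(image, mask):
    """Same result, but column partial sums are cached and reused whenever two
    mask columns carry identical weights (as in symmetric masks)."""
    lut = build_lut(mask)
    kh, kw = len(mask), len(mask[0])
    col_key = [tuple(mask[i][j] for i in range(kh)) for j in range(kw)]
    out = []
    for y in range(len(image) - kh + 1):
        cache = {}  # (image column, mask-column weights) -> partial sum
        row = []
        for x in range(len(image[0]) - kw + 1):
            acc = 0
            for j in range(kw):
                key = (x + j, col_key[j])
                if key not in cache:
                    cache[key] = sum(lut[i][j][image[y + i][x + j]] for i in range(kh))
                acc += cache[key]
            row.append(acc)
        out.append(row)
    return out
```

For a symmetric mask such as `[[1, 2, 1], [2, 4, 2], [1, 2, 1]]`, the first and last mask columns share weights, so a partial sum computed when an image column enters the window is reused when that column is about to leave it.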

Design and Performance Evaluation of Selective DFT Spreading Method for PAPR Reduction in Uplink OFDMA System (OFDMA 상향 링크 시스템에서 PAPR 저감을 위한 선택적 DFT Spreading 기법의 설계와 성능 평가)

  • Kim, Sang-Woo;Ryu, Heung-Gyoon
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.18 no.3 s.118 / pp.248-256 / 2007
  • In this paper, we propose a selective DFT spreading method to solve the high-PAPR problem in uplink OFDMA systems. A selection mechanism is added to DFT spreading, so that the DFT spreading method is combined with the SLM method. However, to minimize the increase in computational complexity, the proposed method, unlike conventional SLM, uses only one DFT spreading block. After the DFT, several copy branches are generated by multiplying the output by a different matrix in each branch. Each matrix is obtained by linearly transforming the corresponding phase rotation that would otherwise be applied in front of the DFT block, and the multiplication has much lower computational complexity than a DFT. For the simulation, we assume a 512-point IFFT, 300 effective subcarriers, an allocation of 1/4 or 1/3 of the subcarriers to each user, and QPSK modulation. The simulation results show that, with 4 copy branches, the proposed method achieves a PAPR reduction of more than about 5.2 dB. This is about 1.8 dB better than the conventional DFT spreading method and 0.95 dB better than conventional SLM with 32 copy branches. Even with only 2 copy branches, the proposed method outperforms SLM with 32 copy branches. In this comparison, the proposed method has 91.79 % lower complexity than SLM with 32 copy branches at similar PAPR reduction performance. Similar performance can also be expected when all subcarriers are allocated to a single user, as in OFDM.
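As background for the quantities discussed in this abstract, the sketch below computes PAPR for plain OFDMA and for conventional DFT spreading, and then picks the lowest-PAPR branch in the SLM style. It is not the proposed selective method. The 32-point transform (standing in for the 512-point IFFT), the interleaved subcarrier mapping, the QPSK data, and all function names are assumptions made for illustration.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT or inverse DFT; adequate for a small illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband block, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def map_interleaved(symbols, n_fft):
    """Place the user's symbols on equally spaced subcarriers (assumed mapping)."""
    step = n_fft // len(symbols)
    grid = [0j] * n_fft
    for k, s in enumerate(symbols):
        grid[k * step] = s
    return grid

def select_lowest_papr(branches):
    """SLM-style selection: keep the candidate branch with the smallest PAPR."""
    return min(branches, key=papr_db)

# Hypothetical QPSK data for one user and a small transform size.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
N_FFT = 32

ofdma = dft(map_interleaved(QPSK, N_FFT), inverse=True)        # no spreading
spread = dft(map_interleaved(dft(QPSK), N_FFT), inverse=True)  # DFT spreading
best = select_lowest_papr([ofdma, spread])
```

With interleaved mapping, the DFT-spread branch reduces to a scaled repetition of the constant-modulus QPSK symbols, which is why spreading lowers the PAPR relative to the unspread branch in this toy setup.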

An Implementation of Low Power MAC using Improvement of Multiply/Subtract Operation Method and PTL Circuit Design Methodology (승/감산 연산방법의 개선 및 PTL회로설계 기법을 이용한 저전력 MAC의 구현)

  • Sim, Gi-Hak;O, Ik-Gyun;Hong, Sang-Min;Yu, Beom-Seon;Lee, Gi-Yeong;Jo, Tae-Won
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.4 / pp.60-70 / 2000
  • An $8 \times 8 + 20$-bit MAC is designed using low-power design methodologies at each system design level. At the algorithm level, a new method for the multiply/subtract operation is proposed, which reduces the transistor count of the hardware realization compared with conventional methods. At the circuit level, a new Booth selector circuit using NMOS pass-transistor logic is proposed; it is superior in power-delay product to comparable CMOS circuits. At the architecture level, an ELM adder, known to be the most efficient in power consumption, operating frequency, area, and design regularity, is adopted as the final adder. Dynamic CMOS single-edge-triggered flip-flops are used for the registers because they need fewer transistors per bit. To increase the operating frequency, a 2-stage pipeline architecture is adopted and fast 4:2 compressors are used in the Wallace tree block. In simulation, the MAC designed in a 0.6 $\mu$m 1-poly 3-metal CMOS process operates at 200 MHz and 3.3 V, consuming 35 mW in multiply operations, and at 100 MHz, consuming 29 mW in MAC operations.
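The abstract mentions a Booth selector but does not state the recoding radix. The sketch below assumes standard radix-4 Booth recoding of a signed 8-bit multiplier, followed by accumulation. It is a behavioral model with hypothetical function names, not the proposed multiply/subtract method or circuit.

```python
# Sketch only: behavioral radix-4 Booth multiply-accumulate (radix is an assumption).

def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-bit multiplier into radix-4 digits in {-2,-1,0,1,2}."""
    y &= (1 << bits) - 1              # two's-complement bit pattern
    digits, prev = [], 0              # prev holds bit y[2i-1], with y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_mac(acc, x, y, bits=8):
    """Return acc + x * y, summing the Booth-selected multiples of x."""
    partials = (d * x * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, bits)))
    return acc + sum(partials)
```

Each digit selects 0, ±x, or ±2x, so an 8-bit multiplier yields four partial products instead of eight, which is the usual motivation for Booth recoding ahead of a Wallace tree.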

SIMD MAC Unit Design for Multimedia Data Processing (멀티미디어 데이터 처리에 적합한 SIMD MAC 연산기의 설계)

  • Hong, In-Pyo;Jeong, Woo-Kyong;Jeong, Jae-Won;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.38 no.12 / pp.44-55 / 2001
  • MAC (Multiply and ACcumulate) is the core operation of multimedia data processing. MAC units implemented on traditional DSPs or embedded processors have a latency of three cycles and cannot operate on multiple data simultaneously, so their performance is seriously limited. Many high-end general-purpose microprocessors include a SIMD MAC unit as a functional unit, but these units must support a pipelined structure for various operation modes and high clock frequencies, which complicates the control logic and increases chip area. In this paper, a 64-bit SIMD MAC unit for embedded processors is designed. It has a latency of one clock cycle, which removes the pipeline control logic, and it adds SIMD support to existing Booth multipliers with minimal area overhead.
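A SIMD MAC applies the same multiply-accumulate to several packed operands at once. The sketch below models this behavior in software, assuming a 64-bit word split into four unsigned 16-bit lanes. The lane configuration and the function names are assumptions, since the abstract does not specify them.

```python
# Sketch only: software model of a packed multiply-accumulate.

def split_lanes(word, lane_bits=16, word_bits=64):
    """Split a packed word into unsigned lanes, least-significant lane first."""
    lane_mask = (1 << lane_bits) - 1
    return [(word >> shift) & lane_mask for shift in range(0, word_bits, lane_bits)]

def simd_mac(acc, a, b, lane_bits=16):
    """Per-lane acc[i] + a_lane[i] * b_lane[i]; accumulators are kept unbounded."""
    lanes_a = split_lanes(a, lane_bits)
    lanes_b = split_lanes(b, lane_bits)
    return [c + x * y for c, x, y in zip(acc, lanes_a, lanes_b)]

# One packed operation performs four independent MACs.
result = simd_mac([0, 0, 0, 0], 0x0004_0003_0002_0001, 0x0008_0007_0006_0005)
```

A hardware unit would also fix the accumulator width and define overflow or saturation behavior, which this model leaves out.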

New Pipeline Architecture for Low Power FIR Filter (저전력 FIR 필터를 위한 새로운 파이프라인 아키텍쳐)

  • Paik, Woo-Hyun;Ki, Hoon-Jae;Yoo, Jang-Sik;Lee, Sang-Won;Kim, Soo-Won
    • Journal of the Korean Institute of Telematics and Electronics D / v.36D no.1 / pp.63-73 / 1999
  • This paper presents a new pipeline architecture for low-power, high-speed digital FIR filters. The proposed architecture, based on the retiming technique, improves speed by sharing the input delay stage with the multiplication of the input data, and reduces power when combined with supply-voltage scaling. An 8-tap digital FIR filter for PRML disk-drive read channels that adopts the proposed pipeline architecture has been designed and fabricated in a 0.8 $\mu$m double-metal CMOS process. Measured results show that the FIR filter operates at up to 192 MHz on average and dissipates 1.22 mW/MHz at a 3.3 V power supply. As a result, the proposed architecture improves speed by about 16% and reduces power dissipation by about 23% when operating at the same throughput.

An Efficient 2D Discrete Wavelet Transform Filter Design Using Lattice Structure (Lattice 구조를 갖는 효율적인 2차원 이산 웨이블렛 변환 필터 설계)

  • Park, Tae-Geun;Jeong, Seon-Gyeong
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.6 / pp.59-68 / 2002
  • In this paper, we design a two-dimensional Discrete Wavelet Transform (2D DWT) filter, which is widely used in applications such as image compression because it has no blocking effects and a relatively high compression rate. The filter used here is a two-channel, four-tap QMF (Quadrature Mirror Filter) lattice filter with the PR (Perfect Reconstruction) property. The proposed DWT architecture, which takes two consecutive inputs, achieves efficient performance with a minimum of hardware resources such as multipliers, adders, and registers, owing to simple scheduling. The architecture was verified by RTL simulation and achieves 100% hardware utilization. It shows relatively high performance with minimal hardware compared with other approaches. Efficient memory mapping and address generation techniques are introduced, and a fixed-point arithmetic analysis for minimizing the PSNR degradation due to quantization is discussed.

On Designing 4-way Superscalar Digital Signal Processor Core (4-way 수퍼 스칼라 디지털 시그널 프로세서 코어 설계)

  • 김준석;유선국;박성욱;정남훈;고우석;이근섭;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences / v.23 no.6 / pp.1409-1418 / 1998
  • Recent audio CODEC (Coding/Decoding) algorithms combine several coding techniques and can be divided into DSP tasks, controller tasks, and mixed tasks. Traditional DSP processors have been designed for fast processing of DSP tasks only, not for controller and mixed tasks. This paper presents a new architecture that achieves high throughput on both the controller and mixed tasks of such algorithms while maintaining high performance for DSP tasks. The proposed processor, YSP-3, operates four functional units (a multiplier, two ALUs, and a load/store unit) in parallel via a 4-issue superscalar instruction structure. The performance of YSP-3 has been evaluated by implementing several DSP algorithms and part of the AC-3 decoding algorithm.

A Neural Network Design using Pulsewidth-Modulation (PWM) Technique (펄스폭변조 기법을 이용한 신경망회로 설계)

  • 전응련;전흥우;송성해;정금섭
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.1 / pp.14-24 / 2002
  • In this paper, a pulsewidth-modulation (PWM) neural network with both retrieving and learning functions is proposed. In the designed PWM neural system, the input and output signals of the neural network are represented by PWM signals. Multiplication is one of the most commonly used operations in neural networks. Here, the multiplication and summation functions are realized using the PWM technique and simple mixed-mode circuits, so the designed neural network occupies only a small chip area. By applying circuit design techniques that reduce nonideal effects, the designed circuits achieve good linearity and a large dynamic range. Moreover, the delta learning rule can easily be realized. To demonstrate the learning capability of the PWM neural network, the delta learning rule is implemented in a circuit with one neuron, three synapses, and the associated learning circuits. HSPICE simulation results for two learning examples, the AND function and the OR function, verify the functional correctness and performance of the designed neural network.
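The delta learning rule named in this abstract can be sketched in software for the same two examples, AND and OR. The sketch assumes a linear neuron whose three synapses are two inputs plus a bias, trained with the Widrow-Hoff delta rule and read out through a 0.5 threshold. The learning rate, epoch count, and function names are illustrative choices and do not model the PWM circuit.

```python
# Sketch only: delta (Widrow-Hoff) rule for one neuron with three synapses.

def train_delta(samples, lr=0.05, epochs=2000):
    """Return weights [w1, w2, bias] after repeated delta-rule updates."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), target in samples:
            x = (x1, x2, 1.0)                       # third synapse carries the bias
            y = sum(wi * xi for wi, xi in zip(w, x))
            err = target - y
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]  # w += lr * err * x
    return w

def predict(w, x1, x2):
    """Threshold the linear output at 0.5 to obtain a logic level."""
    return 1 if w[0] * x1 + w[1] * x2 + w[2] > 0.5 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_and = train_delta(AND)
w_or = train_delta(OR)
```

Each update multiplies the error by the input and adds the result to the weight, which is the multiply-and-sum pattern that the PWM circuits realize in hardware.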

A Study about Time-sharing Method in ADC Sampling for Analysis of Breeding Pig's Feeding (모돈 섭식 분석을 위한 ADC 샘플링 시분할 방법 연구)

  • Cho, Jinho;Oh, Jong-woo;Cho, Yongjin;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 2017.04a / pp.164-164 / 2017
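  • Sow management based on quantitative analysis is becoming increasingly important for improving welfare and productivity in smart pig-house environments. Sows go through a repeating cycle of mating, gestation, farrowing, lactation, and weaning, and their management is directly tied to the productivity and profitability of the pig farm. A pig-house monitoring system capable of continuous measurement is needed to acquire the environmental and measurement data required for sow management and, from that data, to maximize individual sow management and find optimal measures. A system that can measure sow behavior is needed because metabolic disorders, diseases, and signs of estrus that correspond to behavioral characteristics (such as feeding and leg or hoof disorders) can then be detected early. Unlike a sow's stance (standing, lying, or sitting), which can be determined from a stationary state, its feeding state is determined from continuous movement, so analyzing it requires real-time signal processing that minimizes the time lag between the measurement system and the analysis system. The minimum sensor sampling rate (SPS, samples per second) for quantitatively indexing sow feeding is 600 Hz ($100\ \text{Hz} \times 6$ sensors), so a microcontroller with at least six ADC channels that can sample at 1,200 Hz or more is required. Furthermore, continuous measurement for one minute at 16-bit resolution requires 153,600 KByte ($1{,}200\ \text{sample/s} \times 16\ \text{bit/sample} \times 8\ \text{Byte/bit}$) of data, which is judged to be a very large amount for real-time processing. Although it varies somewhat with the processing technique, when feeding analysis is performed on a one-minute cycle, the minimum clock rate needed to process up to 150 MByte of data is 2.5 MHz ($= 1\ \text{clock/Byte} \times 150\ \text{MByte} / 60\ \text{s}$) for simple assignment, 10 MHz for addition (4 clocks), and 40 MHz for multiplication (16 clocks). Additional clock cycles are needed to drive the auxiliary circuits (LCD, SD memory) used to store and display the data. Taking all of this into account, a clock of at least 120 MHz ($= 40\ \text{MHz} \times 3$) is judged to be necessary. In addition, to keep the time resolution of the sensor measurement cycle uniform, the measurement, display, and storage steps must be interleaved. The microprocessor finally selected through this process is the ARM Cortex-M4, which runs at 168 MHz and can therefore perform the target signal processing. Preliminary field experiments met the expected performance, and it was judged that the optimal time-sharing scheduling technique needs further refinement to prepare for operations with high time complexity.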
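The clock-budget arithmetic in this abstract can be reproduced with a short sketch. It starts from the 150 MByte per minute figure exactly as stated and uses the stated per-byte costs of 1, 4, and 16 clocks and the factor of 3 for display and storage overhead. The function and constant names are hypothetical.

```python
# Sketch only: reproduces the abstract's clock estimate from its stated inputs.

def required_clock_hz(data_bytes, period_s, clocks_per_byte):
    """Minimum clock rate to touch every byte once within the analysis period."""
    return data_bytes * clocks_per_byte / period_s

DATA_BYTES = 150_000_000   # 150 MByte per analysis cycle, as stated in the abstract
PERIOD_S = 60              # one-minute analysis cycle
CORTEX_M4_HZ = 168_000_000 # clock of the selected ARM Cortex-M4

assign_hz = required_clock_hz(DATA_BYTES, PERIOD_S, 1)     # simple assignment
add_hz = required_clock_hz(DATA_BYTES, PERIOD_S, 4)        # addition, 4 clocks
multiply_hz = required_clock_hz(DATA_BYTES, PERIOD_S, 16)  # multiplication, 16 clocks
budget_hz = 3 * multiply_hz                                # overhead factor of 3
headroom_hz = CORTEX_M4_HZ - budget_hz
```

The multiplication case dominates the budget, which is why the 168 MHz part is chosen over slower microcontrollers.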

Design of a computationally efficient frame synchronization scheme for wireless LAN systems (무선랜 시스템을 위한 계산이 간단한 초기 동기부 설계)

  • Cho, Jun-Beom;Lee, Jong-Hyup;Han, Jin-Woo;You, Yeon-Sang;Oh, Hyok-Jun
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.12 / pp.64-72 / 2012
  • Synchronization, including timing recovery, frequency offset compensation, and frame synchronization, is the most important signal processing block in all wireless and wired communication systems. Most communication systems use synchronization schemes based on training sequences or preambles. The IEEE 802.11a/g/n WLAN standards are based on OFDM, and OFDM systems are known to be much more sensitive to frequency and timing synchronization errors than single-carrier systems. A loss of orthogonality between the multiplexed subcarriers can result in severe performance degradation. The starting position of the frame and the beginning of the symbol and training symbol can be estimated using correlation methods. Correlation processing is usually complex to implement because it requires a large number of multipliers, especially when the reference signal is non-binary. In this paper, a simple correlation-based synchronization scheme is proposed for IEEE 802.11a/g/n systems. It exploits the periodicity of the training symbols. Simulation and implementation results show that the proposed method has much lower complexity than existing schemes without any performance degradation.
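One common way to exploit preamble periodicity is delay-and-correlate detection, which correlates the received signal with a delayed copy of itself rather than with a stored reference. The sketch below shows that generic technique under simple assumptions: a noise-free signal, a period of 16 samples as in the 802.11a short training symbols, and a fixed threshold. It is not the scheme proposed in the paper, and the names, window length, and threshold are illustrative.

```python
import cmath

def delay_correlate(samples, period, window):
    """Normalized metric |sum conj(r[n+k]) * r[n+k+period]| / sum |r[n+k+period]|^2."""
    metrics = []
    for n in range(len(samples) - period - window + 1):
        corr = sum(samples[n + k].conjugate() * samples[n + k + period]
                   for k in range(window))
        energy = sum(abs(samples[n + k + period]) ** 2 for k in range(window))
        metrics.append(abs(corr) / energy if energy > 0 else 0.0)
    return metrics

def detect_frame(samples, period=16, window=32, threshold=0.9):
    """Return the first index whose metric reaches the threshold, or None."""
    for n, metric in enumerate(delay_correlate(samples, period, window)):
        if metric >= threshold:
            return n
    return None

# Hypothetical test signal: 40 idle samples, then ten repetitions of a
# unit-magnitude 16-sample training symbol.
BASE = [cmath.exp(2j * cmath.pi * k * k / 16) for k in range(16)]
signal = [0j] * 40 + BASE * 10
start_estimate = detect_frame(signal)
```

The threshold crossing lands shortly before the first preamble sample, because the metric rises as the correlation window fills. A practical receiver would refine this coarse estimate and would update the sums recursively, so that each new sample costs about one complex multiplication.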